
Issue #225

hierarchical compilation of designs for scalability

Added by joe barjo over 9 years ago. Updated over 9 years ago.

Status: Feature
Priority: Normal
Assignee: -
Category: Unsupported
% Done: 0%


Description

Hi

Our tests show that Verilator is unusable for rather big designs. The reason (from what I understand) is that it doesn't support hierarchical compilation. I don't know much about Verilator internals, but I think I have a suggestion for handling hierarchy in cycle-based simulators. SystemCass is a SystemC-"compatible" simulator with much better performance, because its models are composed of three functions:
* a transition function;
* a Moore generation function;
* a Mealy generation function.
These three functions should be "easily" generated by Verilator. Hierarchical compilation not only brings scalability, it should also make it easier to parallelize code generation. This feature would be overkill.
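To make the three-function split concrete, here is a minimal C++ sketch assuming a toy design: a counter (Moore output) feeding a stateless comparator (Mealy output). All names (`Counter`, `Threshold`, `Top`, `genMoore`, `genMealy`, `run`) are hypothetical and are not SystemCass's or Verilator's API. The point is that a parent only needs each child's three entry points, called in a fixed order per cycle, so each child could in principle be generated and compiled separately.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cycle-based models split into transition / Moore / Mealy
// functions. Illustration only, not SystemCass's or Verilator's API.

// A counter: its output depends only on its state (Moore output).
struct Counter {
    bool enable = false;   // input, sampled at the clock edge
    uint8_t count = 0;     // registered state
    uint8_t out = 0;       // Moore output

    // Transition function: next state from current state and inputs.
    void transition() {
        if (enable) count = static_cast<uint8_t>(count + 1);
    }
    // Moore generation function: outputs from state only.
    void genMoore() { out = count; }
};

// A stateless comparator: its output depends directly on its input
// (Mealy output).
struct Threshold {
    uint8_t limit = 3;     // fixed configuration
    uint8_t in = 0;        // combinational input
    bool hit = false;      // Mealy output

    // Mealy generation function: outputs from inputs (and state, if any).
    void genMealy() { hit = in >= limit; }
};

// Top level: it only wires ports together and calls the three phases in a
// fixed order; it never looks inside the children.
struct Top {
    bool enable = true;    // top-level input
    Counter cnt;
    Threshold thr;

    void cycle() {
        // 1. Clock edge: run every transition function on settled inputs.
        cnt.enable = enable;
        cnt.transition();
        // 2. Moore outputs depend only on the new state, so order is free.
        cnt.genMoore();
        // 3. Mealy outputs run in a statically computed dependency order.
        thr.in = cnt.out;
        thr.genMealy();
    }
};

// Advance the design by n clock cycles.
void run(Top& top, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) top.cycle();
}
```

For example, calling `run(top, 3)` on a default-constructed `Top` leaves `top.cnt.out == 3` and `top.thr.hit == true`; after only two cycles `hit` is still false.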

More info at https://www-asim.lip6.fr/trac/systemcass/

Thanks

History

#1 Updated by Wilson Snyder over 9 years ago

  • Category set to Unsupported
  • Status changed from New to Feature

From reading the webpage (annoyingly, it's HTTPS without a valid certificate, but anyhow), I seriously doubt that SystemCass could get even remotely close to the performance of Verilator, which does much more than what they optimize; the internal model Verilator generates is not SystemC.

Anyhow, how large is your design? Very large designs (>>10M gates) can use a SystemC shell that instantiates separately Verilated modules. See [[http://www.veripool.org/boards/2/topics/show/84-Verilator-Verilator-best-practice-flow-Large-designs-bottom-up]]

I would like to have Verilator generate such shells, but, as with all things, time is limited. If you want to look at implementing hierarchical compiles, I'll give you pointers.

#2 Updated by joe barjo over 9 years ago

Thanks for your response

For SystemCass, the internal model can also be something other than SystemC.

The test design we used was about 1.5M gates. Our flow currently generates VHDL, and we are switching to Verilog.

For the test (I know this is a bad test...), we synthesized the VHDL (with synDc) and tried to simulate the generated structural Verilog. Verilator simply blew out the memory (more than 3 GB) and never finished. Then we post-processed the generated Verilog and transformed the gate instances into equations. It took Verilator about 1 hour to generate 1000 .cpp files, and each .cpp file then took about 1 hour to compile. (We never compiled the whole model.)
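Those figures add up quickly: about 1000 files at roughly 1 hour each is on the order of 1000 CPU-hours (about six weeks) if compiled one after another. Because the generated files compile independently, wall-clock time shrinks with parallel compiler jobs. A rough estimate, assuming every file takes the same time and files are spread evenly over the jobs, ignoring linking and memory limits; the function name `estimateWallHours` is hypothetical:

```cpp
#include <cassert>

// Rough wall-clock estimate for compiling many independent .cpp files.
// Assumes every file takes hoursPerFile and that files are spread evenly
// across `jobs` parallel compiler processes (as with `make -j`).
// Ignores linking, memory pressure and scheduling overhead.
double estimateWallHours(int files, double hoursPerFile, int jobs) {
    assert(files >= 0 && jobs > 0 && hoursPerFile >= 0.0);
    // Files compile in "waves" of up to `jobs` at once: ceil(files / jobs).
    int waves = (files + jobs - 1) / jobs;
    return waves * hoursPerFile;
}
```

With 1000 files at 1 hour each, one job gives 1000 hours, 8 jobs give 125 hours and 16 jobs give 63 hours, which is why splitting the model into independently compiled pieces matters as much as the per-file cost.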

With Icarus Verilog, it took about 18 hours to generate a binary.

Today we are generating native Verilog code that is much more compact. We will keep you posted with the test results. We are currently evaluating Verilator, Icarus Verilog and GPL Cver.

I don't fully understand how to use the SystemPerl shells, or how they integrate with Verilator.

#3 Updated by Wilson Snyder over 9 years ago

Oh, you're simulating gates, not RTL?

For a gate sim that large, you'll want a commercial simulator. To my knowledge, none of the open source tools are designed for that; the "market" is just too small.

Converting it to RTL-ish code is interesting, though... I'm surprised it runs that slowly; the memory use isn't a surprise, and that's fine. Can you run with --stats and send the resulting file? I'm guessing your outputter needs to remove more hierarchy.

#4 Updated by joe barjo over 9 years ago

Gate simulation is not our goal; this was just a test. GPL Cver loads this gate-level Verilog instantly... but it is slow...

We now output RTL-level Verilog, but it currently has VHPI or VPI interfaces. So we will use a much smaller design without VPI for a new Verilator test. I'll keep you updated with the results.

Thanks

#5 Updated by Wilson Snyder over 9 years ago

BTW, I suspect you already thought of this, but make sure your system isn't paging. I ran our whole chip: it takes 7.3 GB, but only 18 minutes.
