

#### Verilator & Internals July 2005 Philips Semiconductor

Wilson Snyder, SiCortex, Inc. wsnyder@wsnyder.org http://www.veripool.com

#### Agenda



- Preliminary Diversion: Verilog-Mode
- Introduction & History
- Using Verilator
  - Example Translations
  - SystemPerl
  - Verilog Coding Tips
- Verilator Internals and Debugging
  - Data structures
  - Debugging tips
- Futures
  - Future Features
  - You can help
- Conclusion & Getting the Tools

#### **Verilog-Mode for Emacs**



- Ralf asked for a summary of my other tools, and this is my most popular.
- So, before we talk about a compiler, let's talk about making Verilog coding more productive.
- (This is an excerpt of a larger presentation on www.veripool.com.)

#### **Verilog-Mode for Emacs**



- Verilog requires lots of typing to hook up signals, ports, sensitivity lists, etc.
- Verilog-Mode saves you from typing them
  - Fewer lines of code means less development, fewer bugs.
- Not a preprocessor; all input & output code is always completely "valid" Verilog.
  - Can always edit code without the program.
- Batch executable for non-Emacs users.

#### **Verilog-mode Meta-Comments**





Verilator Internals, July 2005. Copyright 2005 by Wilson Snyder; public redistribution allowed as complete presentation.

#### **Automatic Wires**






#### **Automatic Registers**





```
output [1:0] from_a_reg;
output       not_a_reg;

/*AUTOWIRE*/
/*AUTOREG*/
// Beginning of autos
reg [1:0] from_a_reg;
// End of automatics

wire not_a_reg = 1'b1;

always
   ... from_a_reg = 2'b00;
```

#### **Simple Instantiations**






#### **Exceptions to Instantiations**





#### **Multiple Instantiations**






#### More Autos & Verilog-Mode Conclusion



- Other Automatics
  - AUTOASCIIENUM: Create ASCII decodings for state machine states.
  - AUTOINOUTMODULE: Pull all I/O from another module for shells.
  - AUTOOUTPUT: Make outputs for all signals.
  - AUTORESET: Reset all signals in a flopped block.
  - Goto-module: Go to any module name in your design with 2 keys.
- Verilog-mode allows for faster coding with fewer bugs.
- There are NO disadvantages to using it.
  - Many IP Vendors, including MIPS and ARM use Verilog-Mode.
- For More Information
  - See presentations, papers and the distribution on http://www.veripool.com



# Verilator Introduction & History

#### Introduction



- Verilator was born to connect Verilog to C/C++.
  - Verilog is the Synthesis Language.
  - C++ is generally the embedded programming language.
  - And C++ is often the Test-bench Language.
  - Of the 15 chip projects I've worked on, all but two had C++ test benches with Verilog cores.
- So, throw away the Verilog testing constructs, and let's synthesize the Verilog into C++ code.
- Your simulator model can now be 100% C++!
  - Enables easy cross compiling, gdb, valgrind, lots of other tools.

# History (1 of 2)



- 1994 Digital Semiconductor
  - We were deciding on the methodology for a new Core Logic chipset.
     We wanted Verilog for Synopsys synthesis. We already had a C-based simulation system for our CPU, so Paul Wasson decided he would write a program to directly translate Synthesizable Verilog into C.
    - Verilator 1.0 maps Verilog almost statement-to-statement to C.
       We let the C compiler do all optimizations.
- 1997/8/9 Intel Network Processors
  - Duane Galbi starts using Verilator at Intel, and takes over the source.
    - Verilator simplifies Boolean math expressions (2x faster).
    - Verilator released into the public domain.
- 2001 Maker (Conexant/Mindspeed)
  - After a few years, I take the sources from Duane.

# History (2 of 2)



- 2001/2 Nauticus Networks
  - At my new startup, we decide to try a new modeling language, SystemC.
     Verilator gets adopted to enable easy substitution of Verilog into SystemC.
    - Verilator 2.1.0 outputs first SystemC code.
- 2003/4 Nauticus/Sun Microsystems
  - Verilator gets a complete rewrite and moves from translation to a full compiler.
    - Verilator 3.000 adds std compiler optimizations. (3x faster)
    - Verilator 3.012 has the first patch from an outside contributor.
    - Verilator 3.201 orders all code. (2x faster & beats VCS.)
- 2004/5 SiCortex
  - I join my third startup, and we settle on a SystemC methodology. SiCortex starts purchasing IP, so Verilator grows to support most Verilog 2001 features.
    - Verilator tops 100 downloads per month (excluding robots, etc)



# **Using Verilator**



#### **Verilator is a Compiler**



- Verilator compiles Synthesizable Verilog into C++
  - Always statements, wires, etc.
  - No time delays (a <= #{n} b;)
  - Only two state simulation (no tri-state busses).
  - Unknowns are randomized (even better than having Xs).
  - All clocks from primary inputs (well, some generated might be ok...)
- Creates a "pure" C++/SystemC wrapper around the design
- Creates own internal interconnect and signal formats
  - Version 2.0.0 tried sc\_signals, but they are >>10x slower!
  - Plays several tricks to get good, fast code out of gcc.

#### **Example Translated to C++**



- The top wrapper looks similar to the top Verilog module.
- Inputs and outputs map directly to bool, uint32\_t, uint64\_t, or array of uint32\_t's:

| Verilog input             |
|---------------------------|
| `module Convert;`         |
| `input clk;`              |
| `input [31:0] data;`      |
| `output [31:0] out;`      |
| `always @ (posedge clk)`  |
| `out <= data;`            |
| `endmodule`               |

| C++ output (top wrapper)  |
|---------------------------|
| `#include "verilated.h"`  |
| `class Convert {`         |
| `bool clk;`               |
| `uint32_t data;`          |
| `uint32_t out;`           |
| `void eval();`            |
| `};`                      |

## **Calling the model**

- You generally call the Verilated class inside a simple loop.

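```
int main() {
    Convert* top = new Convert();
    while (!Verilated::gotFinish()) {
        top->data = ...;        // drive the inputs
        top->clk = !top->clk;   // toggle the clock
        top->eval();            // evaluate the model
        ... = top->out;         // sample the outputs (out is a data member)
        // Advance time...
    }
    delete top;
}
```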
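The loop above can be tried without Verilator by substituting a hand-written stand-in for the generated class. The sketch below is only an illustration: `MockConvert` and `runCycles` are hypothetical names, and the edge detection inside `eval()` is an assumption about how such a model could behave, not a copy of generated code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical hand-written stand-in for the Verilated Convert class.
// It mimics "always @(posedge clk) out <= data;" and is NOT generated code.
struct MockConvert {
    bool     clk  = false;
    uint32_t data = 0;
    uint32_t out  = 0;

    void eval() {
        if (clk && !lastClk_) out = data;  // act only on a rising edge
        lastClk_ = clk;
    }

private:
    bool lastClk_ = false;  // clock value seen by the previous eval()
};

// Drive the model for a number of full clock cycles, applying
// firstValue, firstValue+1, ... as data, and return the final output.
uint32_t runCycles(MockConvert& top, uint32_t firstValue, int cycles) {
    assert(cycles >= 0);
    for (int i = 0; i < cycles; ++i) {
        top.data = firstValue + static_cast<uint32_t>(i);
        top.clk = true;  top.eval();   // rising edge: out captures data
        top.clk = false; top.eval();   // falling edge: nothing changes
    }
    return top.out;
}
```

Calling `runCycles(model, 10, 3)` on a fresh `MockConvert` leaves `out` at 12, the last value applied before a rising edge.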







#### **SystemPerl**



- Verilator can output a dialect of SystemC, SystemPerl.
  - This matches SystemPerl code hand-written by our architects and verifiers.
  - SystemPerl translates the files into standard SystemC code we compile.
  - Similar to Verilog-Mode for Emacs.
- SystemPerl makes SystemC faster to write and execute
  - My last project wrote only 43% as many SystemC lines due to SysPerl.
  - SystemPerl standardizes Pin and Cell interconnect,
  - Lints interconnect (otherwise would get ugly run-time error),
  - Automatically connects sub-modules in "shell" modules, ala AUTOINST,
  - And adds "#use" statements for finding and linking library files.
- Reducing code means faster development
  - And less debugging!

# **Example Translated to SystemPerl**



- Inputs and outputs map directly to bool, uint32\_t, uint64\_t, or sc\_bv's.
- Similar for pure SystemC output.

```
module Convert;
input clk;
input [31:0] data;
output [31:0] out;
always @ (posedge clk)
out <= data;
endmodule
```

```
#include "systemperl.h"
#include "verilated.h"
SC_MODULE(Convert) {
   sc_in_clk clk;
   sc_in<uint32_t> data;
   sc_out<uint32_t> out;
   void eval();
};
SP_CTOR_IMP(Convert) {
   SP_CELL(v,VConvert);
   SC_METHOD(eval);
   sensitive(clk);
}
```

#### Talking C++ inside Verilog (1 of 2: Public Functions)



- Verilator allows tasks/functions to be called from "above" C++
  - We typically use this to have "0 time" configuration of the chip registers.
  - Some performance impact, but far less than what a commercial tool does.



#### Talking C++ inside Verilog (2 of 2: Embedding)



- Verilator allows C++ code to be embedded directly in Verilog.



## **Verilog Coding Tips**



- Lint your code
  - Verilator doesn't check for many common coding mistakes.
    - For example, assigning to an input exposed a bug that dumped core.
- Don't ignore warnings
  - Verilator gets limited testing of cases where they are suppressed.
  - Generally, it doesn't complain about reasonable constructs
    - wire [2:0] width\_mismatch = 0; // This is fine!
  - Ignoring "unoptimizable" warnings can drop performance by 2x.
- Split up always statements for performance
  - Put as few statements as reasonable in a combo or seq always block.
  - This allows it to better order the code.
- Gate clocks in common module and ifdef
  - Any ugly constructs should be put into a cell library, and ifdef'ed into a faster Verilator construct.

## **Verilator and Commercial Tools**



- Verilator wasn't intended as a substitute for a commercial simulator, but as the best way to get the job done. Until recently, all the commercial tools required slow PLI code, and any mixing of C++ and Verilog was very painful.
- I've never used Verilator as a sign-off simulator.
  - Last project, we captured the pins from a Verilator run, and replayed against a gate level model running on a commercial simulator.
- Though we certainly buy fewer simulator licenses.
  - 80% of our simulation cycles are under Verilator.
  - It's not so much to save money; the Verilated model builds 1.7 times faster and runs 2.1 times faster.
- Fortunately the vendors ignore me.
  - Until the first big EE Times article? :-)



# Verilator Internals and Debugging


# **Internal Loop**

- Inside the model are two major loops: the settle loop, called at initialization time, and the main change loop.
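The slide's diagram is not reproduced here, so the following is only a guess at the general shape such a loop might take: evaluate the logic, check whether any signal changed, and repeat until nothing does. `TinyModel`, `evalPass`, and `evalUntilStable` are hypothetical names, and the deliberately poor statement order is an assumption chosen to force extra passes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-wire model ("b = a; c = b;"), deliberately evaluated in a
// poor order ("c = b" before "b = a") so that one pass is not enough.
struct TinyModel {
    uint32_t a = 0, b = 0, c = 0;

    // One pass over the combinational logic; returns true if anything changed.
    bool evalPass() {
        const uint32_t oldB = b, oldC = c;
        c = b;
        b = a;
        return b != oldB || c != oldC;
    }

    // Repeat passes until no signal changes; returns the number of passes run.
    int evalUntilStable(int maxPasses = 100) {
        int passes = 0;
        while (true) {
            assert(passes < maxPasses && "model did not converge");
            ++passes;
            if (!evalPass()) return passes;
        }
    }
};
```

With `a` changed from 0 to 5, this model needs three passes: one to update `b`, one to update `c`, and a final pass that detects no change. Better statement ordering removes the extra passes, which is why ordering matters for speed.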












- The core data structure is an AstNode, forming a tree.

- If you run with --debug, you'll see this in a .tree file:



#### **Visitor Template**



- Most of the code is written as transforms on the tree
- Uses the C++ "Visitor Template": `class TransformVisitor : AstNVisitor { ... }`
  - `virtual void visit(AstAdd* nodep)`: the node-type overloaded visit function is called each time a node of type AstAdd is seen in the tree.
  - `nodep->iterateChildren(*this);` recurses on all nodes below this node, calling the visit function based on the node type.
  - `virtual void visit(AstNodeMath* nodep)`: AstNodeMath is the base class of all math nodes. This will get called if a math node visitor isn't otherwise defined; it is not called on an AstAdd, because there is a visitor for AstAdd above.
  - `virtual void visit(AstNode* nodep)`: the default catch-all.
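The dispatch rules described above can be tried in isolation. This is a minimal sketch with hypothetical stand-in classes (`Node`, `NodeMath`, `Add`, `Sub`, `Visitor`, `CountVisitor`); the real AstNode hierarchy and AstNVisitor are much larger and may be implemented differently. A visitor for `Add` takes precedence, other math nodes fall back to the `NodeMath` visitor, and everything else reaches the catch-all.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Visitor;

// Hypothetical miniature node hierarchy.
struct Node {
    std::vector<std::unique_ptr<Node>> children;
    virtual ~Node() = default;
    virtual void accept(Visitor& v);          // dispatches on the dynamic type
    void iterateChildren(Visitor& v) {
        for (auto& c : children) {
            assert(c != nullptr);
            c->accept(v);
        }
    }
};
struct NodeMath : Node { void accept(Visitor& v) override; };
struct Add : NodeMath { void accept(Visitor& v) override; };
struct Sub : NodeMath { void accept(Visitor& v) override; };

// Each default visit falls back to the visitor for the node's base class.
struct Visitor {
    virtual ~Visitor() = default;
    virtual void visit(Node* n) { n->iterateChildren(*this); }   // catch-all
    virtual void visit(NodeMath* n) { visit(static_cast<Node*>(n)); }
    virtual void visit(Add* n) { visit(static_cast<NodeMath*>(n)); }
    virtual void visit(Sub* n) { visit(static_cast<NodeMath*>(n)); }
};

void Node::accept(Visitor& v) { v.visit(this); }
void NodeMath::accept(Visitor& v) { v.visit(this); }
void Add::accept(Visitor& v) { v.visit(this); }
void Sub::accept(Visitor& v) { v.visit(this); }

// Counts Add nodes separately from all other math nodes.
struct CountVisitor : Visitor {
    int adds = 0, otherMath = 0;
    using Visitor::visit;  // keep the overloads we do not override visible
    void visit(Add* n) override { ++adds; n->iterateChildren(*this); }
    void visit(NodeMath* n) override { ++otherMath; n->iterateChildren(*this); }
};
```

For a tree `Add(Sub, Add)`, `CountVisitor` reports two adds and one other math node: the `Sub` has no visitor of its own, so it lands in the `NodeMath` visitor.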

#### Compile Phases (1 of 4: Module based)



- Link
  - Resolve module and signal references.
- Parameterize
  - Propagate parameters, and make a unique module for each set of parameters.
- Expression Widths
  - Determine expression widths. (Surprisingly complicated!)
- Coverage Insertion (Optional)
  - Insert line coverage statements at all if, else, and case conditions.
- Constification & Dead code elimination
  - Eliminate constants and simplify expressions
  - ("if (a) b=1; else b=c;" becomes "b = a|c;")

#### Compile Phases (2 of 4: Module based)



- Task Inlining, Loop Unrolling, Module Inlining (Optional)
  - Standard compiler techniques.
- Tristate Elimination
  - Convert case statements into if statements
  - Replace assignments to X with random values. (Optional)
- Always Splitting (Optional)
  - Break always blocks into pieces to aid ordering.
    - always @ ... begin a<=z; b<=a; end
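One way to find independent pieces of a block is to treat each statement as a graph vertex, join statements that touch a common variable, and take the connected components. The sketch below does this with a small union-find. It is an illustration under simplifying assumptions: `Statement`, `findRoot`, and `groupStatements` are hypothetical names, each statement is reduced to the set of variable names it touches, and no attempt is made to model how nonblocking assignments are treated.

```cpp
#include <cassert>
#include <numeric>
#include <set>
#include <string>
#include <vector>

// Each statement is described only by the set of variable names it touches.
using Statement = std::set<std::string>;

// Union-find "find" with path halving.
static int findRoot(std::vector<int>& parent, int i) {
    assert(i >= 0 && i < static_cast<int>(parent.size()));
    while (parent[i] != i) {
        parent[i] = parent[parent[i]];
        i = parent[i];
    }
    return i;
}

// Returns a group id per statement; statements sharing any variable
// (directly or through a chain of other statements) get the same id.
std::vector<int> groupStatements(const std::vector<Statement>& stmts) {
    const int n = static_cast<int>(stmts.size());
    std::vector<int> parent(n);
    std::iota(parent.begin(), parent.end(), 0);
    for (int i = 0; i < n; ++i) {
        for (int j = i + 1; j < n; ++j) {
            bool shared = false;
            for (const auto& v : stmts[i]) {
                if (stmts[j].count(v)) { shared = true; break; }
            }
            if (shared) {
                const int ri = findRoot(parent, i);
                const int rj = findRoot(parent, j);
                parent[ri] = rj;
            }
        }
    }
    std::vector<int> group(n);
    for (int i = 0; i < n; ++i) group[i] = findRoot(parent, i);
    return group;
}
```

Given statements touching `{x, p, q}`, `{y, x, r}`, and `{m, n}`, the first two share `x` and land in one group, while the third is independent and could move to its own block.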

#### Compile Phases (3 of 4: Scope Based)



- Scope
  - If a module is instantiated 5 times, make 5 copies of its variables.
- Gate (Optional)
  - Treat the design as a netlist, and optimize away buffers, inverters, etc.
- Delayed
  - Create delayed assignments.
    - a<=a+1 becomes "a\_dly=a; ....; a\_dly=a+1; .... a=a\_dly;"
- Ordering
  - Determine the execution order of each always/wire statement.
    - This may result in breaking combinatorial feedback paths.
    - The infamous "not optimizable" warning
- Lifetime Analysis (Optional)
  - If a variable is never used after being set, no need to set it.
  - If code ordering worked right, this eliminates \_dly variables.
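The `_dly` pattern from the Delayed phase can be illustrated with a register swap, "a <= b; b <= a;". This is a hand-written sketch, not Verilator output; `SwapRegs`, `clockEdge`, `clockEdgeNaive`, and `demoSwap` are hypothetical names. It shows why right-hand sides must be read before any register is committed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical state for a clocked block containing "a <= b; b <= a;".
struct SwapRegs {
    uint32_t a, b;
};

// One clock edge using the shadow-variable pattern from the slide:
// copy, compute into the _dly shadows, then commit.
void clockEdge(SwapRegs& r) {
    uint32_t a_dly = r.a;   // a_dly = a
    uint32_t b_dly = r.b;   // b_dly = b
    a_dly = r.b;            // "a <= b" reads the pre-edge b
    b_dly = r.a;            // "b <= a" reads the pre-edge a
    r.a = a_dly;            // commit after all right-hand sides are read
    r.b = b_dly;
}

// Naive translation without shadows, shown only for contrast: a is
// overwritten before "b <= a" reads it, so both registers end up equal.
void clockEdgeNaive(SwapRegs& r) {
    r.a = r.b;
    r.b = r.a;
}

// Usage: the swap survives only with the shadow variables.
void demoSwap() {
    SwapRegs good{1, 2};
    clockEdge(good);
    assert(good.a == 2 && good.b == 1);

    SwapRegs bad{1, 2};
    clockEdgeNaive(bad);
    assert(bad.a == 2 && bad.b == 2);
}
```

When statements can be ordered so that every read happens before the corresponding write, the shadows become unnecessary, which matches the slide's remark that good ordering lets lifetime analysis eliminate the `_dly` variables.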

#### **Compile Phases** (4 of 4: Backend)



- Descope
  - Reverse the flattening process; work on C++ classes now.
- Clean
  - Add AND statements to correct expression widths.
  - If a 3 bit value, then a + 1 in C++ needs to become (a+1) & 32'b111;
- Expand & Substitute (Optional)
  - Expand internal operations (bit selects, wide ops) into C++ operations.
  - Substitute temporaries with the value of the temporary.
- Output code
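The Clean step can be shown with plain C++ arithmetic. A 3-bit signal stored in a `uint32_t` must wrap from 7 to 0, so the result of `a + 1` is masked back to its declared width. `widthMask` and `cleanIncrement` are hypothetical helper names, not functions from the generated code.

```cpp
#include <cassert>
#include <cstdint>

// Mask with the low `width` bits set; width must be 1..32.
uint32_t widthMask(unsigned width) {
    assert(width >= 1 && width <= 32);
    return width == 32 ? 0xFFFFFFFFu : ((1u << width) - 1u);
}

// Increment a value that models a `width`-bit Verilog signal
// stored in a 32-bit C++ integer.
uint32_t cleanIncrement(uint32_t a, unsigned width) {
    return (a + 1u) & widthMask(width);
}
```

Without the mask, `7 + 1` would leave 8 in the variable, a value a 3-bit signal can never hold; `cleanIncrement(7, 3)` yields 0 instead.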

#### **Graph Optimizations**



- Some transformations rely heavily on graphs.
- Always Block Splitting
  - Vertexes are each statement in the always block
  - Edges are the variables in common between the statements
  - Any weakly connected sub-graphs indicate independent statements.

- Gate Optimization
  - Just like a netlist, vertexes are wires and logic.
  - Edges connect each wire to the logic it receives/drives.
  - We can then eliminate buffers, etc.
- Ordering
  - Vertexes are statements and variables
  - Directional edges connect the statements to relevant variables
  - Break edges if there are any loops
  - Enumerate the vertexes so there are no backward edges

# **Debugging Verilated code**



- Sorry.
- Run with --debug
  - This enables internal checks, which may fire an internal assertion.
  - It also dumps the internal trees.
- Try -O[each lower case letter]
  - This will disable different optimization steps, and hopefully isolate it.

- Make a standalone test\_regress example
  - This will allow me to add it to the suite. See the verilator manpage.
- (Experts) Look at the generated source
  - Examine the output .cpp code for the bad section.
  - Then, walk backwards through the .tree files, searching for the node pointer, and find where it got mis-transformed.
  - That generally isolates it down to a few dozen lines of code.



# **Futures**



- Generally, new features are added as requested, unless they are difficult :-)
- Support Generated Clocks (correctness, not speed)
  - Most of the code is there, but stalled release as too hard to debug.
- Assertions (PSL or SVL)
  - Especially need testcases, and an assertion generator.
  - If anyone has experience translating or optimizing assertions, let me know.
- Signed Numbers
  - Often requested, but lots of work and not valuable to my employer.
  - Now even "integers" are unsigned. Surprisingly, this rarely matters.

# Future Performance Enhancements (1 of 2)



- Eliminate duplicate logic (commonly added for timing).
  - "wire a=b|c; wire a2=b|c;"
- Gated clock optimizations
  - Philips is transforming some input code into mux-flops.
- Latch optimizations
  - Not plentiful, but painful when they occur.
- Cache optimizations
  - Verilator optimizes some modules so well, it becomes load/store limited.
  - Need ideas for eliminating load/stores and cache packing.
- Reset optimization
  - Lots of "if reset..." code that doesn't matter after a few dozen cycles.

# Future Performance Enhancements (2 of 2: Dreams)



- Embed SystemPerl and schedule SystemC code.
  - Profiling shows some large tests spend more time in SystemC interconnect routines than in the Verilated code.
- Multithreaded execution
  - Multithreaded/multicore CPUs are now commodities.

#### **Tool Development**



- FAQ: How much time do you spend on this?
  - Verilator currently averages about a day a week.
  - All my other tools add up to at most another day or so, though most of these changes directly support our verification and design team.
  - The rest of the time I'm doing architecture and RTL for my subchip.
- I have help
  - Major testing by Ralf Karge here at Philips, and Jeff Dutton at Sun.
- FAQ: How can I get "away" with spending time on this?
  - When choosing a job, I've obtained a contractual agreement my tools may become public domain.
    - Sort of a "reverse NDA". Imagine if this were the general practice.
  - In trade, I bring a productivity increase to the organization.
  - Public feedback makes the tools better.
    - A user contributed 64-bit patches; two months later I'm on a 64-bit OS.

#### You Can Help (1 of 2: Testing, testing, testing...)



- Firstly, I need more testcases!
  - Many enhancements are gated by testing and debugging.
- Large standalone test cases
  - Need large test chips and **simple** testbenches.
  - Add a tracing and cycle-by-cycle shadow simulation mode, so finding introduced bugs is greatly simplified?
  - Port and run all opencores.org code?

- Need better random Verilog program generator
  - Now, most arithmetic bugs are found with vgen, a random math expression generator.
  - It commonly finds bugs in other simulators and GCC.
  - This needs to be extended/rewritten to test the rest of the language.
  - A great use for an idle verification engineer. Know one?
- Improve the graph viewer or find another (Java)

#### You Can Help (2 of 2)



- Run gprof/oprofile and tell me your bottlenecks
  - Most optimizations came from "oh, this looks bad"
- Tell me what changes you'd like to see
  - I don't hear from most users, and have no idea what they find frustrating.
- Advocate.
- Of course, patches and co-authors always wanted!
  - The features listed before, or anything else you want.



# Conclusion

#### Conclusions



- Verilator promotes Faster RTL Development
  - Verilog is the standard language, and what the downstream tools want.
  - Waveforms "look" the same across RTL or SystemC, no new tools.
- And faster Verification Development
  - Trivial integration with C++ modeling.
  - Each component can be either behavioral or RTL code.
  - C++ hooks can be added to the Verilog
  - Automatic coverage analysis
- License-Free runtime is good
  - Easy to run on laptops or SW developer machines.
  - Run as fast as major simulators.
  - \$\$ we would spend on simulator runtime licenses may go to computes.



#### **Available from Veripool.com**

- Verilator:
  - GNU Licensed
  - C++ and Perl Based, Windows and Linux
  - http://www.veripool.com
- Also free on my site:
  - Dinotrace: Waveform viewer w/Emacs annotation.
  - Make::Cache: Object caching for faster compiles.
  - Schedule::Load: Load balancing (ala LSF).
  - Verilog-Mode: /\*AUTO...\*/ expansion, highlighting.
  - Verilog-Perl: Verilog Perl preprocessor and signal renaming.
  - Voneline: Reformat gate netlists for easy grep and editing.
  - Vpm: Add simple assertions to any Verilog simulator.
  - Vregs: Extract register and class declarations from documentation.
  - Vrename: Rename signals across many files (incl SystemC files).



