

## Verilator: Speedy Reference Models, Direct from RTL

http://www.veripool.org/papers

#### Wilson Snyder

Veripool.org wsnyder@wsnyder.org



#### Agenda

- Modeling Hardware
- Intro to Verilator
- Verilator Internals
- Futures
- Conclusion
- Q & A



# **Modeling Hardware**

#### Hardware/Software Co-design

- Products are a mix of hardware and software
- Therefore, both need to be designed in parallel
- What abstraction of HW can SW develop on?
- Ideal:
  - Accurate as real hardware
  - Fast as real hardware
  - Minimal effort to develop model
  - Free runtime i.e. develop SW anywhere







#### **Levels of Abstraction vs. Performance**





## Intro to Verilator



#### **Verilator is a Compiler**

- Verilator compiles synthesizable Verilog into C++
  - Matches synthesis rules, not simulation rules
  - Time delays ignored (a <= #{n} b;)
  - Only two state simulation (and tri-state busses)
  - Unknowns are randomized (better than Xs)
- Creates C++/SystemC wrapper
- Creates own internal interconnect
  - Plays several tricks to get good, fast code



#### **Example: Getting Started**

- (From https://www.veripool.org/projects/verilator/wiki/Installing)
- Install a prebuilt package (example for Debian/Ubuntu; RPM-based distributions use their own package manager)

```
apt-get install verilator
```

- Or, build the latest from sources

```
git clone http://git.veripool.org/git/verilator   # Only needed once
cd verilator
git pull
autoconf
./configure
make
make install
```

- Read the docs and examples: `verilator --help`



#### **Example: First Module**

| Convert.v |
|-----------|
| `module Convert (clk, data, out);` |
| `input clk;` |
| `input [31:0] data;` |
| `output logic [31:0] out;` |
| |
| `initial $display("Hello flip-flop");` |
| `always_ff @(posedge clk)` |
| `out <= data;` |
| `endmodule` |

- Lint check your code: `verilator -lint-only -Wall Convert.v`



#### **Example: Translation**

- Translate to a C++ class (a SystemC module is also possible, not shown): `verilator -cc Convert.v`



- Output in obj\_dir/
- Verilog top module became a C++ class (obj\_dir/VConvert.h)
- Inputs and outputs map directly to bool, uint32\_t, or array of uint32\_t's
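
As a rough illustration of that mapping, the sketch below picks a C++ storage shape from a port's bit width. The names `PortStorage`, `wordsForWidth` and `storageFor` are hypothetical and not part of Verilator's API. The thresholds follow only the three cases named above; a given Verilator release may choose other integer widths, so the generated header is the authority.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical classification of how a port of a given bit width is stored,
// following only the three cases named on the slide.
enum class PortStorage { Bool, Word32, Word32Array };

// Number of 32-bit words needed to hold `width` bits.
constexpr std::size_t wordsForWidth(std::size_t width) {
    return (width + 31) / 32;
}

// Pick a storage shape from the bit width (width must be at least 1).
inline PortStorage storageFor(std::size_t width) {
    assert(width >= 1);
    if (width == 1) return PortStorage::Bool;
    if (width <= 32) return PortStorage::Word32;
    return PortStorage::Word32Array;
}
```

Under these assumptions, `clk` in Convert.v would be classified as `Bool`, and the 32-bit `data` and `out` ports as `Word32`.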


#### **Example: Calling the model**

- Write .cpp file to call the Verilated class
  - Verilator doesn't make time pass!
    - The key difference from verification-style simulators
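
A minimal sketch of such a wrapper, under stated assumptions. The real class is generated into `obj_dir/VConvert.h`, so this block substitutes a hand-written stand-in named `ConvertModel` with public `clk`, `data` and `out` members and an `eval()` method. The stand-in's internals and the helper `tick` are hypothetical. What it shows is that the model only reacts when the wrapper changes `clk` and calls `eval()`; nothing advances on its own.

```cpp
#include <cstdint>

// Hand-written stand-in for the generated class (an assumption for this
// sketch; the real one comes from obj_dir/VConvert.h).
struct ConvertModel {
    bool clk = false;     // input clk
    uint32_t data = 0;    // input [31:0] data
    uint32_t out = 0;     // output [31:0] out

    // Re-evaluate the design: on a rising clock edge, out <= data.
    void eval() {
        if (clk && !prevClk_) out = data;
        prevClk_ = clk;
    }

private:
    bool prevClk_ = false;  // clock value seen by the previous eval()
};

// One full clock cycle driven by the caller: the wrapper, not the model,
// makes time pass by toggling clk and calling eval().
inline void tick(ConvertModel& top) {
    top.clk = true;
    top.eval();
    top.clk = false;
    top.eval();
}
```

With a real Verilated model the loop would likely have the same shape: set the inputs, toggle `clk`, call `eval()`, then read the outputs.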





#### **Example: Compile and run**

- Compile with e.g. GCC: `make -C obj_dir -f VConvert.mk VConvert`
- Run

```
obj_dir/VConvert
```

Hello flip-flop

- Other examples
  - `verilator --help`
  - "examples/" directory in the kit and installed files



## **Verilator Internals**



#### **Verilator Optimizations**



- End result is extremely fast Verilog simulation



#### AstNode

- The core internal structure is an AstNode





## **Futures**



#### **Future Language Enhancements**

- Generally, new features are added as requested, unless difficult :-)
  - Unpacked Structs, Classes and methods
  - Dynamic memory, new/delete
  - Event loop, fork/join
  - Someday, full UVM support?
- Lint
  - Improve Verilog code quality checks to aid designers in finding bugs



#### **Future Single-Threaded Performance**

- Bit-splitting to avoid UNOPTFLAT
  - Bits in a vector that are always used separately should be separate signals
- Better merging of logic into parallel arrays

```
wire a0 = b0 | c0; wire a1 = b1 | c1;
-> wire a[1:0] = b[1:0] | c[1:0];
```
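
In C++ terms, the same merge can be pictured as replacing one operation per bit with a single operation on a packed word. This only illustrates the idea; the function names are hypothetical and it does not show Verilator's generated code.

```cpp
#include <cstdint>

// Unmerged form: each bit is a separate signal with its own OR.
// Bit 0 of the result is a0 = b0 | c0, bit 1 is a1 = b1 | c1.
inline uint32_t orPerBit(bool b0, bool c0, bool b1, bool c1) {
    const bool a0 = b0 || c0;
    const bool a1 = b1 || c1;
    return (static_cast<uint32_t>(a1) << 1) | static_cast<uint32_t>(a0);
}

// Merged form: b[1:0] and c[1:0] are packed into words and a single
// bitwise OR computes both result bits at once.
inline uint32_t orPacked(uint32_t b, uint32_t c) {
    return (b | c) & 0x3u;
}
```

Both functions return the same two-bit result for every input combination; the merged form does it with one machine operation instead of two.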

- Better icache packing by building up subroutines, structs and loops

mod0.foo = some\_long\_equation(mod0.bar)
mod1.foo = some\_long\_equation(mod1.bar)

-> for (i=0; i<2; ++i) mod[i].foo = function(mod[i].bar)
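
A sketch of that transformation in C++, assuming a hypothetical `Mod` struct for per-instance state and a placeholder body for `someLongEquation`. Sharing one function across instances means the instructions for the equation exist once rather than once per instance, which is the icache benefit the slide points at.

```cpp
#include <array>
#include <cstdint>

// Hypothetical per-instance state for two copies of the same module.
struct Mod {
    uint32_t bar = 0;
    uint32_t foo = 0;
};

// Placeholder for the slide's some_long_equation(); the arithmetic here is
// arbitrary and only stands in for a large expression.
inline uint32_t someLongEquation(uint32_t bar) {
    return (bar * 3u + 1u) ^ (bar >> 2);
}

// Unrolled form: one copy of the call site per instance.
inline void evalUnrolled(Mod& mod0, Mod& mod1) {
    mod0.foo = someLongEquation(mod0.bar);
    mod1.foo = someLongEquation(mod1.bar);
}

// Packed form: instances live in an array and share one loop body.
inline void evalLooped(std::array<Mod, 2>& mod) {
    for (Mod& m : mod) m.foo = someLongEquation(m.bar);
}
```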

- Likewise dcache packing
- Lots of room for research!

#### **Multithreaded Performance – The Easy Way**

- Manually instantiate multiple Verilated models; the user's wrapper threads them

```
// top.cpp (PSEUDO CODE)
#include "Vsub1.h"
#include "Vsub2.h"

void thread1() {
   Vsub1* top1p = new Vsub1();
   barrier();  // Wait for thread2
   for (...) {
      ...
      top1p->eval();
      barrier();  // Align threads
   }
}
// thread2 similar, using Vsub2

int main() {
   threads_create(thread1, thread2);
   wait(thread1, thread2);
}
```

Verilator Reference Modeling 2017-10
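
The pseudocode above can be approximated with standard C++ threads. This is a sketch under stated assumptions: `SubModel` is a hypothetical stand-in for the generated `Vsub1`/`Vsub2` classes, and `Barrier`, `runModel` and `runTwoModels` are illustrative names, not Verilator APIs. It shows only the lock-step structure, with each thread evaluating its own model and then aligning with the other.

```cpp
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>

// Small reusable barrier (C++20 offers std::barrier; this version only
// needs C++11).
class Barrier {
public:
    explicit Barrier(int count) : count_(count), waiting_(0), generation_(0) {}

    // Block until `count` threads have called wait() for this round.
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const unsigned gen = generation_;
        if (++waiting_ == count_) {
            waiting_ = 0;
            ++generation_;
            cv_.notify_all();
        } else {
            cv_.wait(lock, [&] { return gen != generation_; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    const int count_;
    int waiting_;
    unsigned generation_;
};

// Hypothetical stand-in for a Verilated sub-model: eval() counts cycles.
struct SubModel {
    uint64_t cycles = 0;
    void eval() { ++cycles; }
};

// Per-thread loop: evaluate one cycle, then align with the other thread.
inline void runModel(SubModel& model, Barrier& barrier, int numCycles) {
    barrier.wait();  // Wait until both threads are ready
    for (int i = 0; i < numCycles; ++i) {
        model.eval();
        barrier.wait();  // Align threads at the end of each cycle
    }
}

// Run two models in lock step on two threads and wait for both to finish.
inline void runTwoModels(SubModel& sub1, SubModel& sub2, int numCycles) {
    Barrier barrier(2);
    std::thread t1(runModel, std::ref(sub1), std::ref(barrier), numCycles);
    std::thread t2(runModel, std::ref(sub2), std::ref(barrier), numCycles);
    t1.join();
    t2.join();
}
```

In a real wrapper, any signals exchanged between the two models would presumably be copied around the barrier points, so that neither thread reads a value the other is still computing.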







#### **Multithreaded Performance – Partitioning**

- True Automatic Model Partitioning
  - Decompose circuit scheduling graph into partitions
  - Schedule the partitions separately ("trains")
  - Dynamically schedule trains on threads
  - Should look to users just like single-threaded
- Automatic parallelism of the whole socket!
  - ThunderX2 -> 32 cores, 128 threads, per socket!
  - In the best case, can get superscalar performance if the model fits in caches
  - Great PhD thesis
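
To make "decompose the scheduling graph" concrete, here is a textbook levelization of a dependency graph. It is not Verilator's partitioning algorithm, and the function name `levelize` and the graph encoding are assumptions for illustration. Nodes that end up on the same level have no dependencies on each other, so a scheduler could hand them to different threads.

```cpp
#include <cstddef>
#include <vector>

// deps[n] lists the nodes that node n depends on (the graph must be acyclic).
// Returns level[n]: 0 for nodes with no dependencies, otherwise one more
// than the deepest dependency. Nodes sharing a level are independent.
inline std::vector<int> levelize(
    const std::vector<std::vector<std::size_t>>& deps) {
    const std::size_t n = deps.size();
    std::vector<std::vector<std::size_t>> users(n);  // reverse edges
    std::vector<std::size_t> pending(n, 0);          // unprocessed deps
    for (std::size_t node = 0; node < n; ++node) {
        pending[node] = deps[node].size();
        for (std::size_t d : deps[node]) users[d].push_back(node);
    }

    std::vector<int> level(n, 0);
    std::vector<std::size_t> ready;
    for (std::size_t node = 0; node < n; ++node) {
        if (pending[node] == 0) ready.push_back(node);
    }

    // Process a node only after all of its dependencies are processed,
    // so its level is final by the time it is popped.
    while (!ready.empty()) {
        const std::size_t node = ready.back();
        ready.pop_back();
        for (std::size_t u : users[node]) {
            if (level[u] < level[node] + 1) level[u] = level[node] + 1;
            if (--pending[u] == 0) ready.push_back(u);
        }
    }
    return level;
}
```

A production partitioner would also need to weigh the cost of each node and the communication between partitions, which this sketch ignores.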









# Conclusion



#### You Can Help (1 of 2)

- Firstly, need more testcases!
  - Many enhancements are gated by testing and debugging
- Large standalone test cases
  - Need large test chips and simple testbenches
  - Port a large SoC
    - Add a tracing and cycle-by-cycle shadow simulation mode, so finding introduced bugs is greatly simplified?



#### You Can Help (2 of 2)

- Run gprof/oprofile and fix bottlenecks
  - Most optimizations came from "oh, this could be faster"
- Tell us what changes you'd like to see
  - We don't hear from most users, and have no idea what they find frustrating.
- Advocate.
- Of course, patches and co-authors always wanted!



#### **Contributing Back**

- The value of Open Source is in the Community!
- Use Forums
- Use Bug Reporting
  - Even if to say what changes you'd like to see
- Try to submit a patch yourself
  - Many problems take only a few hours to resolve yourself; often less time than packaging up a test case for an EDA company!
  - Even if just documentation fixes!
  - Great experience for the resume!
- Advocate



#### Conclusions

- Adopt Verilator
  - Supported
    - Continual language improvements
    - Growing support network for 20+ years
    - Runs faster than major simulators
  - Open Source Helps You
    - Easy to run on laptops or SW developer machines
    - Get bug fixes in minutes rather than months
    - Greatly aids commercial license negotiation
  - Keep your Commercial Simulators
    - SystemVerilog Verification, analog models, gate SDF, etc.





#### **Verilog-Mode for Emacs**

- Thousands of users, including most IP houses
- Fewer lines of code to edit means fewer bugs
- Indents code correctly, too
- Not a preprocessor, code is always "valid" Verilog
- Automatically injectable into older code.

```
...
/*AUTOLOGIC*/
a a (/*AUTOINST*/);
```





#### Sources

- Verilator and open source design tools at http://www.veripool.org
  - Downloads
  - Bug Reporting
  - User Forums
  - News & Mailing Lists
  - These slides at http://www.veripool.org/papers/

