Major Tools
Other Tools
General Info

Issue #1316

Linear searches exposed by large design (v4.0)

Added by John Coiner over 1 year ago. Updated about 1 year ago.

% Done:



I'm running a large cpu core design through verilator, and have found a few linear searches that are adding a few hours to the verilation time.

1) In V3Active.cpp, there's a linear search for a matching SenList, below a comment about it being "not the fastest" :)

This build spent about 30 minutes in the V3Active pass. During this window, when I ctrl-C'ed into gdb, it was always doing one of these linear searches. (I like to send random CTRL-C's into gdb while the thing is running. Not super precise, but you find out what it's doing. It's the "Fermi profiler"...)

Why do we have so many unique SenLists? That I don't know, and it's a good question. Maybe there's a Good Reason like Extensive Clock Gating, though it seems just as likely to be a bug, some kind of weird interaction between verilator and a particular coding style. In any case, we can still fix the linear search.

2) In V3Delayed.cpp, the createActivePost() routine calls addNext() for each active it creates. This addNext() scans from the middle of a linked-list to find the end, so you get an N^2 runtime if you are creating N actives.

3) We have another linear search in SenTreeFinder::getSenTree() which looks exactly like (1) except it's another copy of almost the same code.

For the SenTree searches, I'm thinking that we can have AstSenTree support a sameHash() routine, and use a V3Hashed to reduce the search to SenTree's with the same hash.

For (2) it looks like we can simply start the search from near the end of the list, where we inserted the last active, instead of from the middle of the list.

diffs (10.3 KB) John Coiner, 06/05/2018 09:00 PM

diffs (25.5 KB) John Coiner, 06/08/2018 10:30 PM

unordered_set_map.patch View (16.6 KB) John Coiner, 06/11/2018 05:27 PM

linear_search.patch View (9.23 KB) John Coiner, 06/11/2018 05:27 PM


#1 Updated by John Coiner over 1 year ago

This patch fixes the linear searches mentioned above. It should apply to develop-v4.

This removes something like 5 hours from the verilator runtime for this particular design, it falls from ~6 hours to ~1 hour.

This may be controversial, I decided to #ifdef on C++11 to test whether the real std::unordered_map is available, and fall back to linear search when it's not.

Why not V3Hashed? For one, the user4 slot is taken in some of the places we need the SenTreeFinder. For two, I found the V3Hashed API a bit confusing and felt more comfortable with the C++ standard type.

The C++11 ifdef can go away when Verilator starts to require C++11 -- probably not too many years off now?

#2 Updated by Wilson Snyder over 1 year ago

  • Category set to Performance
  • Status changed from New to Assigned

Good work.

diff --git a/src/V3Hashed.h b/src/V3Hashed.h
+// Hash of the nodep tree, without caching in user4.
+V3Hash uncachedHash(AstNode* nodep);

Think this is better as a static member, so call as V3Hash::uncachedHash.

diff --git a/src/V3SenTree.h b/src/V3SenTree.h
+#if cplusplus >= 201103L || defined(__GXX_EXPERIMENTAL_CXX0X)

I don't really want to maintain two versions. I don't understand why you can't use VL_UNORDERED_SET, which is just a less efficient sorted set if not using C++11.

>    typedef AstSenTree* Key;
+        bool operator() (const Key& ap, const Key& bp) const {

Think it's more obvious to remove this typedef (below too). e.g.

bool operator() (const AstSenTree* ap, const SenTree* bp) const {
diff --git a/src/V3Delayed.cpp b/src/V3Delayed.cpp

Since it seems separable, feel free to push the V3Delayed fix by itself to master.

#3 Updated by John Coiner over 1 year ago

Hi Wilson,

Regarding '#if cplusplus ...':

std::unordered_set supports user-defined 'hash' and 'equalTo' template parameters, and this patch relies on these. In older compilers, vl_unordered_set expands to std::set which does not support the same template parameters so it's not a substitute (std::set takes a lessthan functor instead.)

One option to avoid the ifdef here: we could make vl_unordered_set expand to a clone of std::unordered_set on pre-C++11 compilers. I have a hashtable implementation lying around that I could press into service as such. It's my own, released under GPL for school projects decades ago, ported to C++ more recently. This would also speed up other vl_unordered_set instances on older compilers.

Another option is to switch V3SenTree.h to use a std::multiset, and write a lessthan for it using the uncachedHash. Then linear search over the equal-hashing elements calling sameTree() on each to get an exact match. That's less efficient, it'll cause more calls to uncachedHash() than the hashtable makes, by a factor of log(N) I think. On the upside it's a local change and does without the ifdef. On the downside, there's an impedance mismatch here -- creating a lessthan routine based on hash values is kinda goofy -- and this doesn't automatically get better when we switch to C++11 globally.

My preference, if you're down with this, would be to add a vl_unordered_set implementation for pre-C++11 that clones the std::unordered_set API. So vl_unordered_set is always a hashtable, regardless of the C++ standard version, and clients can specify custom hash/equalTo functors. When verilator switches to C++11 globally, the vl_unordered_set implementation goes away.

What do you think?

Regarding the 'Key' typedefs, you're right. I thought that maybe C++ would require a const-reference-to-a-pointer-to-a-const-AstSenTree for these functors but it doesn't. I'll change this.

I pushed the V3Delayed.cpp change to master.



#4 Updated by Wilson Snyder over 1 year ago

My preference, if you're down with this, would be to add a vl_unordered_set implementation for pre-C++11

Perfect, I agree. My ideal is not to have the "normal" functional classes have OS or Compiler #ifdefs, and hide that code in wrappers, and this would do that.

#5 Updated by John Coiner over 1 year ago

New patch is attached, this adds vl_unordered_set and vl_unordered_map.

Wilson, please take a look. The patch should apply to develop-v4. The V3Delayed.cpp change is not here, as it's already applied to master.


#6 Updated by Wilson Snyder over 1 year ago

Generally well done, just little stuff.

This is getting larger, so please split into two patches/commits, the first to make the unordered set/map, the second to use them to accelerate.

First set related to unordered:

I think there's some git_untabify's missing, I see tabs you're adding.

--- a/include/verilatedos.h
+++ b/include/verilatedos.h
-# define vl_unordered_map std::map
-# define vl_unordered_set std::set

verilatedos.h is for things that work in both include/ and src/, your about to break any include/ usages. That's ok since there's no uses of unordered map/set in the include directory but thus move the VL_ definition things into (I guess) V3Os.h.

--- a/include/verilatedos.h
+++ b/include/verilatedos.h

VL_HAS_UNORDERED_SET_MAP then #ifndef where you use this.

--- /dev/null
+++ b/src/V3UnorderedSetMap.cpp

Do we need to have a .cpp file? This seems small enough, just move to .h and let compiler decide? (This may also let it hash constants.)

Add standard copyright header (also .h file).

+size_t hash_bytes(const void* vbufp, size_t nbytes) {
+    const vluint8_t* bufp = static_cast<const vluint8_t*>(vbufp);
+    size_t hash = 0;
+    for (unsigned i = 0; i < nbytes; i++) {

Needs to be size_t not unsigned as on some systems unsigned is smaller.

+template <> bool vl_equal_to&lt;std::string&gt;::operator() (const std::string& a,
+                                                   const std::string& b) const {
+    // don't scan the strings if the sizes are different

Capitalize Don't and start of any other comments

--- /dev/null
+++ b/src/V3UnorderedSetMap.h

Add standard header.

Generally does the C++11 version calculate the key hashes again when rehashing (increasing buckets), or does it keep the computed hash with the value pairs? I worry this might get slower than map() otherwise.

@ -0,0 +1,436 @
+// The main event: our vl_unordered_set and vl_unordered_map types.
+// TODO: In the future, when verilator requires C++11 to compile,
+//       remove this entire file. Instead, use the std:: types.

verilator -> Verilator

+    // MEMBERS
+    size_type _numElements;    // number of entries present
+    size_type _log2Buckets;    // log-base-2 of the number of buckets
+    mutable Bucket *_buckets;  // hash table buckets. May be NULL;


+    //                         // we'll allocate it on the fly when
+    //                         // the first entries are created.
+    Bucket m_emptyBucket;      // a fake bucket, used to construct end()
+    Hash _hash;                // hash function provider
+    Equal _equal;              // equal-to function provider

Add m_ to all members, capitalize start of comments, two spaces after comments (or more if group comment as above).

+        bool operator== (const iterator& other) const {
+            return (m_bucketIdx  other.m_bucketIdx)
+                && (m_bit  other.m_bit);

Add parens on any return with non-simple argument so indents right. e.g.:

return ((m_bucketIdx  other.m_bucketIdx)
                && (m_bit  other.m_bit));
+            return !this->operator==(other);


+        void operator++ () {


+    vl_unordered_set()
+        :_numElements(0)
+        ,_log2Buckets(4)
+        ,_buckets(NULL) { }

Space after : and ,

+    ~vl_unordered_set() {
+        if (_buckets)
+            delete [] _buckets;

{} on multiline if's.

+    // METHODS
+    iterator begin() {
+    }
+    const_iterator begin() const {

I'd probably squash the newlines between small methods (without internal newlines), but either way.

+    const_iterator end() const {
+        return iterator(0xFFFFFFFFFFFFFFFFULL,

VL_ULL(0xFFFFFFFFFFFFFFFF) or MSVC will throw a fit.

+            if(_equal.operator()(*it, key)) {

Space after if (other places too, grep for "if(" and "for(".

+        if(NeedToRehash()) {
+            Rehash();

Lower case these function names.

+            for (unsigned i = 0; i < numBuckets(); i++) {


+    // numBuckets() and getBucket() are only public so that
+    // vl_unordered_map can see them. Do not call from outside
+    // this file.

make them protected/friend?

+    void Rehash() {
+        size_type new_log2Buckets = _log2Buckets << 2;
+        size_type new_num_buckets = 1<&lt;new_log2Buckets;

VL_ULL(1) << new_log2Buckets

+        Bucket *new_buckets = new Bucket [new_num_buckets];


+    ~vl_unordered_set() {
+        if (_buckets)
+            delete [] _buckets;
+    }

delete [] m_bucketsp; VL_DANGLING(m_bucketsp);

Perhaps elsewhere as forgot to check until now.

#7 Updated by John Coiner over 1 year ago

Thanks! Sorry, fixing the naming just slipped my mind.

Regarding keeping the hashed value vs. calling hash() during a rehash. The cost of calling hash again is pretty low...

Each rehash grows the bucket-set by a factor of four. If we make a hashtable and insert N entries, then all entries have been hashed at least once; N/4 have been hashed at least twice; N/16 have been hashed at least thrice, and so on. So we get a geometric series of 1*N + (1/4)*N + (1/16)*N... which expands to N*(4/3) hash operations in total. The average node has only been hashed 1.33 times across all inserts and rehashes. One was on the first insert, the other 0.33 were on a rehash.

Okay... so I cheated a bit and picked a value of N where we're juuust about to rehash but haven't quite yet. And if we add one more element, they all get hashed again and now we've called hash() 2.33 times per entry, amortized. That's the worst case.

The cost of keeping the hash value is also low, I guess, so it's tough to evaluate the cost/benefit.

#8 Updated by John Coiner over 1 year ago

Here's the patch, split in two, both should apply to develop-v4. The linear_search.patch depends on unordered_set_map.patch.


#9 Updated by Wilson Snyder over 1 year ago

unordered notes:

+// Specializations of 'vl_hash' and 'vl_equal_to'.
+inline size_t hash_bytes(const void* vbufp, size_t nbytes) {

Rename to "vl_hash_bytes".

+// The main event: our vl_unordered_set and vl_unordered_map types.
+// TODO: In the future, when Verilator requires C++11 to compile,
+//       remove this entire file. Instead, use the std:: types.
+// These classes are clones of the std:: unordered set and map types
+// (hash tables.) They are here so that Verilator can use hash tables
+// in pre-C++11 compilers, and the same client code can link against the
+// std:: types when they are available.
+// The implementations in this file do not implement the complete APIs
+// of the std:: types. Nor are they correct in every detail,
+// notably, the const_iterators do not enforce constness. We can extend
+// these implementations to cover more of the std API as needed.
+template &lt;class Key,
+          class Hash = vl_hash&lt;Key&gt;,
+          class Equal = vl_equal_to&lt;Key&gt; > class vl_unordered_set {

Change to:

/// Functional clone of the std::unordered_set (hash tables.)
template <class Key,
          class Hash = vl_hash<Key>,
          class Equal = vl_equal_to<Key> > class vl_unordered_set {

Then move the other comments to the file's header or a little below that as you feel appropriate.

+    ~vl_unordered_set() {
+        delete [] m_bucketsp;
+        VL_DANGLING(m_bucketsp);

Please put the delete and VL_DANGLING on the same line (because I grep for lines with delete and no VL_DANGLING).

+    bool empty() const {
+        return m_numElements == 0;
+    }
+    size_type size() const {
+        return m_numElements;
+    }

Move next to each other and make one-liners.

+template &lt;class Key,
+          class Value,
+          class Hash = vl_hash&lt;Key&gt;,
+          class Equal = vl_equal_to&lt;Key&gt; > class vl_unordered_map {

Add just before that template this:

/// Functional clone of the std::unordered_set (hash tables.)
template <class Key,
   +    typedef vluint64_t size_type;
   +    typedef std::pair<Key, Value> value_type;

Add "// TYPES" before these

With these changes you can push to develop-v4.

#10 Updated by Wilson Snyder over 1 year ago

Linear notes:

+class SenTreeSet {

Add "// TYPES" here

+    // Hash table of sensitive blocks.
+    class HashSenTree {
+    public:
+        HashSenTree() {}
+        size_t operator() (const AstSenTree* kp) const {
+            return V3Hashed::uncachedHash(kp).fullValue();
+        }
+    private:
+        VL_UNCOPYABLE(HashSenTree);
+    };
+    class EqSenTree {
+    public:
+        EqSenTree() {}
+        bool operator() (const AstSenTree* ap, const AstSenTree* bp) const {
+            return ap->sameTree(bp);
+        }
+    private:
+        VL_UNCOPYABLE(EqSenTree);
+    };

Add "// MEMBERS" here

+    typedef vl_unordered_set&lt;AstSenTree*, HashSenTree, EqSenTree&gt; Set;
+    Set m_trees;  // Set of sensitive blocks, for folding.

Add "// METHODS" here

+    void add(AstSenTree* nodep) { m_trees.insert(nodep); }
+    AstSenTree* find(AstSenTree* likep) {
+        AstSenTree* resultp = NULL;
+        Set::iterator it = m_trees.find(likep);
+        if (it != m_trees.end()) {
+            resultp = *it;
+        }
+        return resultp;
+    }
+    void clear() { m_trees.clear(); }
+    SenTreeSet() {}
+    VL_UNCOPYABLE(SenTreeSet);

Also add a Changes line:

  • Fix Verilation performance issues, bug1316. [John Coiner]

and (of course after the unordered changes) you can push to develop-v4 and mark this resolved.

#11 Updated by John Coiner over 1 year ago

Pushed to develop-v4. Thanks for the fast review.

#12 Updated by John Coiner over 1 year ago

I don't have a UI to resolve this issue, maybe my redmine account is missing a permission to do that? In any event, this one should be ready to resolve.

#13 Updated by Wilson Snyder over 1 year ago

  • Status changed from Assigned to Resolved

Fixed in git develop-v4 towards 4.000.

#14 Updated by Wilson Snyder over 1 year ago

  • Subject changed from Linear searches exposed by large design to Linear searches exposed by large design (v4.0)

#15 Updated by Wilson Snyder about 1 year ago

  • Status changed from Resolved to Closed

In 4.002, thanks for your work.

Also available in: Atom