Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linear searches exposed by large design (v4.0) #1316

Closed
veripoolbot opened this issue Jun 5, 2018 · 14 comments
Closed

Linear searches exposed by large design (v4.0) #1316

veripoolbot opened this issue Jun 5, 2018 · 14 comments
Assignees
Labels
area: performance Issue involves performance issues resolution: fixed Closed; fixed

Comments

@veripoolbot
Copy link
Contributor


Author Name: John Coiner (@jcoiner)
Original Redmine Issue: 1316 from https://www.veripool.org

Original Assignee: John Coiner (@jcoiner)


I'm running a large cpu core design through verilator, and have found a few linear searches that are adding a few hours to the verilation time.

  1. In V3Active.cpp, there's a linear search for a matching SenList, below a comment about it being "not the fastest" :)

This build spent about 30 minutes in the V3Active pass. During this window, when I ctrl-C'ed into gdb, it was always doing one of these linear searches. (I like to send random CTRL-C's into gdb while the thing is running. Not super precise, but you find out what it's doing. It's the "Fermi profiler"...)

Why do we have so many unique SenLists? That I don't know, and it's a good question. Maybe there's a Good Reason like Extensive Clock Gating, though it seems just as likely to be a bug, some kind of weird interaction between verilator and a particular coding style. In any case, we can still fix the linear search.

  1. In V3Delayed.cpp, the createActivePost() routine calls addNext() for each active it creates. This addNext() scans from the middle of a linked-list to find the end, so you get an N^2 runtime if you are creating N actives.

  2. We have another linear search in SenTreeFinder::getSenTree() which looks exactly like (1) except it's another copy of almost the same code.

For the SenTree searches, I'm thinking that we can have AstSenTree support a sameHash() routine, and use a V3Hashed to reduce the search to SenTree's with the same hash.

For (2) it looks like we can simply start the search from near the end of the list, where we inserted the last active, instead of from the middle of the list.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: John Coiner (@jcoiner)
Original Date: 2018-06-05T21:02:30Z


This patch fixes the linear searches mentioned above. It should apply to develop-v4.

This removes something like 5 hours from the verilator runtime for this particular design, it falls from ~6 hours to ~1 hour.

This may be controversial, I decided to #ifdef on C++11 to test whether the real std::unordered_map is available, and fall back to linear search when it's not.

Why not V3Hashed? For one, the user4 slot is taken in some of the places we need the SenTreeFinder. For two, I found the V3Hashed API a bit confusing and felt more comfortable with the C++ standard type.

The C++11 ifdef can go away when Verilator starts to require C++11 -- probably not too many years off now?

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2018-06-07T01:21:30Z


Good work.

diff --git a/src/V3Hashed.h b/src/V3Hashed.h
+// Hash of the nodep tree, without caching in user4.
+V3Hash uncachedHash(AstNode* nodep);

Think this is better as a static member, so call as V3Hash::uncachedHash.

diff --git a/src/V3SenTree.h b/src/V3SenTree.h
+#if __cplusplus >= 201103L || defined(GXX_EXPERIMENTAL_CXX0X)

I don't really want to maintain two versions. I don't understand why you
can't use VL_UNORDERED_SET, which is just a less efficient sorted set if
not using C++11.

typedef AstSenTree* Key;

  •    bool operator() (const Key& ap, const Key& bp) const {
    

Think it's more obvious to remove this typedef (below too). e.g.

bool operator() (const AstSenTree* ap, const SenTree* bp) const {

diff --git a/src/V3Delayed.cpp b/src/V3Delayed.cpp

Since it seems separable, feel free to push the V3Delayed fix by itself to master.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: John Coiner (@jcoiner)
Original Date: 2018-06-07T16:58:59Z


Hi Wilson,

Regarding '#if cplusplus ...':

std::unordered_set supports user-defined 'hash' and 'equalTo' template parameters, and this patch relies on these. In older compilers, vl_unordered_set expands to std::set which does not support the same template parameters so it's not a substitute (std::set takes a lessthan functor instead.)

One option to avoid the ifdef here: we could make vl_unordered_set expand to a clone of std::unordered_set on pre-C++11 compilers. I have a hashtable implementation lying around that I could press into service as such. It's my own, released under GPL for school projects decades ago, ported to C++ more recently. This would also speed up other vl_unordered_set instances on older compilers.

Another option is to switch V3SenTree.h to use a std::multiset, and write a lessthan for it using the uncachedHash. Then linear search over the equal-hashing elements calling sameTree() on each to get an exact match. That's less efficient, it'll cause more calls to uncachedHash() than the hashtable makes, by a factor of log(N) I think. On the upside it's a local change and does without the ifdef. On the downside, there's an impedance mismatch here -- creating a lessthan routine based on hash values is kinda goofy -- and this doesn't automatically get better when we switch to C++11 globally.

My preference, if you're down with this, would be to add a vl_unordered_set implementation for pre-C++11 that clones the std::unordered_set API. So vl_unordered_set is always a hashtable, regardless of the C++ standard version, and clients can specify custom hash/equalTo functors. When verilator switches to C++11 globally, the vl_unordered_set implementation goes away.

What do you think?

Regarding the 'Key' typedefs, you're right. I thought that maybe C++ would require a const-reference-to-a-pointer-to-a-const-AstSenTree for these functors but it doesn't. I'll change this.

I pushed the V3Delayed.cpp change to master.

Thanks,

John

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2018-06-07T17:50:28Z


My preference, if you're down with this, would be to add a vl_unordered_set implementation for pre-C++11

Perfect, I agree. My ideal is not to have the "normal" functional classes have OS or Compiler #ifdefs, and hide that code in wrappers, and this would do that.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: John Coiner (@jcoiner)
Original Date: 2018-06-08T22:31:13Z


New patch is attached, this adds vl_unordered_set and vl_unordered_map.

Wilson, please take a look. The patch should apply to develop-v4. The V3Delayed.cpp change is not here, as it's already applied to master.

Thanks

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2018-06-09T18:53:43Z


Generally well done, just little stuff.

This is getting larger, so please split into two patches/commits, the first
to make the unordered set/map, the second to use them to accelerate.

First set related to unordered:

I think there's some git_untabify's missing, I see tabs you're adding.

--- a/include/verilatedos.h
+++ b/include/verilatedos.h

-# define vl_unordered_map std::map
-# define vl_unordered_set std::set
-# define VL_INCLUDE_UNORDERED_MAP <map>
-# define VL_INCLUDE_UNORDERED_SET <set>

verilatedos.h is for things that work in both include/ and src/, your about
to break any include/ usages. That's ok since there's no uses of unordered
map/set in the include directory but thus move the VL_ definition things
into (I guess) V3Os.h.

 --- a/include/verilatedos.h
 +++ b/include/verilatedos.h

 +# define BUILD_V3UNORDEREDSETMAP

VL_HAS_UNORDERED_SET_MAP then #ifndef where you use this.

 --- /dev/null
 +++ b/src/V3UnorderedSetMap.cpp

Do we need to have a .cpp file? This seems small enough, just move to
.h and let compiler decide? (This may also let it hash constants.)

Add standard copyright header (also .h file).

 +size_t hash_bytes(const void* vbufp, size_t nbytes) {
 +    const vluint8_t* bufp = static_cast<const vluint8_t*>(vbufp);
 +    size_t hash = 0;
 +    for (unsigned i = 0; i < nbytes; i++) {

Needs to be size_t not unsigned as on some systems unsigned is smaller.

 +template <> bool vl_equal_to<std::string>::operator() (const std::string& a,
 +                                                   const std::string& b) const {
 +    // don't scan the strings if the sizes are different

Capitalize Don't and start of any other comments

 --- /dev/null
 +++ b/src/V3UnorderedSetMap.h

Add standard header.

Generally does the C++11 version calculate the key hashes again when
rehashing (increasing buckets), or does it keep the computed hash with the
value pairs? I worry this might get slower than map() otherwise.

 @@ -0,0 +1,436 @@
 +// The main event: our vl_unordered_set and vl_unordered_map types.
 +//
 +// TODO: In the future, when verilator requires C++11 to compile,
 +//       remove this entire file. Instead, use the std:: types.

verilator -> Verilator

 +private:
 +    // MEMBERS
 +    size_type _numElements;    // number of entries present
 +    size_type _log2Buckets;    // log-base-2 of the number of buckets
 +    mutable Bucket *_buckets;  // hash table buckets. May be NULL;

m_bucketsp

 +    //                         // we'll allocate it on the fly when
 +    //                         // the first entries are created.
 +    Bucket m_emptyBucket;      // a fake bucket, used to construct end()
 +    Hash _hash;                // hash function provider
 +    Equal _equal;              // equal-to function provider

Add m_ to all members, capitalize start of comments, two spaces after
comments (or more if group comment as above).

 +        bool operator== (const iterator& other) const {
 +            return (m_bucketIdx == other.m_bucketIdx)
 +                && (m_bit == other.m_bit);

Add parens on any return with non-simple argument so indents right.
e.g.:

        return ((m_bucketIdx == other.m_bucketIdx)
                 && (m_bit == other.m_bit));

 +            return !this->operator==(other);

Ditto

 +        void operator++ () {

operator++()

 +    vl_unordered_set()
 +        :_numElements(0)
 +        ,_log2Buckets(4)
 +        ,_buckets(NULL) { }

Space after : and ,

 +    ~vl_unordered_set() {
 +        if (_buckets)
 +            delete [] _buckets;

{} on multiline if's.

 +    // METHODS
 +    iterator begin() {
 +    }
 +
 +    const_iterator begin() const {

I'd probably squash the newlines between small methods (without internal
newlines), but either way.

 +    const_iterator end() const {
 +        return iterator(0xFFFFFFFFFFFFFFFFULL,

VL_ULL(0xFFFFFFFFFFFFFFFF) or MSVC will throw a fit.

 +            if(_equal.operator()(*it, key)) {

Space after if (other places too, grep for "if(" and "for(".

 +        if(NeedToRehash()) {
 +            Rehash();

Lower case these function names.

 +            for (unsigned i = 0; i < numBuckets(); i++) {

size_type

 +    // numBuckets() and getBucket() are only public so that
 +    // vl_unordered_map can see them. Do not call from outside
 +    // this file.

make them protected/friend?

 +    void Rehash() {
 +        size_type new_log2Buckets = _log2Buckets << 2;
 +        size_type new_num_buckets = 1<<new_log2Buckets;

VL_ULL(1) << new_log2Buckets

 +        Bucket *new_buckets = new Bucket [new_num_buckets];

new_bucketsp

 +    ~vl_unordered_set() {
 +        if (_buckets)
 +            delete [] _buckets;
 +    }

delete [] m_bucketsp; VL_DANGLING(m_bucketsp);

Perhaps elsewhere as forgot to check until now.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: John Coiner (@jcoiner)
Original Date: 2018-06-11T17:17:24Z


Thanks! Sorry, fixing the naming just slipped my mind.

Regarding keeping the hashed value vs. calling hash() during a rehash. The cost of calling hash again is pretty low...

Each rehash grows the bucket-set by a factor of four. If we make a hashtable and insert N entries, then all entries have been hashed at least once; N/4 have been hashed at least twice; N/16 have been hashed at least thrice, and so on. So we get a geometric series of 1*N + (1/4)*N + (1/16)N... which expands to N(4/3) hash operations in total. The average node has only been hashed 1.33 times across all inserts and rehashes. One was on the first insert, the other 0.33 were on a rehash.

Okay... so I cheated a bit and picked a value of N where we're juuust about to rehash but haven't quite yet. And if we add one more element, they all get hashed again and now we've called hash() 2.33 times per entry, amortized. That's the worst case.

The cost of keeping the hash value is also low, I guess, so it's tough to evaluate the cost/benefit.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: John Coiner (@jcoiner)
Original Date: 2018-06-11T17:27:36Z


Here's the patch, split in two, both should apply to develop-v4. The linear_search.patch depends on unordered_set_map.patch.

Thanks

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2018-06-11T22:22:43Z


unordered notes:

+// Specializations of 'vl_hash' and 'vl_equal_to'.
+inline size_t hash_bytes(const void* vbufp, size_t nbytes) {

Rename to "vl_hash_bytes".

+// The main event: our vl_unordered_set and vl_unordered_map types.
+//
+// TODO: In the future, when Verilator requires C++11 to compile,
+//       remove this entire file. Instead, use the std:: types.
+//
+// These classes are clones of the std:: unordered set and map types
+// (hash tables.) They are here so that Verilator can use hash tables
+// in pre-C++11 compilers, and the same client code can link against the
+// std:: types when they are available.
+//
+// The implementations in this file do not implement the complete APIs
+// of the std:: types. Nor are they correct in every detail,
+// notably, the const_iterators do not enforce constness. We can extend
+// these implementations to cover more of the std API as needed.
+//
+template <class Key,
+          class Hash = vl_hash<Key>,
+          class Equal = vl_equal_to<Key> > class vl_unordered_set {

Change to:

//===================================================================
//
/// Functional clone of the std::unordered_set (hash tables.)
template <class Key,
           class Hash = vl_hash<Key>,
           class Equal = vl_equal_to<Key> > class vl_unordered_set {

Then move the other comments to the file's header or a little below that as
you feel appropriate.

+    ~vl_unordered_set() {
+        delete [] m_bucketsp;
+        VL_DANGLING(m_bucketsp);

Please put the delete and VL_DANGLING on the same line (because I grep for
lines with delete and no VL_DANGLING).

+    bool empty() const {
+        return m_numElements == 0;
+    }
+    size_type size() const {
+        return m_numElements;
+    }

Move next to each other and make one-liners.

+template <class Key,
+          class Value,
+          class Hash = vl_hash<Key>,
+          class Equal = vl_equal_to<Key> > class vl_unordered_map {

Add just before that template this:

//===================================================================
//
/// Functional clone of the std::unordered_set (hash tables.)
template <class Key,
    ...

    +    typedef vluint64_t size_type;
    +    typedef std::pair<Key, Value> value_type;

Add "// TYPES" before these

With these changes you can push to develop-v4.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2018-06-11T22:23:44Z


Linear notes:

+class SenTreeSet {
+public:

Add "// TYPES" here

+    // Hash table of sensitive blocks.
+    class HashSenTree {
+    public:
+        HashSenTree() {}
+        size_t operator() (const AstSenTree* kp) const {
+            return V3Hashed::uncachedHash(kp).fullValue();
+        }
+    private:
+        VL_UNCOPYABLE(HashSenTree);
+    };
+
+    class EqSenTree {
+    public:
+        EqSenTree() {}
+        bool operator() (const AstSenTree* ap, const AstSenTree* bp) const {
+            return ap->sameTree(bp);
+        }
+    private:
+        VL_UNCOPYABLE(EqSenTree);
+    };
+

Add "// MEMBERS" here

+    typedef vl_unordered_set<AstSenTree*, HashSenTree, EqSenTree> Set;
+    Set m_trees;  // Set of sensitive blocks, for folding.

Add "// METHODS" here

+    void add(AstSenTree* nodep) { m_trees.insert(nodep); }
+    AstSenTree* find(AstSenTree* likep) {
+        AstSenTree* resultp = NULL;
+        Set::iterator it = m_trees.find(likep);
+        if (it != m_trees.end()) {
+            resultp = *it;
+        }
+        return resultp;
+    }
+    void clear() { m_trees.clear(); }
+    SenTreeSet() {}
+
+private:
+    VL_UNCOPYABLE(SenTreeSet);
+};
+

Also add a Changes line:

****  Fix Verilation performance issues, #�. [John Coiner]

and (of course after the unordered changes) you can push to develop-v4 and mark this resolved.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: John Coiner (@jcoiner)
Original Date: 2018-06-12T02:10:28Z


Pushed to develop-v4. Thanks for the fast review.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: John Coiner (@jcoiner)
Original Date: 2018-06-12T02:20:31Z


I don't have a UI to resolve this issue, maybe my redmine account is missing a permission to do that? In any event, this one should be ready to resolve.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2018-06-12T08:51:02Z


Fixed in git develop-v4 towards 4.000.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2018-09-16T21:28:06Z


In 4.002, thanks for your work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: performance Issue involves performance issues resolution: fixed Closed; fixed
Projects
None yet
Development

No branches or pull requests

2 participants