New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Linear searches exposed by large design (v4.0) #1316
Comments
Original Redmine Comment This patch fixes the linear searches mentioned above. It should apply to develop-v4. This removes something like 5 hours from the verilator runtime for this particular design, it falls from ~6 hours to ~1 hour. This may be controversial, I decided to #ifdef on C++11 to test whether the real std::unordered_map is available, and fall back to linear search when it's not. Why not V3Hashed? For one, the user4 slot is taken in some of the places we need the SenTreeFinder. For two, I found the V3Hashed API a bit confusing and felt more comfortable with the C++ standard type. The C++11 ifdef can go away when Verilator starts to require C++11 -- probably not too many years off now? |
Original Redmine Comment Good work. diff --git a/src/V3Hashed.h b/src/V3Hashed.h Think this is better as a static member, so call as V3Hash::uncachedHash. diff --git a/src/V3SenTree.h b/src/V3SenTree.h I don't really want to maintain two versions. I don't understand why you
Think it's more obvious to remove this typedef (below too). e.g. bool operator() (const AstSenTree* ap, const SenTree* bp) const { diff --git a/src/V3Delayed.cpp b/src/V3Delayed.cpp Since it seems separable, feel free to push the V3Delayed fix by itself to master. |
Original Redmine Comment Hi Wilson, Regarding '#if cplusplus ...': std::unordered_set supports user-defined 'hash' and 'equalTo' template parameters, and this patch relies on these. In older compilers, vl_unordered_set expands to std::set which does not support the same template parameters so it's not a substitute (std::set takes a lessthan functor instead.) One option to avoid the ifdef here: we could make vl_unordered_set expand to a clone of std::unordered_set on pre-C++11 compilers. I have a hashtable implementation lying around that I could press into service as such. It's my own, released under GPL for school projects decades ago, ported to C++ more recently. This would also speed up other vl_unordered_set instances on older compilers. Another option is to switch V3SenTree.h to use a std::multiset, and write a lessthan for it using the uncachedHash. Then linear search over the equal-hashing elements calling sameTree() on each to get an exact match. That's less efficient, it'll cause more calls to uncachedHash() than the hashtable makes, by a factor of log(N) I think. On the upside it's a local change and does without the ifdef. On the downside, there's an impedance mismatch here -- creating a lessthan routine based on hash values is kinda goofy -- and this doesn't automatically get better when we switch to C++11 globally. My preference, if you're down with this, would be to add a vl_unordered_set implementation for pre-C++11 that clones the std::unordered_set API. So vl_unordered_set is always a hashtable, regardless of the C++ standard version, and clients can specify custom hash/equalTo functors. When verilator switches to C++11 globally, the vl_unordered_set implementation goes away. What do you think? Regarding the 'Key' typedefs, you're right. I thought that maybe C++ would require a const-reference-to-a-pointer-to-a-const-AstSenTree for these functors but it doesn't. I'll change this. I pushed the V3Delayed.cpp change to master. Thanks, John |
Original Redmine Comment
Perfect, I agree. My ideal is not to have the "normal" functional classes have OS or Compiler #ifdefs, and hide that code in wrappers, and this would do that. |
Original Redmine Comment New patch is attached, this adds vl_unordered_set and vl_unordered_map. Wilson, please take a look. The patch should apply to develop-v4. The V3Delayed.cpp change is not here, as it's already applied to master. Thanks |
Original Redmine Comment Generally well done, just little stuff. This is getting larger, so please split into two patches/commits, the first First set related to unordered: I think there's some git_untabify's missing, I see tabs you're adding.
verilatedos.h is for things that work in both include/ and src/, your about
VL_HAS_UNORDERED_SET_MAP then #ifndef where you use this.
Do we need to have a .cpp file? This seems small enough, just move to Add standard copyright header (also .h file).
Needs to be size_t not unsigned as on some systems unsigned is smaller.
Capitalize Don't and start of any other comments
Add standard header. Generally does the C++11 version calculate the key hashes again when
verilator -> Verilator
m_bucketsp
Add m_ to all members, capitalize start of comments, two spaces after
Add parens on any return with non-simple argument so indents right.
Ditto
operator++()
Space after : and ,
{} on multiline if's.
I'd probably squash the newlines between small methods (without internal
VL_ULL(0xFFFFFFFFFFFFFFFF) or MSVC will throw a fit.
Space after if (other places too, grep for "if(" and "for(".
Lower case these function names.
size_type
make them protected/friend?
VL_ULL(1) << new_log2Buckets
new_bucketsp
delete [] m_bucketsp; VL_DANGLING(m_bucketsp); Perhaps elsewhere as forgot to check until now. |
Original Redmine Comment Thanks! Sorry, fixing the naming just slipped my mind. Regarding keeping the hashed value vs. calling hash() during a rehash. The cost of calling hash again is pretty low... Each rehash grows the bucket-set by a factor of four. If we make a hashtable and insert N entries, then all entries have been hashed at least once; N/4 have been hashed at least twice; N/16 have been hashed at least thrice, and so on. So we get a geometric series of 1*N + (1/4)*N + (1/16)N... which expands to N(4/3) hash operations in total. The average node has only been hashed 1.33 times across all inserts and rehashes. One was on the first insert, the other 0.33 were on a rehash. Okay... so I cheated a bit and picked a value of N where we're juuust about to rehash but haven't quite yet. And if we add one more element, they all get hashed again and now we've called hash() 2.33 times per entry, amortized. That's the worst case. The cost of keeping the hash value is also low, I guess, so it's tough to evaluate the cost/benefit. |
Original Redmine Comment Here's the patch, split in two, both should apply to develop-v4. The linear_search.patch depends on unordered_set_map.patch. Thanks |
Original Redmine Comment unordered notes:
Rename to "vl_hash_bytes".
Change to:
Then move the other comments to the file's header or a little below that as
Please put the delete and VL_DANGLING on the same line (because I grep for
Move next to each other and make one-liners.
Add just before that template this:
Add "// TYPES" before these With these changes you can push to develop-v4. |
Original Redmine Comment Linear notes:
Add "// TYPES" here
Add "// MEMBERS" here
Add "// METHODS" here
Also add a Changes line:
and (of course after the unordered changes) you can push to develop-v4 and mark this resolved. |
Original Redmine Comment Pushed to develop-v4. Thanks for the fast review. |
Original Redmine Comment I don't have a UI to resolve this issue, maybe my redmine account is missing a permission to do that? In any event, this one should be ready to resolve. |
Original Redmine Comment Fixed in git develop-v4 towards 4.000. |
Original Redmine Comment In 4.002, thanks for your work. |
Author Name: John Coiner (@jcoiner)
Original Redmine Issue: 1316 from https://www.veripool.org
Original Assignee: John Coiner (@jcoiner)
I'm running a large cpu core design through verilator, and have found a few linear searches that are adding a few hours to the verilation time.
This build spent about 30 minutes in the V3Active pass. During this window, when I ctrl-C'ed into gdb, it was always doing one of these linear searches. (I like to send random CTRL-C's into gdb while the thing is running. Not super precise, but you find out what it's doing. It's the "Fermi profiler"...)
Why do we have so many unique SenLists? That I don't know, and it's a good question. Maybe there's a Good Reason like Extensive Clock Gating, though it seems just as likely to be a bug, some kind of weird interaction between verilator and a particular coding style. In any case, we can still fix the linear search.
In V3Delayed.cpp, the createActivePost() routine calls addNext() for each active it creates. This addNext() scans from the middle of a linked-list to find the end, so you get an N^2 runtime if you are creating N actives.
We have another linear search in SenTreeFinder::getSenTree() which looks exactly like (1) except it's another copy of almost the same code.
For the SenTree searches, I'm thinking that we can have AstSenTree support a sameHash() routine, and use a V3Hashed to reduce the search to SenTree's with the same hash.
For (2) it looks like we can simply start the search from near the end of the list, where we inserted the last active, instead of from the middle of the list.
The text was updated successfully, but these errors were encountered: