New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
V3Inline can create unboundedly large modules #1223
Comments
Original Redmine Comment
That was how it was intended to work. It certainly does already iterate child-up. There certainly could be an error in computing the weight of inlining, I welcome a patch. |
Original Redmine Comment Attached is a patch for V3Inline.cpp. It splits the old InlineMarkVisitor into two passes:
There are also a few fixes to get the regression stable under Ubuntu 17.10 which has a new GCC and a new Perl:
|
Original Redmine Comment I pushed the compiler fixes, thanks for that. BTW its easier to track if you file a separate bug as that's a separate problem. As to V3Inline I'm ok with your algorithm. However each additional Visitor adds time to the program, it wasn't obvious to me why a new stage was needed versus it all being calculated in the existing InlineMarks - that is calculate all of the statistics related to each module, then have a final loop on each module (see e.g. for loop in V3Dead.cpp line 281) which does the inline or not calculation. (Each additional visitor can add a minute to a huge design.) |
Original Redmine Comment Wilson, here's a new V3Inline patch that does not add a new pass as you suggest. It's shorter too. Also here's a bonus patch that makes t_dist_whitespace give a line number for the failure; I needed this to debug. |
Original Redmine Comment Pushed the line fix, thanks. Inlining is nicely done. Just a few nits, if you can please clean these then I'll merge.
|
Original Redmine Comment Thanks. This patch applies atop the previous one for V3Inline. |
Original Redmine Comment Great, I made a few trivial additional style edits not worth complaining about. Pushed to git towards 3.913. |
Original Redmine Comment In 3.914. |
Author Name: John Coiner (@jcoiner)
Original Redmine Issue: 1223 from https://www.veripool.org
Original Assignee: John Coiner (@jcoiner)
Scenario:
Module TOP instances two of module B.
Module B instances two of module C.
(and so on until...)
Module I instances two of module J.
Module J instances two of module K, which is finally a leaf node.
There are 1024 instances of K in this design, grand total.
V3Inline may flatten this into one single giant module. For all mods B through K, we have only two "refs" in the accounting done by InlineMarkVisitor::visit(AstNodeModule* nodep), and "m_stmtCnt" doesn't take into account the growth in statements resulting from inlining. So we generate a single module with at least 1024*K statements. Generated .cpp files can run into the megabytes for a small but steeply-nested design like this.
I understand that inlining is great for lots of reasons: it lets us elide assignments at module boundaries, it minimizes function call overhead due to functions becoming larger, and creates more opportunities for the C compiler to reorder operations (promoting LOADs, etc) also due to functions becoming larger. Probably other reasons.
My guess is that most of the benefits of inlining happen when smaller modules are inlined. Once we are inlining larger modules, my guess is that beyond a certain point, the costs begin to exceed the benefits. The instruction stream grows sharply from inlining very large modules, and in the extreme case above, we lose the ability to V3Combine anything. So I would predict a negative ICache impact. Also verilation time and compile time get bad when functions grow very large.
Here's a proposal:
By the time we get a few levels up from K, the size of the module we're considering inlining will grow above v3Global.opt.inlineMult() and we'll stop inlining at some medium size.
Wilson, does this make sense? If you agree I can work up a patch.
I ran into this while working on multicore support (issue 1216). I was trying to derive the partitioning of the execution graph in part from the scope partitioning. It wasn't working well due to V3Inline flattening everything into only one scope.
The text was updated successfully, but these errors were encountered: