Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

V3Inline can create unboundedly large modules #1223

Closed
veripoolbot opened this issue Sep 29, 2017 · 8 comments
Closed

V3Inline can create unboundedly large modules #1223

veripoolbot opened this issue Sep 29, 2017 · 8 comments
Assignees
Labels
area: performance Issue involves performance issues resolution: fixed Closed; fixed

Comments

@veripoolbot
Copy link
Contributor


Author Name: John Coiner (@jcoiner)
Original Redmine Issue: 1223 from https://www.veripool.org

Original Assignee: John Coiner (@jcoiner)


Scenario:

Module TOP instances two of module B.
Module B instances two of module C.
(and so on until...)
Module I instances two of module J.
Module J instances two of module K, which is finally a leaf node.

There are 1024 instances of K in this design, grand total.

V3Inline may flatten this into one single giant module. For all mods B through K, we have only two "refs" in the accounting done by InlineMarkVisitor::visit(AstNodeModule* nodep), and "m_stmtCnt" doesn't take into account the growth in statements resulting from inlining. So we generate a single module with at least 1024*K statements. Generated .cpp files can run into the megabytes for a small but steeply-nested design like this.

I understand that inlining is great for lots of reasons: it lets us elide assignments at module boundaries, it minimizes function call overhead due to functions becoming larger, and creates more opportunities for the C compiler to reorder operations (promoting LOADs, etc) also due to functions becoming larger. Probably other reasons.

My guess is that most of the benefits of inlining happen when smaller modules are inlined. Once we are inlining larger modules, my guess is that beyond a certain point, the costs begin to exceed the benefits. The instruction stream grows sharply from inlining very large modules, and in the extreme case above, we lose the ability to V3Combine anything. So I would predict a negative ICache impact. Also verilation time and compile time get bad when functions grow very large.

Here's a proposal:

  • When deciding which modules to inline, visit modules in bottom-up order (in the instance hierarchy)
  • Let's assume we've decided to inline module K. Now when we make the inlining decision for module J, compute the number of statements in J taking into account both J's native statements and those it inherited from its two inlined copies of K.

By the time we get a few levels up from K, the size of the module we're considering inlining will grow above v3Global.opt.inlineMult() and we'll stop inlining at some medium size.

Wilson, does this make sense? If you agree I can work up a patch.

I ran into this while working on multicore support (issue 1216). I was trying to derive the partitioning of the execution graph in part from the scope partitioning. It wasn't working well due to V3Inline flattening everything into only one scope.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2017-09-30T14:25:59Z


When deciding which modules to inline, visit modules in bottom-up order (in the instance hierarchy)

  • Let's assume we've decided to inline module K. Now when we make the inlining decision for module J, compute the number of statements in J taking into account both J's native statements and those it inherited from its two inlined copies of K.

That was how it was intended to work. It certainly does already iterate child-up. There certainly could be an error in computing the weight of inlining, I welcome a patch.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: John Coiner (@jcoiner)
Original Date: 2017-09-30T16:39:32Z


Attached is a patch for V3Inline.cpp. It splits the old InlineMarkVisitor into two passes:

  • InlineStatsVisitor goes top down, counts the number of refs to each module, the number of statements native to each module, and its CIL_ disposition.

  • InlineMarkVisitor goes bottom-up, making a final inline decision for each module, and updating the statement counts to reflect inlined modules (fixing the bug)

There are also a few fixes to get the regression stable under Ubuntu 17.10 which has a new GCC and a new Perl:

  • GCC doesn't like enums being used in bool context, so don't

  • GCC doesn't like operator++ being used on a bool, so don't

  • Perl has gotten more strict about what regexes it will accept, so I had to guard one regex within an eval() to avoid crashing driver.pl.

  • Please ignore the change in t_func_const_struct_bad.pl. I made a change here, committed it, realized I didn't need this change, and committed the reverse change. So there's no net change to the file. For some reason 'git format-patch' still wants to talk about this file in the patch. Whereas 'git diff' behaves like I expect over the same rev range: it understands that there the edits cancel out and it's silent about this file. Sorry. It's literally my first day using git and this part is over my head.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2017-09-30T17:39:31Z


I pushed the compiler fixes, thanks for that. BTW its easier to track if you file a separate bug as that's a separate problem.

As to V3Inline I'm ok with your algorithm. However each additional Visitor adds time to the program, it wasn't obvious to me why a new stage was needed versus it all being calculated in the existing InlineMarks - that is calculate all of the statistics related to each module, then have a final loop on each module (see e.g. for loop in V3Dead.cpp line 281) which does the inline or not calculation. (Each additional visitor can add a minute to a huge design.)

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: John Coiner (@jcoiner)
Original Date: 2017-10-01T16:21:02Z


Wilson, here's a new V3Inline patch that does not add a new pass as you suggest. It's shorter too.

Also here's a bonus patch that makes t_dist_whitespace give a line number for the failure; I needed this to debug.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2017-10-01T20:26:00Z


Pushed the line fix, thanks.

Inlining is nicely done. Just a few nits, if you can please clean these then I'll merge.

+    vector<AstNodeModule*> m_all_mods;  // All modules, in top-down order.

  typedef vector<AstNodeModule*>  ModVec;
  ModVec m_allMods;


-           } else {
-               m_modp->user1(1);
+           } else if (m_modp->user2() == CIL_MAYBE ||
+                       m_modp->user2() == CIL_NOTSOFT) {
+               m_modp->user2(CIL_USER);

  } else if (m_modp->user2() == CIL_MAYBE
              || m_modp->user2() == CIL_NOTSOFT) {

(I always put operators first on a continuation line as it's easier to read quickly, especially on long lines.)


+        // Iterate through all modules in bottom-up order.
+        // Make a final inlining decision for each.
+        for (int idx = m_all_mods.size() - 1; idx >= 0; idx--) {
+            AstNodeModule* mod = m_all_mods[idx];

  for (ModVec::reverse_iterator it=m_allMods.rbegin(); it!=m_allMods.rend(); ++it)
        AstNodeModule* modp = *it;


+            LocalInstanceMap& locals = m_instances[mod];

    LocalInstanceMap& localsr = m_instances[mod];

(Use trailing r for references.)

+            for (LocalInstanceMap::iterator it = locals.begin();
+                 it != locals.end(); ++it) {
+                AstNodeModule* child = it->first;

    AstNodeModule* childp = it->first;


+                if (child->user1() /* inlining child */) {

    Use // comments please.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: John Coiner (@jcoiner)
Original Date: 2017-10-01T21:45:38Z


Thanks. This patch applies atop the previous one for V3Inline.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2017-10-01T22:02:07Z


Great, I made a few trivial additional style edits not worth complaining about. Pushed to git towards 3.913.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2017-10-14T20:22:02Z


In 3.914.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: performance Issue involves performance issues resolution: fixed Closed; fixed
Projects
None yet
Development

No branches or pull requests

2 participants