V3Inline can create unboundedly large modules #1223

veripoolbot · 2017-09-29T16:43:15Z

Author Name: John Coiner (@jcoiner)
Original Redmine Issue: 1223 from https://www.veripool.org

Original Assignee: John Coiner (@jcoiner)

Scenario:

Module TOP instances two of module B.
Module B instances two of module C.
(and so on until...)
Module I instances two of module J.
Module J instances two of module K, which is finally a leaf node.

There are 1024 instances of K in this design, grand total.

V3Inline may flatten this into one single giant module. For all mods B through K, we have only two "refs" in the accounting done by InlineMarkVisitor::visit(AstNodeModule* nodep), and "m_stmtCnt" doesn't take into account the growth in statements resulting from inlining. So we generate a single module with at least 1024*K statements. Generated .cpp files can run into the megabytes for a small but steeply-nested design like this.

I understand that inlining is great for lots of reasons: it lets us elide assignments at module boundaries, it minimizes function call overhead due to functions becoming larger, and creates more opportunities for the C compiler to reorder operations (promoting LOADs, etc) also due to functions becoming larger. Probably other reasons.

My guess is that most of the benefits of inlining happen when smaller modules are inlined. Once we are inlining larger modules, my guess is that beyond a certain point, the costs begin to exceed the benefits. The instruction stream grows sharply from inlining very large modules, and in the extreme case above, we lose the ability to V3Combine anything. So I would predict a negative ICache impact. Also verilation time and compile time get bad when functions grow very large.

Here's a proposal:

When deciding which modules to inline, visit modules in bottom-up order (in the instance hierarchy)
Let's assume we've decided to inline module K. Now when we make the inlining decision for module J, compute the number of statements in J taking into account both J's native statements and those it inherited from its two inlined copies of K.

By the time we get a few levels up from K, the size of the module we're considering inlining will grow above v3Global.opt.inlineMult() and we'll stop inlining at some medium size.

Wilson, does this make sense? If you agree I can work up a patch.

I ran into this while working on multicore support (issue 1216). I was trying to derive the partitioning of the execution graph in part from the scope partitioning. It wasn't working well due to V3Inline flattening everything into only one scope.

veripoolbot · 2017-09-30T14:25:59Z

Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2017-09-30T14:25:59Z

When deciding which modules to inline, visit modules in bottom-up order (in the instance hierarchy)

Let's assume we've decided to inline module K. Now when we make the inlining decision for module J, compute the number of statements in J taking into account both J's native statements and those it inherited from its two inlined copies of K.

That was how it was intended to work. It certainly does already iterate child-up. There certainly could be an error in computing the weight of inlining, I welcome a patch.

veripoolbot · 2017-09-30T16:39:32Z

Original Redmine Comment
Author Name: John Coiner (@jcoiner)
Original Date: 2017-09-30T16:39:32Z

Attached is a patch for V3Inline.cpp. It splits the old InlineMarkVisitor into two passes:

InlineStatsVisitor goes top down, counts the number of refs to each module, the number of statements native to each module, and its CIL_ disposition.
InlineMarkVisitor goes bottom-up, making a final inline decision for each module, and updating the statement counts to reflect inlined modules (fixing the bug)

There are also a few fixes to get the regression stable under Ubuntu 17.10 which has a new GCC and a new Perl:

GCC doesn't like enums being used in bool context, so don't
GCC doesn't like operator++ being used on a bool, so don't
Perl has gotten more strict about what regexes it will accept, so I had to guard one regex within an eval() to avoid crashing driver.pl.
Please ignore the change in t_func_const_struct_bad.pl. I made a change here, committed it, realized I didn't need this change, and committed the reverse change. So there's no net change to the file. For some reason 'git format-patch' still wants to talk about this file in the patch. Whereas 'git diff' behaves like I expect over the same rev range: it understands that there the edits cancel out and it's silent about this file. Sorry. It's literally my first day using git and this part is over my head.

veripoolbot · 2017-09-30T17:39:31Z

Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2017-09-30T17:39:31Z

I pushed the compiler fixes, thanks for that. BTW its easier to track if you file a separate bug as that's a separate problem.

As to V3Inline I'm ok with your algorithm. However each additional Visitor adds time to the program, it wasn't obvious to me why a new stage was needed versus it all being calculated in the existing InlineMarks - that is calculate all of the statistics related to each module, then have a final loop on each module (see e.g. for loop in V3Dead.cpp line 281) which does the inline or not calculation. (Each additional visitor can add a minute to a huge design.)

veripoolbot · 2017-10-01T16:21:02Z

Original Redmine Comment
Author Name: John Coiner (@jcoiner)
Original Date: 2017-10-01T16:21:02Z

Wilson, here's a new V3Inline patch that does not add a new pass as you suggest. It's shorter too.

Also here's a bonus patch that makes t_dist_whitespace give a line number for the failure; I needed this to debug.

veripoolbot · 2017-10-01T20:26:00Z

Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2017-10-01T20:26:00Z

Pushed the line fix, thanks.

Inlining is nicely done. Just a few nits, if you can please clean these then I'll merge.

+    vector<AstNodeModule*> m_all_mods;  // All modules, in top-down order.

  typedef vector<AstNodeModule*>  ModVec;
  ModVec m_allMods;


-           } else {
-               m_modp->user1(1);
+           } else if (m_modp->user2() == CIL_MAYBE ||
+                       m_modp->user2() == CIL_NOTSOFT) {
+               m_modp->user2(CIL_USER);

  } else if (m_modp->user2() == CIL_MAYBE
              || m_modp->user2() == CIL_NOTSOFT) {

(I always put operators first on a continuation line as it's easier to read quickly, especially on long lines.)


+        // Iterate through all modules in bottom-up order.
+        // Make a final inlining decision for each.
+        for (int idx = m_all_mods.size() - 1; idx >= 0; idx--) {
+            AstNodeModule* mod = m_all_mods[idx];

  for (ModVec::reverse_iterator it=m_allMods.rbegin(); it!=m_allMods.rend(); ++it)
        AstNodeModule* modp = *it;


+            LocalInstanceMap& locals = m_instances[mod];

    LocalInstanceMap& localsr = m_instances[mod];

(Use trailing r for references.)

+            for (LocalInstanceMap::iterator it = locals.begin();
+                 it != locals.end(); ++it) {
+                AstNodeModule* child = it->first;

    AstNodeModule* childp = it->first;


+                if (child->user1() /* inlining child */) {

    Use // comments please.

veripoolbot · 2017-10-01T21:45:38Z

Original Redmine Comment
Author Name: John Coiner (@jcoiner)
Original Date: 2017-10-01T21:45:38Z

Thanks. This patch applies atop the previous one for V3Inline.

veripoolbot · 2017-10-01T22:02:07Z

Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2017-10-01T22:02:07Z

Great, I made a few trivial additional style edits not worth complaining about. Pushed to git towards 3.913.

veripoolbot · 2017-10-14T20:22:02Z

Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2017-10-14T20:22:02Z

In 3.914.

veripoolbot closed this as completed Oct 14, 2017

veripoolbot added area: performance Issue involves performance issues resolution: fixed Closed; fixed labels Dec 22, 2019

veripoolbot assigned jcoiner Dec 22, 2019

wsnyder mentioned this issue May 3, 2020

Verilator 3.914 2017-10-14 released verilator/verilator-announce#9

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

V3Inline can create unboundedly large modules #1223

V3Inline can create unboundedly large modules #1223

veripoolbot commented Sep 29, 2017

veripoolbot commented Sep 30, 2017

veripoolbot commented Sep 30, 2017

veripoolbot commented Sep 30, 2017

veripoolbot commented Oct 1, 2017

veripoolbot commented Oct 1, 2017

veripoolbot commented Oct 1, 2017

veripoolbot commented Oct 1, 2017

veripoolbot commented Oct 14, 2017

V3Inline can create unboundedly large modules #1223

V3Inline can create unboundedly large modules #1223

Comments

veripoolbot commented Sep 29, 2017

veripoolbot commented Sep 30, 2017

veripoolbot commented Sep 30, 2017

veripoolbot commented Sep 30, 2017

veripoolbot commented Oct 1, 2017

veripoolbot commented Oct 1, 2017

veripoolbot commented Oct 1, 2017

veripoolbot commented Oct 1, 2017

veripoolbot commented Oct 14, 2017