Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-cutable ordering loop when using an array of clocks #1009

Closed
veripoolbot opened this issue Dec 3, 2015 · 20 comments
Closed

Non-cutable ordering loop when using an array of clocks #1009

veripoolbot opened this issue Dec 3, 2015 · 20 comments
Assignees
Labels
resolution: fixed Closed; fixed type: feature-IEEE Request to add new feature, described in IEEE 1800

Comments

@veripoolbot
Copy link
Contributor


Author Name: Todd Strader (@toddstrader)
Original Redmine Issue: 1009 from https://www.veripool.org

Original Assignee: Todd Strader (@toddstrader)


See t_clk_concat in:

https://github.com/toddstrader/verilator-dev/tree/clkloop

This test is for demonstration purposes only and won't be able to go upstream as it is using a number of (readily available but copyrighted) Altera library modules. Unfortunately, bisecting this has proved quite challenging and I have so far been unable to reproduce the issue without the Altera code.

Within the test, the module t1 instantiates a dcfifo_async (Altera module) which has two clocks. The code will toss ordering errors if the clocks are:

  1. declared as Verilator clocks using /verilator clocker/
  2. passed into t1 as an array of two clocks

If either one of these things is not true, the test will pass.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Todd Strader (@toddstrader)
Original Date: 2015-12-03T12:19:45Z


Forgot to post the error message:

%Error: Circular logic when ordering code (non-cutable edge loop)
%Error:      Example path: t/t_clk_concat.v:234:  v.t1.__Vcellinp__dcfifo_async__rdclk
%Error:      Example path: t/t_clk_concat.v:551:  ACTIVE
%Error:      Example path: t/t_clk_concat.v:551:  ALWAYS
%Error:      Example path: t/t_clk_concat.v:234:  v.t1.__Vcellinp__dcfifo_async__rdclk
%Error:      Example path: t/t_clk_concat.v:717:  ASSIGNW
%Error:      Example path: t/t_clk_concat.v:234:  v.t1.__Vcellinp__dcfifo_async__rdclk
%Error:      Example path: t/t_clk_concat.v:235:  v.t1.__Vcellinp__dcfifo_async__wrclk
%Error:      Example path: t/t_clk_concat.v:500:  ACTIVE
%Error:      Example path: t/t_clk_concat.v:500:  ALWAYS
%Error:      Example path: t/t_clk_concat.v:235:  v.t1.__Vcellinp__dcfifo_async__wrclk
%Error:      Example path: t/t_clk_concat.v:718:  ASSIGNW
%Error:      Example path: t/t_clk_concat.v:235:  v.t1.__Vcellinp__dcfifo_async__wrclk
%Error:      Example path: t/t_clk_concat.v:718:  ASSIGNW
%Error:      Example path: t/t_clk_concat.v:235:  v.t1.__Vcellinp__dcfifo_async__wrclk
%Error:      Example path: t/t_clk_concat.v:500:  ACTIVE
%Error:      Example path: t/t_clk_concat.v:500:  ALWAYS
%Error:      Example path: t/t_clk_concat.v:235:  v.t1.__Vcellinp__dcfifo_async__wrclk
%Error:      Example path: t/t_clk_concat.v:718:  ASSIGNW
%Error: Internal Error: ../V3Graph.h:138: Loops detected in graph: 0x1dc8b50

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Todd Strader (@toddstrader)
Original Date: 2015-12-09T22:39:27Z


I've (somewhat) simplified the example and removed the /verilator public/ clocks which don't appear to actually be part of the problem.

https://github.com/toddstrader/verilator-dev/tree/clkloop

I've also included "`define BROKEN" in the test which you can comment out to see a workaround. Comparing the _begin.tree ASTs from these two versions, I see the only differences are (< is broken, > is working):

71c71
<     1:2:1:1: VARREF  <> {} @dt=@(G/w1)  __Vcellinp__dcfifo_async__rdclk [RV] <- VAR  <> {} @dt=@(G/w1)  __Vcellinp__dcfifo_async__rdclk MODULETEMP
---
>     1:2:1:1: VARREF  <> {} @dt=@(G/w1)  i_clk0 [RV] <- VAR  <> {} @dt=@(G/w1)  i_clk0 [I] INPUT
73,87c73
<     1:2:1:1: VARREF  <> {} @dt=@(G/w1)  __Vcellinp__dcfifo_async__wrclk [RV] <- VAR  <> {} @dt=@(G/w1)  __Vcellinp__dcfifo_async__wrclk MODULETEMP
<     1:2: ASSIGNW  <#> {} @dt=@(G/w1)
<     1:2:1: SEL  <> {} @dt=@(G/w1) decl[1:0]]
<     1:2:1:1: VARREF  <> {} @dt=@(G/w2)  i_clks [RV] <- VAR  <> {} @dt=@(G/w2)  i_clks [I] INPUT
<     1:2:1:2: CONST  <> {} @dt=@(G/sw1)  1'h1
<     1:2:1:3: CONST  <> {} @dt=@(G/w32)  32'h1
<     1:2:2: VARREF  <> {} @dt=@(G/w1)  __Vcellinp__dcfifo_async__wrclk [LV] => VAR  <> {} @dt=@(G/w1)  __Vcellinp__dcfifo_async__wrclk MODULETEMP
<     1:2: VAR  <> {} @dt=@(G/w1)  __Vcellinp__dcfifo_async__wrclk MODULETEMP
<     1:2: ASSIGNW  <> {} @dt=@(G/w1)
<     1:2:1: SEL  <> {} @dt=@(G/w1) decl[1:0]]
<     1:2:1:1: VARREF  <> {} @dt=@(G/w2)  i_clks [RV] <- VAR  <> {} @dt=@(G/w2)  i_clks [I] INPUT
<     1:2:1:2: CONST  <> {} @dt=@(G/sw1)  1'h0
<     1:2:1:3: CONST  <> {} @dt=@(G/w32)  32'h1
<     1:2:2: VARREF  <> {} @dt=@(G/w1)  __Vcellinp__dcfifo_async__rdclk [LV] => VAR  <> {} @dt=@(G/w1)  __Vcellinp__dcfifo_async__rdclk MODULETEMP
<     1:2: VAR  <> {} @dt=@(G/w1)  __Vcellinp__dcfifo_async__rdclk MODULETEMP
---
>     1:2:1:1: VARREF  <> {} @dt=@(G/w1)  i_clk1 [RV] <- VAR  <> {} @dt=@(G/w1)  i_clk1 [I] INPUT

I propose the following solution. Wilson, I'd especially like to get your feedback on this plan.

  1. Anytime a clock is assigned to an array, decompose that array into individual bits
  2. This will affect variables, pins, and possibly other nodes
  3. Toss a CLKARRAY warning to let the user know this is happening (especially bad in the case of {my_big_vector, clk})

I'm not yet entirely sure what pass to do this on or if it should be a new pass. That's what I'm currently investigating. Please let me know if this seems reasonable or is a terrible idea.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2015-12-10T00:52:38Z


Anytime a clock is assigned to an array, decompose that array into individual bits

This seems the right approach. V3Gate would be the right point for the optimization, as it knows the full interconnectivity. You'll also need to decide how to handle logic that deals with the clocks as a vector - e.g. clocks[vec] = 1.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Todd Strader (@toddstrader)
Original Date: 2016-05-03T18:14:34Z


Here's my proposed patch:
https://github.com/toddstrader/verilator-dev/tree/clockloop

I suggested adding a warning earlier because I thought the case of {my_big_vector, clk} would lead to entirely decomposing my_big_vector. However, that turned out not to be true so I didn't add a warning. Of course, I can go back and add that if it is desirable.

Also, the regression test has been sanitized and is now free of Altera code.

The patch fixes t_clk_concat and works on our internal codebases.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2016-05-06T03:22:38Z


I pushed the little AstNode debug change. The code seems basically reasonable, I'll work on detailed feedback.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2016-05-07T11:48:48Z


I think this code should instead be an optimization inside V3Gate. It will
be a lot more powerful speeding up other usages, and the gate optimizations
feed upon each other - for example an array of clocks after decomposition
might then allow those nets to propagate away entirely.

Code wise I understand V3Graph is a bit more daunting to understand, but
once grasped you can do all of this probably in a single routine (and
children) as via the graph you can "see" everything related to your signal
and change it at once.

I also looked through the code for the normal stuff:

  • ClkVectorMap* clkVectorMapp() { return &m_clkVectorMap; }
  • ClkMap* clkMapp() { return &m_clkMap; }

I'd suggest instead

  ClkVectorMap& clkVectorMap() { return m_clkVectorMap; }
  ClkMap& clkMap() { return m_clkMap; }

Then this stuff becomes clearer

  • (*m_statep->clkMapp())[lhsVarScopep] = sourceIt->second;
  • m_statep->clkMap()[lhsVarScopep] = sourceIt->second;

+void V3ClkVector::clkVectorDecompose(AstNetlist* nodep) {

  • ClkVectorFindVisitor findVisitor (nodep, &state);
  • ClkVectorMapVisitor mapVisitor (nodep, &state);
  • ClkVectorReplaceVisitor replaceVisitor (nodep, &state);

Each visitor through the design has a significant performance overhead on
large designs. Having three additional passes adds about 3% slowdown;
perhaps a bit more bookkeeping would allow going to two.

  • virtual void visit(AstSel* nodep, AstNUser*) {
  •  UINFO(5,nodep<<endl);
    
  •  if (m_inLhs && nodep->lsbp()->castConst()) {
    
  •      m_lhsOffset = nodep->lsbConst();
    
  •      nodep->fromp()->iterate(*this);
    

I speculate this code will break if there's a select in a non-expected
place, or select under select. I'm not sure that's possible on a
left-hand-side, but better to add some insurance:

  •  if (m_inLhs && nodep->lsbp()->castConst()) {
    
  •      m_lhsOffset = nodep->lsbConst();
          m_inLhs = false;
    
  •      nodep->fromp()->iterate(*this);
          m_inLhs = true;
    

Also do the same in the AstNode generic visitor, so anything odd between
the AssignW and Sel/Concat will result in not optimizing rather than
mis-optimizing.

  •  if (varrefp->varp()->width() == 1) {
    
  •      m_lhs = varrefp;
    

Should be m_lhsp.

  • virtual void visit(AstSel* nodep, AstNUser*) {
  •  UINFO(5,nodep<<endl);
    
  •  if (m_lhs == NULL)
    

Should be "if (!m_lhsp)" similar other places with NULL.

  • virtual void visit(AstAssignW* nodep, AstNUser*) {
    +...
  •      if (m_removeAssign) {
    
  •          UINFO(9,"Removing assignment: "<<nodep<<endl);
    
  •          nodep->unlinkFrBack()->deleteTree(); VL_DANGLING(nodep);
    
  •      }
    

You can't remove an assignment unless you know that all consumers have
been eliminated. For example if the vector you are removing is used as a
complete vector in a display statement. If this visitor was before or in
V3Gate you wouldn't need to delete it at all, as it would be automatically
removed if it could be by the gate eater.

+class ClkVectorReplaceVisitor : public AstNVisitor {

  • virtual void visit(AstVarRef* nodep, AstNUser*) {
  • ...
  •      nodep->varScopep(replacementClk->second);
    
  •      nodep->varp(replacementClk->second->varp());
    
  •      nodep->name(replacementClk->second->varp()->name());
    

When replacing basically everything, it's typically better to make a new
AstVarRef as then it's obvious in debug dumps that it is a different
object.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2017-03-21T23:28:50Z


This is presently stuck waiting for a patch update.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Todd Strader (@toddstrader)
Original Date: 2017-03-23T18:19:43Z


Yeah, that's on me. I haven't looked at this for some time, but I'm currently trying to get my head around V3Gate / V3Graph. To be clear, are you suggesting that instead of having new visitors discovering the connections between clocks and LHS's that we could find the same information in the V3Graph that V3Gate builds?

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Todd Strader (@toddstrader)
Original Date: 2017-03-23T20:36:26Z


OK, yeah. I think if we have V3Gate look for concatenated clockers and stash that information in the ASSIGNW nodes, then we can push that forward in the V3Graph and skip over the concatenation when we find a constant selection.

I'll give this a try, but please do let me know if I sound off track.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2017-03-23T22:17:08Z


if we have V3Gate look for concatenated clockers and stash that information in the ASSIGNW nodes, then we can push that forward in the V3Graph and skip over the concatenation when we find a constant selection.

Exactly, thanks.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Todd Strader (@toddstrader)
Original Date: 2017-03-24T21:04:18Z


First stab here, but I wanted to ask some questions before cleaning it up:
https://github.com/toddstrader/verilator-dev/tree/clockloop_new

In the test there is something like this (module paths excluded):
clk1 -> CONCAT -> the_clks -> SEL -> wr_clk

I'm using the strategy discussed above to discover that wr_clk is really just getting clk1. Then I change the AST so that the ASSIGNW for wr_clk just assigns clk1 instead of the SEL that was previously there. Based on this, I'd like input on:

  1. Do I need to modify the V3Graph? I assumed I'd need to change it to reflect the change I made in the AST, but I tried it without touching the graph and all the tests pass. I feel like I'm getting away with something here.

  2. decomposeClkVectors() is a rather gnarly method right now. Should I make it a graph visitor? One thing I'm worried about with doing that is I'm pretty sure I should only take a limited number of steps through the graph, so it is expressible as a finite series of vertex/edge traversals.

Any other advice is welcomed too.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Todd Strader (@toddstrader)
Original Date: 2017-03-27T21:21:18Z


Update:

I finally got a chance to run this against our internal codebases and found that we're still getting non-cuttable edge loop errors that the very first patch was able to handle. I updated the branch on GitHub with an additional case (now fixed) but that's still not enough. I think the bottom line here is that doing the clock decomposition earlier means that the solution needs to be more generalized than it was before (or it needs to handle a bunch of new cases).

I'm still curious about the questions above, but I just wanted to add a note that this is still not fully baked when tested against our code.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2017-03-29T00:33:33Z


src/V3Gate.cpp

  +typedef map<int, AstVarScope*> ClkSourceMap;
  +typedef deque<ClkSourceMap*> ClkSourceMapList;

  int should be uint32_t (to match "offset" type below)

  These can probably be inside the GateVisitor class.

        virtual void visit(AstAssignW* nodep) {
  +     nodep->user3p(NULL);
  +     m_assignp = nodep;
  +     // TODO - not even sure which side (LHS/RHS) goes first -- need to account for assigning single bit to slice of vector

  The RHS goes first in iteration, as it "executes" before an assignment.


  @@ -486,7 +524,32 @@ private:
         if (nodep->backp()->castNodeAssign() && nodep->backp()->castNodeAssign()->lhsp()==nodep) {
             nodep->v3fatalSrc("Concat on LHS of assignment; V3Const should have deleted it\n");
         }
  +    virtual void visit(AstSel* nodep) {
  +     if (m_assignp && nodep->lsbp()->castConst() != NULL) {

  Just !nodep->lsbp()->castConst(), don't compare against NULL.

  +         if (ClkSourceMap* clkMap = (ClkSourceMap*)(m_assignp->user3p())) {
  +             // TODO - maybe there's a way to do this without the copy, but we're currently keeping a list of these things for memory managment p$
  +             *clkMap = newMap;

  Nothing obviously cleaner comes to mind.

  +         }
  +     }
         nodep->iterateChildren(*this);
  +     if (nodep->isOutputter() && m_logicVertexp) m_logicVertexp->setConsumed("outputter");

  Add a comment above this, as otherwise I'm not sure why it's needed.


  +void GateVisitor::decomposeClkVectors() {

  This did seem easier than I expected.

I'm not sure why not editing the graph works, maybe you just got lucky. You certainly should make it match the new netlist.

I think you should call your optimizat after the first getting rid of
buffers, it's more likely to see the pattern it wants then. Ideally it
would be called inside optimizeSignals itself.

Thanks for the tests. Please have at least a few vectors start at some
crazy bit numbers, e.g. [-12:-10] to make sure your bit vector selections are
working right.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Todd Strader (@toddstrader)
Original Date: 2017-04-12T16:35:06Z


I found that I could further simplify things (at least it seems simpler to me) by detecting the clock vectors while traversing the graph. I am now also editing both the AST and the graph when I bypass a clock vector. As mentioned in the comments, I am decomposing the clock vectors before removeRedundantEdgesSum() because I am relying on the redundant edges in the graph (e.g. one signal is included in a concatenation multiple times). I also added a crazy MSB/LSB test case as you suggested and had to add a bunch more as I cycled between the Verilator tests and our internal codebases.

Most importantly, I think this is basically ready to go now (modulo cleanup). It's passing all of the Verilator tests and is working with our internal code as well.

Please let me know what you think:
https://github.com/toddstrader/verilator-dev/tree/clockloop_new

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2017-04-28T10:56:14Z


Wow, how did this code end up so small? That's great.

--- a/src/V3Gate.cpp
+++ b/src/V3Gate.cpp
+    bool concatOffset(AstConcat* concatp, AstVarScope* vscp, int& offset) {

Use "offsetr", "r" suffix is hint it's a reference.

+    GateConcatVisitor() {
+    }

Initialize all member variables to NULL/0/etc, just for saftey.

+class GateClkDecompState {
+    GateClkDecompState(int offset, AstVarScope* vsp) {

Missing ~GateClkDecompState() {}

+       if (vsp->user2()) return VNUser(0);
+       vsp->user2(true);

if (vsp->user2SetOnce()) return;

+    virtual VNUser visit(GateLogicVertex* lvertexp, VNUser vu) {
+       GateClkDecompState* currState = (GateClkDecompState*) vu.c();
+       int clk_offset = currState->m_offset;

+       AstAssignW* assignp = lvertexp->nodep()->castAssignW();
+       if (assignp) {

if (AstAssignW* assignp = lvertexp->nodep()->castAssignW()) {


+                   // TODO - Wilson, what is an edge's weight (header says: Weight of the connection) and how should I set it here?
+                   new V3GraphEdge(m_graphp, m_clk_vvertexp, lvertexp, 1);

Weight isn't used for the Gate optimization, only for other stuff such as scheduling where some edges are more important. So use 1.

+    GateClkDecompGraphVisitor(V3Graph* graphp) {
+       m_graphp = graphp;

Initialize all member variables to NULL/0/etc, just for saftey.

Missing

      virtual ~GateClkDecompGraphVisitor() {}

Also suggest making a statistic

         V3Stats::addStatSum("Optimizations, Clocker seen vectors", m_seen_clk_vectors);

Likewise count decomposed vectors.

+void GateVisitor::decomposeClkVectors() {
+    UINFO(9,"Starting clock decomposition"<<endl);
+    AstNode::user2ClearTree();
+    GateClkDecompGraphVisitor decomposer(&m_graph);
+    for (V3GraphVertex* itp = m_graph.verticesBeginp(); itp; itp=itp->verticesNextp()) {
+       GateVarVertex* vertp = dynamic_cast<GateVarVertex*>(itp);
+       if (!vertp) {
+           continue;
+       }
+       AstVarScope* vsp = vertp->varScp();
+       if (vsp->varp()->attrClocker() != AstVarAttrClocker::CLOCKER_YES) {
+           continue;
+       }
+       if (vsp->varp()->width() > 1) {
+           UINFO(9,"Clocker > 1 bit, not decomposing: "<<vsp<<endl);
+           continue;
+       }
+       UINFO(9,"CLK DECOMP - "<<vertp<<" : "<<vsp<<endl);
+       decomposer.clkDecomp(vertp);

Suggest more obvious

         if (GateVarVertex* vertp = dynamic_cast<GateVarVertex*>(itp)) {
             AstVarScope* vsp = vertp->varScp();
             if (vsp->varp()->attrClocker() == AstVarAttrClocker::CLOCKER_YES) {
                 if (vsp->varp()->width() > 1) {
                     UINFO(9,"Clocker > 1 bit, not decomposing: "<<vsp<<endl);
                 } else {
                     UINFO(9,"CLK DECOMP - "<<vertp<<" : "<<vsp<<endl);
                     decomposer.clkDecomp(vertp);

In the tests, if you're using Emacs, please indent using Verilog-mode default indent. No big deal otherwise, I'll do it on the commit.

Please patch bin/verilator to describe the new behavior for clocker, and how people are to use it to help.

Do you think there should be a -O switch to disable this? Probably OK if not given it requires the clocker attribute.

With these, I'm ready to merge if you are.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Todd Strader (@toddstrader)
Original Date: 2017-05-05T20:34:26Z


I believe I've incorporated all of your feedback:
https://github.com/toddstrader/verilator-dev/tree/clockloop_new

I'm not an Emacs user, but I tried to reformat using Emacs. However, I'm guessing I don't have Verilog-mode installed so it's probably wrong.

Concerning a -O switch, I didn't do anything there, but I can if you want. However, even at -O0, I think I'd want this behavior since it's a correctness, not a performance issue.

I re-ran this against our internal codebases and everything still passes, so as far as I'm concerned this is ready to go.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2017-05-09T00:19:09Z


Ready to merge but one complication:

../V3Gate.cpp:1408:5: error: control reaches end of non-void function [-Werror=return-type]

This is at bottom of visit(GateLogicVertex. Perhaps need a return VNUser(0)?

Note to see these, have this in your .bashrc/.cshrc, then remake.

export VERILATOR_AUTHOR_SITE=1

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Todd Strader (@toddstrader)
Original Date: 2017-05-09T10:03:33Z


Fixed, and passing both Verilator tests and our internal tests:
https://github.com/toddstrader/verilator-dev/tree/clockloop_new

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2017-05-09T12:06:07Z


Golden. Pushed to git towards 3.903.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2017-05-31T02:06:24Z


In 3.904.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
resolution: fixed Closed; fixed type: feature-IEEE Request to add new feature, described in IEEE 1800
Projects
None yet
Development

No branches or pull requests

2 participants