Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect simulation in presence of MULTIDRIVEN #634

Closed
veripoolbot opened this issue Mar 18, 2013 · 6 comments
Closed

Incorrect simulation in presence of MULTIDRIVEN #634

veripoolbot opened this issue Mar 18, 2013 · 6 comments
Assignees
Labels
area: wrong runtime result Issue involves an incorrect runtine result from Verilated model resolution: fixed Closed; fixed

Comments

@veripoolbot
Copy link
Contributor


Author Name: Ted Campbell
Original Redmine Issue: 634 from https://www.veripool.org
Original Date: 2013-03-18
Original Assignee: Wilson Snyder (@wsnyder)


I'm noticing a complex issue with incorrect simulation result in my design. I've managed to reproduce in a simpler testcase as follows.

Tested against verilator 3.846 and a few older 3.8xx versions.

Test Case Requirements

  • Multiple top-level clocks
  • Dual-port Memory
    ** MULTIDRIVEN with different clocks on each port
    ** Latency 1 read ports
  • Clocks packed into / unpacked from vectors

Attached Verilog

The verilog file contains a simple memory writer module that writes data to the memory, and a simple reader that waits a number of cycles and then verifies that data. There is a 'MULTI_CLK' define in the verilog that demonstrates the mismatch. If it is defined, the design fails, if disabled the design behaves as expected.

Observation 1

When using a single top-level clock with two local wires, the design behaves correctly. The concat / bitsel of the clock alone does not cause issues. If I have two top-level clocks, Verilator reports IMPERFECTSCH on them.

Observation 2

If I use a latency 0 memory read, Verilator still reports IMPERFECTSCH, so I believe the simulation scheduler may have issues.

Observation 3

In my original design, it is actually the memory write that is scrambled, although in this reduced case, the memory read is what is incorrect.

Observation 4

When I examine the generated VFoo.cpp, I see the clocks are mapped to VinpClk variables. These are update in the final block, meaning the clock loads see the correct clock only during the second pass of _eval. The problem is that the chosen scheduler is as follows:

  // Partial and simplified

  if ( posedge clk_rd ) sequent_1( );
  if ( posedge clk_wr ) sequent_2( );

  combo_4( );
  if ( posedge clk_rd ) sequent_6( );
  combo_7( );

  if ( posedge VinpClk_a_clk ) sequent_8( );
  if ( posedge VinpClk_b_clk ) sequent_9( );

  if ( posedge clk_rd ) sequent_11( );
  if ( posedge clk_wr ) multiclk_12( );

  VinpClk_a_clk = ... ;
  VinpClk_b_clk = ... ;

This causes sequent_9 to run after multiclk_12 (during the second iteration of _eval), even though there should be a dependency. This causes address to update before rdata is filled in.

Workaround

If I move the VinpClk_a_clk / VinpClk_b_clk assignments after combo_7, the design works correctly. A similar transform fixes my original large design as well.

In theory if this was added to the schedule graph instead of always at the end, the problem could be solved. Unfortunately it isn't entirely clear to me what could be done if a cycle was detected in that region, which doesn't make this a particularly general solution.

Difficulties

The use of VinpClk suggests the clock is following into similar problems as [[issue613]], where the conclusion seems to be there is sizeable amount of overhaul needed in Verilator.

A simpler approach (at least for my needs) may be to figure out why the clock was confused by the concat / bitsel, and possibly have a simple enhancement for this.

Unfortunately I don't have time immediately to tackle this problem, so I'm hoping to at least start some discussion.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Jie Xu (@jiexu)
Original Date: 2013-04-09T15:27:34Z


While working on figure out Issue #613, I have tried out the same workaround which move the @VinpClk@ assigns a bit earlier in the @_eval@ function. This works in most cases.

When investigating into the Verilator source, I found out that:

  • the VinpClk are generated in V3GenClk.cpp when the signal is a clock and is circular.
  • the clock signal was made as circular in V3Order.cpp if it is not from input.

Then in your testcase, Verilator fails to mark @iv_clk@ as input and further fails to mark @a_clk@ and @b_clk@ as input. This only happens when @iv_clk@ depends on multiple variables, both @i_clk_wr@ and @i_clk_rd@ in your multiclk case.

I have made a patch to V3Order.cpp which will solve this by letting Verilator process variables multiple times in @processInputs@.

diff --git a/src/V3Order.cpp b/src/V3Order.cpp
index e204c9f..6ba0d78 100644
--- a/src/V3Order.cpp
+++ b/src/V3Order.cpp
@@ -1067,20 +1067,17 @@ void OrderVisitor::processInputs() {

 void OrderVisitor::processInputsIterate(OrderEitherVertex* vertexp) {
      // Propagate PrimaryIn through simple assignments
-    if (vertexp->user()) return;  // Already processed
      if (0 && debug()>=9) {
         UINFO(9," InIt "<<vertexp<<endl);
         if (OrderLogicVertex* vvertexp = dynamic_cast<OrderLogicVertex*>(vertexp)) {
             vvertexp->nodep()->dumpTree(cout,"-            TT: ");
         }
      }
-    vertexp->user(true);  // Processing
      // First handle all inputs to this vertex, in most cases they'll be already processed earlier
      // Also, determine if this vertex is an input
      int inonly = 1;  // 0=no, 1=maybe, 2=yes until a no
      for (V3GraphEdge* edgep = vertexp->inBeginp(); edgep; edgep=edgep->inNextp()) {
         OrderEitherVertex* frVertexp = (OrderEitherVertex*)edgep->fromp();
-       processInputsIterate(frVertexp);
         if (frVertexp->isFromInput()) {
             if (inonly==1) inonly = 2;
         } else if (dynamic_cast<OrderVarPostVertex*>(frVertexp)) {


I have checked the above patch fixes your problem. But not sure whether it has any side-effect.

Wilson, could you please comment on the patch?

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Ted Campbell
Original Date: 2013-04-09T19:31:24Z


Thanks for doing the digging into this. That certainly seems like a good place to work in a fix. I would be concerned about potential graph algorithm convergence issues since you remove the vertexp->user() check. You may end up with a recursive stack overflow.

Instead, I'd tweak you fix as follows. Keep using the user() flag, but instead set it only if our isFromInput() state has changed. This will avoid recursive loops, but each node will still be revisited again after it's final input edge is ready. This does increase runtime slightly from Wilson's original code with O(nodes) to O(edges). The function will be run once per incoming edge.

This of course will fail to detect inputs if there are graph cycles, but I don't believe this is a problem? Only unusual scenario that comes to mind is 'assign v_clk = {v_clk,i_clk};' which is quite bizarre.

diff --git a/src/V3Order.cpp b/src/V3Order.cpp
index e204c9f..7293075 100644
--- a/src/V3Order.cpp
+++ b/src/V3Order.cpp
@@ -1074,13 +1074,11 @@ void OrderVisitor::processInputsIterate(OrderEitherVertex* vertexp) {
             vvertexp->nodep()->dumpTree(cout,"-            TT: ");
         }
      }
-    vertexp->user(true);  // Processing
      // First handle all inputs to this vertex, in most cases they'll be already processed earlier
      // Also, determine if this vertex is an input
      int inonly = 1;  // 0=no, 1=maybe, 2=yes until a no
      for (V3GraphEdge* edgep = vertexp->inBeginp(); edgep; edgep=edgep->inNextp()) {
         OrderEitherVertex* frVertexp = (OrderEitherVertex*)edgep->fromp();
-       processInputsIterate(frVertexp);
         if (frVertexp->isFromInput()) {
             if (inonly==1) inonly = 2;
         } else if (dynamic_cast<OrderVarPostVertex*>(frVertexp)) {
@@ -1096,7 +1094,8 @@ void OrderVisitor::processInputsIterate(OrderEitherVertex* vertexp) {
         vertexp->isFromInput(true);
      }
      // If we're still an input, process all targets of this vertex
-    if (vertexp->isFromInput()) {
+    if (vertexp->isFromInput() && !vertexp->user()) {
+       vertexp->user(true);  // Propegating
         for (V3GraphEdge* edgep = vertexp->outBeginp(); edgep; edgep=edgep->outNextp()) {
             OrderEitherVertex* toVertexp = (OrderEitherVertex*)edgep->top();
             if (OrderVarStdVertex* vvertexp = dynamic_cast<OrderVarStdVertex*>(toVertexp)) {


@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Ted Campbell
Original Date: 2013-04-10T02:08:38Z


Either of these fixes allows my original design to simulate correctly now.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2013-05-02T12:27:03Z


Thanks for the good debugging and patches. I had tried the patch on larger designs and ran out of memory, so while the theory of the fix was right on a more complicated repair was needed to minimize recursion.

Pushed to git towards 3.847.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Ted Campbell
Original Date: 2013-05-02T18:44:26Z


Confirmed master branch passes my full design. Thanks Wilson.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2013-05-11T20:17:07Z


In 3.847.

@veripoolbot veripoolbot added area: wrong runtime result Issue involves an incorrect runtine result from Verilated model resolution: fixed Closed; fixed labels Dec 22, 2019
tgorochowik pushed a commit to antmicro/verilator that referenced this issue Feb 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: wrong runtime result Issue involves an incorrect runtine result from Verilated model resolution: fixed Closed; fixed
Projects
None yet
Development

No branches or pull requests

2 participants