Convergence failures, how to debug
I'm using 4.002 with a design that has simulated successfully with other compilers. It is failing with "model didn't converge". The FAQ directs me to use debugging options and look at the CHANGE messages. Naively I was expecting to see repeated CHANGE messages for one or more signals showing they were failing to converge.
In fact what I see is around 30 CHANGE messages, all for different signals, each referencing the line of the signal's declaration. This leaves me with no clue as to which assignment is occurring repeatedly and causing the value to oscillate. Increasing --converge-limit has no effect either.
Is there some better way to debug this? Or would it be possible to give a more specific message about which assignments were causing the failure?
#1 Updated by Wilson Snyder 11 months ago
- Status changed from New to AskedReporter
Sorry, this isn't usually a problem, and when it is, the cause is usually "obvious" from the signal involved.
If many signals are getting printed, then most likely each of them is oscillating (or there is a bug). Of course, a signal such as "A" may be oscillating and feeding signal "B", which is then also reported as oscillating.
I would suggest debugging it like a "C" program now; either going into gdb and tracing, or editing the generated C++ code to add appropriate prints to see what is going on.
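To illustrate how a single oscillating signal can pull its fan-out into the report, here is a toy C++ sketch of a settle loop. It is not Verilator's actual implementation; `ToySignals`, `evalOnce`, and `settle` are hypothetical names invented for this example, and the three-signal chain is an assumption chosen to mirror the "A feeds B" case above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a design's combinational state; NOT Verilator internals.
struct ToySignals {
    bool a = false;  // root signal, optionally driven by a delay-free inverter of itself
    bool b = false;  // b follows a
    bool c = false;  // c follows b
};

// Evaluate one pass of the toy logic.
// If oscillate is true, `a` inverts itself on every pass (like a zero-delay `a = ~a`).
inline ToySignals evalOnce(const ToySignals& in, bool oscillate) {
    ToySignals out = in;
    if (oscillate) out.a = !in.a;
    out.b = out.a;
    out.c = out.b;
    return out;
}

// Repeat passes until nothing changes or the limit is reached.
// Returns the names of the signals that changed on the last pass
// (empty when the logic settled).
inline std::vector<std::string> settle(ToySignals& s, bool oscillate, int limit) {
    assert(limit > 0);
    std::vector<std::string> changed;
    for (int i = 0; i < limit; ++i) {
        const ToySignals next = evalOnce(s, oscillate);
        changed.clear();
        if (next.a != s.a) changed.push_back("a");
        if (next.b != s.b) changed.push_back("b");
        if (next.c != s.c) changed.push_back("c");
        s = next;
        if (changed.empty()) return changed;  // converged
    }
    return changed;  // still changing after `limit` passes
}
```

With `oscillate` set, all three signals are reported as changing even though only `a` is the root cause, and raising `limit` makes no difference. That matches the symptom of many CHANGE messages with no obvious culprit.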
I'm not sure whether it was a bug report or not. It was basically an observation that the FAQ made it sound like the diagnostics would obviously show the root cause of a failure to converge, but it doesn't seem like they do. It's a report about the diagnostics not being clear. No doubt the root cause of the failure to converge is my error and I can diagnose it with gdb or prints, but the issue is whether the diagnostics are expected to be more helpful in locating the root cause.
E.g. couldn't it show the source position(s) where the signal is being changed, rather than the source position of its declaration?
#4 Updated by Wilson Snyder 11 months ago
Fair point, added more words to the documentation.
Unfortunately, Verilator at this stage doesn't have any knowledge of what logic might be causing the issue.
Also, it's worth reviewing any UNOPTFLAT messages you suppressed; those do indicate the logic involved.
I had hundreds of UNOPTFLAT warnings, but they were all paths from one part of a vector to another rather than genuine combinational loops. The problem turned out to be:
always #5 clk = ~clk;
It got a STMTDLY warning, but clearly if Verilator is ignoring delays then this will fail to converge.
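One possible workaround, sketched here under assumptions, is to drop the delay-based clock generator from the design and toggle the clock from the C++ harness instead. `ToyTop` below is a hypothetical stand-in for a generated model so that the example is self-contained; a real harness would use the class Verilator generates for the top module, and `runHalfPeriods` is a name invented for this sketch.

```cpp
#include <cassert>

// Hypothetical stand-in for a Verilated top-level model.
// It only counts rising edges on clk so the harness loop can be demonstrated.
struct ToyTop {
    bool clk = false;      // clock input, driven by the harness
    bool prevClk = false;  // previous clock value, used for edge detection
    unsigned count = 0;    // incremented on each rising edge of clk

    void eval() {
        if (clk && !prevClk) ++count;
        prevClk = clk;
    }
};

// Drive the clock from the harness: one toggle plus one eval() per half-period.
// The design itself contains no delay statement, so each eval() can settle.
inline unsigned runHalfPeriods(ToyTop& top, int halfPeriods) {
    assert(halfPeriods >= 0);
    for (int t = 0; t < halfPeriods; ++t) {
        top.clk = !top.clk;
        top.eval();
    }
    return top.count;
}
```

Because the harness changes `clk` only between calls to `eval()`, nothing inside a single evaluation keeps inverting the clock, which is what the ignored `#5` delay was otherwise relied on to prevent.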