Project

General

Profile

[logo] 
 
Home
News
Activity
About/Contact
Major Tools
  Dinotrace
  Verilator
  Verilog-mode
  Verilog-Perl
Other Tools
  BugVise
  CovVise
  Force-Gate-Sim
  Gspice
  IPC::Locker
  Rsvn
  SVN::S4
  Voneline
  WFH
General Info
  Papers

Issue #1096

UNOPT and UNOPTFLAT V3Split optimizations

Added by Arthur Kahlich over 2 years ago. Updated over 1 year ago.

Status:
Confirmed
Priority:
Normal
Assignee:
-
Category:
-
% Done:

0%


Description

I am running into a performance issue due to Verilator finding circular logic where none exists. This causes the entire model to be evaluated twice for each input edge instead of once, which is a significant performance degradation.

To see the issue, de-tar the 2 small Verilog source files and run the command:
verilator -cc +libext+.v -y . abc.v
What I see is the following:
%Warning-UNOPTFLAT: abc.v:15: Signal unoptimizable: Feedback to clock or circular logic: abc.a
%Warning-UNOPTFLAT: Use "/* verilator lint_off UNOPTFLAT */" and lint_on around source to disable this message.
%Warning-UNOPTFLAT:      Example path: abc.v:15:  abc.a
%Warning-UNOPTFLAT:      Example path: abc.v:44:  ALWAYS
%Warning-UNOPTFLAT:      Example path: abc.v:16:  abc.b
%Warning-UNOPTFLAT:      Example path: xyz.v:11:  ALWAYS
%Warning-UNOPTFLAT:      Example path: abc.v:15:  abc.a
%Warning-UNOPT: abc.v:10: Signal unoptimizable: Feedback to public clock or circular logic: e
%Warning-UNOPT:      Example path: abc.v:10:  e
%Warning-UNOPT:      Example path: abc.v:53:  ASSIGNW
%Warning-UNOPT:      Example path: abc.v:21:  abc.h
%Warning-UNOPT:      Example path: abc.v:55:  ALWAYS
%Warning-UNOPT:      Example path: abc.v:10:  e
%Warning-UNOPTFLAT: abc.v:17: Signal unoptimizable: Feedback to clock or circular logic: abc.c
%Warning-UNOPTFLAT:      Example path: abc.v:17:  abc.c
%Warning-UNOPTFLAT:      Example path: abc.v:42:  ASSIGNW
%Warning-UNOPTFLAT:      Example path: abc.v:18:  abc.d
%Warning-UNOPTFLAT:      Example path: abc.v:55:  ALWAYS
%Warning-UNOPTFLAT:      Example path: abc.v:17:  abc.c
%Warning-UNOPTFLAT: abc.v:22: Signal unoptimizable: Feedback to clock or circular logic: abc.i
%Warning-UNOPTFLAT:      Example path: abc.v:22:  abc.i
%Warning-UNOPTFLAT:      Example path: abc.v:159:  ALWAYS
%Warning-UNOPTFLAT:      Example path: abc.v:26:  abc.m
%Warning-UNOPTFLAT:      Example path: abc.v:143:  ALWAYS
%Warning-UNOPTFLAT:      Example path: abc.v:28:  abc.p
%Warning-UNOPTFLAT:      Example path: abc.v:55:  ALWAYS
%Warning-UNOPTFLAT:      Example path: abc.v:25:  abc.l
%Warning-UNOPTFLAT:      Example path: xyz.v:11:  ALWAYS
%Warning-UNOPTFLAT:      Example path: abc.v:22:  abc.i
%Error: Exiting due to 4 warning(s)

In all of these Warnings no true circular logic exists. This appears to be a case of each LHS of a blocking assign in an 'ALWAYS @*' block being given all of the sensitivity list of the entire block, rather than only the wires/regs that the particular LHS reg truly depends on. I note in the source code that there appears to be a visit pass intended to split ALWAYS blocks - perhaps there is a bug here or perhaps the pass is being applied at an unfortunate point in the overall sequence of passes.

I also have observed that manually splitting the combo ALWAYS blocks into a separate block for each LHS cures the problem. Unfortunately this is not practical in the code base I have, due to its massive size and due to the terrible readability problem it would cause.

The test case files were extracted (manually) from a much larger Verilog code base and the signal names sanitized. The particular Verilog in the test case may not do anything meaningful other than illustrate where Verilator finds circular logic where there is none.

false_unoptflat.tgz - Small UNOPTFLAT test case (1 KB) Arthur Kahlich, 10/20/2016 11:48 PM

History

#1 Updated by Wilson Snyder over 2 years ago

  • Status changed from New to Duplicate

Verilator does block-by-block scheduling; to avoid these warnings and improve performance you must manually split up at least one of the always statements that it is complaining about. For more details see the UNOPT message in the manual and bug63 and the forums. Sorry.

#2 Updated by Arthur Kahlich over 2 years ago

Oh dear. As I wrote before, due to the size of the code and for readability considerations, the splitting is just not practical. For the blocks that define state machine logic it would bloat the code by around 2X or more and make the code unreadable. This Verilog coding style is particularly popular as a means of specifying default output values for signals that have only a few non-default values.

I note in the Verilator source code comments on the oSplit visit path that would lead me to believe its purpose is to automatically do the combo ALWAYS block splitting that you say I need to do manually. Is this not the case? Or is there some limitation to this process that I could investigate reducing or eliminating?

#3 Updated by Arthur Kahlich over 2 years ago

I looked at bug63 and that appears to have both aspects of the UNOPTFLAT warning covered: the block-by-block scheduling aspect as illustrated by my test case here, and the wire array aspect, which I would also like to look at helping with, but which I consider a much harder problem to fix properly.

#4 Updated by Wilson Snyder over 2 years ago

As to V3Split, it tries to break up always blocks subject to what is considered legal from a Verilog language standpoint. If you for example have a $display statement, it won't break up the other statements proceeding that display in the same block. It certainly could be improved or flags added to make it more optionally aggressive.

#5 Updated by Arthur Kahlich over 2 years ago

That's what I was hoping for. While I follow the visit pass methodology in general, I will need to look the the particular classes used by this pass to see if what would be needed is there, and if not add it before proceeding.

There will be a need to break apart case blocks which could get complicated, and could also end up in some ways being sub-optimal compared to keeping the case block together. I think the potential gain from not having to eval twice will offset conversion of the case to a series of individual if blocks. Of course, a case block can only be broken up if it is complete.

All of the above assumes a pass somewhere to only enable this if there are indeed no latches implied in the block.

Is V3Split the right place to look at the breakup of complete case blocks?

#6 Updated by Wilson Snyder over 2 years ago

Yes, look at V3Split, also look at V3Table which converts many case statements to lookup tables for much better performance.

#7 Updated by Arthur Kahlich over 1 year ago

I have looked at V3Split and come to the conclusion that I need to add another pass, call it V3SplitCombo. This is because V3Split only deals with delayed (non-blocking) assignments. The goal of V3Split appears to be the reduction of slave temporaries for implied flipflops, which is a worthy goal, but not related to what I am trying to accomplish. I will place the call to V3SplitCombo right after the call to V3SplitAs::splitAsAll. After looking at some of the other passes, it appears that 2 important simplifications will have been done:

1. Assignments to concatenations will have been split into a series of assignments to the variables on the LHS. I saw this happening in V3Const::replaceNodeAssign(). I am depending on the tree after this to be free of CONCAT and CONCATN on the LHS. If this is not the case, please let me know.

2. All Verilog case statements will have been converted to if-else constructs. I saw this happening in V3Case::caseAll(). I am depending on the tree after this to be free of CASE and CASEITEM types and their associated Ast types. If this is not the case, please let me know.

A further simplification to the SplitCombo methods would be if all assignments within the combo block were converted to SSA form with the use of appropriate temporary variable insertions. I am just starting to look through the passes to see if this is already being done, which would save me the trouble.

#8 Updated by Wilson Snyder over 1 year ago

V3Split also does combo logic, it works with both assignment types, for example see debug output of t_alw_split.pl. If this needs extension probably V3Split is better to enhance, versus adding another pass.

Your assumption about LHS concats and case statements is correct.

I've debated converting to SSA form before, but never made the plunge, as my primary motivation was to enable some downstream optimizations and experiments showed that GCC was doing almost as well so didn't seem worth it. Simplifying the code is a different perhaps better argument.

#9 Updated by Arthur Kahlich over 1 year ago

This morning after I wrote that and started going over V3Split more closely I did notice that at least the comments indicate splitting blocking (non-delayed) assigns.

I am still trying to figure out precisely where the non-delayed assigns get marked to be split, since they don't seem to have a visit function (is it visit(AstVarRef* nodep)? ) - unless it is inherited, which would make things harder to follow.

It appears that the biggest thing I want split that is currently not are all assigns to the same variable, grouped together, with intermediate dependent variables and if blocks included. Then if there are other multiply assigned variables, each one would get its own block with all its dependent variables and if blocks included. This can be sub-optimal in some situations, but it is the only way I can think of to break the false combo dependency loops and still have any scheduling sort work.

OTOH, I could convert all multiple LHS references to unique SSA style names, along with the RHS and if condition references. Then split it all up, let it be reordered according to dependencies, then use a clean up pass to replace each group of SSA style names back to the original name, since the statements would still be in the correct order. This might still get into trouble with the reordering attempt unless if-else blocks containing multiple assignments get split, which again, could be sub-optimal in certain situations, but could guarantee that the sub-optimal path only gets executed once per eval instead of at least twice.

#10 Updated by Wilson Snyder over 1 year ago

  • Status changed from Duplicate to Confirmed

V3Split will separate any statement list, so a normal assign is handled by being in the list scanBlock iterates.

If you start spliting under a conditional you will want to put them back somewhere or add some new optimization, otherwise all of the extra compares is likely to result in slower code. Also be careful PLI statements must stop splitting like as currently done.

#11 Updated by Wilson Snyder over 1 year ago

  • Subject changed from UNOPT and UNOPTFLAT issued where there is no circular logic to UNOPT and UNOPTFLAT V3Split optimizations

#12 Updated by Arthur Kahlich over 1 year ago

This morning after I wrote that and started going over V3Split more closely I did notice that at least the comments indicate splitting blocking (non-delayed) assigns.

I am still trying to figure out precisely where the non-delayed assigns get marked to be split, since they don't seem to have a visit function (is it visit(AstVarRef* nodep)? ) - unless it is inherited, which would make things harder to follow.

It appears that the biggest thing I want split that is currently not are all assigns to the same variable, grouped together, with intermediate dependent variables and if blocks included. Then if there are other multiply assigned variables, each one would get its own block with all its dependent variables and if blocks included. This can be sub-optimal in some situations, but it is the only way I can think of to break the false combo dependency loops and still have any scheduling sort work.

OTOH, I could convert all multiple LHS references to unique SSA style names, along with the RHS and if condition references. Then split it all up, let it be reordered according to dependencies, then use a clean up pass to replace each group of SSA style names back to the original name, since the statements would still be in the correct order. This might still get into trouble with the reordering attempt becoming unsolvable unless if-else blocks containing multiple assignments get split, which again, could be sub-optimal in certain situations, but could guarantee that the sub-optimal path only gets executed once per eval instead of at least twice.

Also available in: Atom