New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Non-cutable ordering loop when using an array of clocks #1009
Comments
Original Redmine Comment Forgot to post the error message:
|
Original Redmine Comment I've (somewhat) simplified the example and removed the /verilator public/ clocks which don't appear to actually be part of the problem. https://github.com/toddstrader/verilator-dev/tree/clkloop I've also included "`define BROKEN" in the test which you can comment out to see a workaround. Comparing the _begin.tree ASTs from these two versions, I see the only differences are (< is broken, > is working):
I propose the following solution. Wilson, I'd especially like to get your feedback on this plan.
I'm not yet entirely sure what pass to do this on or if it should be a new pass. That's what I'm currently investigating. Please let me know if this seems reasonable or is a terrible idea. |
Original Redmine Comment
This seems the right approach. V3Gate would be the right point for the optimization, as it knows the full interconnectivity. You'll also need to decide how to handle logic that deals with the clocks as a vector - e.g. clocks[vec] = 1. |
Original Redmine Comment Here's my proposed patch: I suggested adding a warning earlier because I thought the case of {my_big_vector, clk} would lead to entirely decomposing my_big_vector. However, that turned out not to be true so I didn't add a warning. Of course, I can go back and add that if it is desirable. Also, the regression test has been sanitized and is now free of Altera code. The patch fixes t_clk_concat and works on our internal codebases. |
Original Redmine Comment I pushed the little AstNode debug change. The code seems basically reasonable, I'll work on detailed feedback. |
Original Redmine Comment I think this code should instead be an optimization inside V3Gate. It will Code wise I understand V3Graph is a bit more daunting to understand, but I also looked through the code for the normal stuff:
I'd suggest instead
Then this stuff becomes clearer
+void V3ClkVector::clkVectorDecompose(AstNetlist* nodep) {
Each visitor through the design has a significant performance overhead on
I speculate this code will break if there's a select in a non-expected
Also do the same in the AstNode generic visitor, so anything odd between
Should be m_lhsp.
Should be "if (!m_lhsp)" similar other places with NULL.
You can't remove an assignment unless you know that all consumers have +class ClkVectorReplaceVisitor : public AstNVisitor {
When replacing basically everything, it's typically better to make a new |
Original Redmine Comment This is presently stuck waiting for a patch update. |
Original Redmine Comment Yeah, that's on me. I haven't looked at this for some time, but I'm currently trying to get my head around V3Gate / V3Graph. To be clear, are you suggesting that instead of having new visitors discovering the connections between clocks and LHS's that we could find the same information in the V3Graph that V3Gate builds? |
Original Redmine Comment OK, yeah. I think if we have V3Gate look for concatenated clockers and stash that information in the ASSIGNW nodes, then we can push that forward in the V3Graph and skip over the concatenation when we find a constant selection. I'll give this a try, but please do let me know if I sound off track. |
Original Redmine Comment
Exactly, thanks. |
Original Redmine Comment First stab here, but I wanted to ask some questions before cleaning it up: In the test there is something like this (module paths excluded): I'm using the strategy discussed above to discover that wr_clk is really just getting clk1. Then I change the AST so that the ASSIGNW for wr_clk just assigns clk1 instead of the SEL that was previously there. Based on this, I'd like input on:
Any other advice is welcomed too. |
Original Redmine Comment Update: I finally got a chance to run this against our internal codebases and found that we're still getting non-cuttable edge loop errors that the very first patch was able to handle. I updated the branch on GitHub with an additional case (now fixed) but that's still not enough. I think the bottom line here is that doing the clock decomposition earlier means that the solution needs to be more generalized than it was before (or it needs to handle a bunch of new cases). I'm still curious about the questions above, but I just wanted to add a note that this is still not fully baked when tested against our code. |
Original Redmine Comment
I'm not sure why not editing the graph works, maybe you just got lucky. You certainly should make it match the new netlist. I think you should call your optimizat after the first getting rid of Thanks for the tests. Please have at least a few vectors start at some |
Original Redmine Comment I found that I could further simplify things (at least it seems simpler to me) by detecting the clock vectors while traversing the graph. I am now also editing both the AST and the graph when I bypass a clock vector. As mentioned in the comments, I am decomposing the clock vectors before removeRedundantEdgesSum() because I am relying on the redundant edges in the graph (e.g. one signal is included in a concatenation multiple times). I also added a crazy MSB/LSB test case as you suggested and had to add a bunch more as I cycled between the Verilator tests and our internal codebases. Most importantly, I think this is basically ready to go now (modulo cleanup). It's passing all of the Verilator tests and is working with our internal code as well. Please let me know what you think: |
Original Redmine Comment Wow, how did this code end up so small? That's great.
In the tests, if you're using Emacs, please indent using Verilog-mode default indent. No big deal otherwise, I'll do it on the commit. Please patch bin/verilator to describe the new behavior for clocker, and how people are to use it to help. Do you think there should be a -O switch to disable this? Probably OK if not given it requires the clocker attribute. With these, I'm ready to merge if you are. |
Original Redmine Comment I believe I've incorporated all of your feedback: I'm not an Emacs user, but I tried to reformat using Emacs. However, I'm guessing I don't have Verilog-mode installed so it's probably wrong. Concerning a -O switch, I didn't do anything there, but I can if you want. However, even at -O0, I think I'd want this behavior since it's a correctness, not a performance issue. I re-ran this against our internal codebases and everything still passes, so as far as I'm concerned this is ready to go. |
Original Redmine Comment Ready to merge but one complication: ../V3Gate.cpp:1408:5: error: control reaches end of non-void function [-Werror=return-type] This is at bottom of visit(GateLogicVertex. Perhaps need a return VNUser(0)? Note to see these, have this in your .bashrc/.cshrc, then remake.
|
Original Redmine Comment Fixed, and passing both Verilator tests and our internal tests: |
Original Redmine Comment Golden. Pushed to git towards 3.903. |
Original Redmine Comment In 3.904. |
Author Name: Todd Strader (@toddstrader)
Original Redmine Issue: 1009 from https://www.veripool.org
Original Assignee: Todd Strader (@toddstrader)
See t_clk_concat in:
https://github.com/toddstrader/verilator-dev/tree/clkloop
This test is for demonstration purposes only and won't be able to go upstream as it is using a number of (readily available but copyrighted) Altera library modules. Unfortunately, bisecting this has proved quite challenging and I have so far been unable to reproduce the issue without the Altera code.
Within the test, the module t1 instantiates a dcfifo_async (Altera module) which has two clocks. The code will toss ordering errors if the clocks are:
If either one of these things is not true, the test will pass.
The text was updated successfully, but these errors were encountered: