New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support for streaming operators #649
Comments
Original Redmine Comment I don't personally have a huge interest in this as it's uncommon syntax, but if you or someone else would like to take it on the patch would of course be welcome. |
Original Redmine Comment I've been wanting stream operator support for a little while to reverse bit sequences (and array sequences). I've gone ahead and added support for these operators myself. The code is currently available in my github repo (https://github.com/grg/verilator) in branch #� Please merge this into the main Verilator repo if you are happy with the changes. Two questions I do have regarding my implementation:
If changes need to be made in response to these questions, should I make the changes or is someone on the Verilator team happy to make them? By the way, a specification of stream operator behavior is found in section 11.4.14 of the IEEE 1800-2009 standard. |
Original Redmine Comment You made tests, too! This makes me so happy :) This is very good for such a large patch. Only minor things pop up:
This should be a AstNodeStream, and as you guessed in V3Ast.h.
This rule is specific to concats, so unless IEEE says that unsized is not
I think the %l and %r here are swapped, if I'm reading your lhs/rhs code right. Also are there more checks needed in the code for unsupported parts of the streaming operators? BTW I'm deep in a couple of other changes, but hope to release later this week, if you want to get it in by then. |
Original Redmine Comment I looked closer at the verilated.h code:
Should be
As otherwise you will get wrong values with 32/64 as an input. Also having a for() loop there pains me as this is very expensive for more than a few bits wide. There are recursive algorithms that would be faster (swap adjacent pairs, then double-pairs, then quad-pairs, etc. Just need to be careful about wrap-around.) Additionally at a minimum, I think we may want to special case the swap-bits (stream-1) case as it's likely the most common; there are much faster algorithms similar to how VL_REDXOR is done, for example: http://www.hackersdelight.org/hdcodetxt/reverse.c.txt (BTW Hacker's Delight is a great book.) Note you need an extra shift right at the end as the result will be at the MSBs, but we may want only say a 5-bit result. |
Original Redmine Comment Hi Wilson, In regards to the unsized test, Section 11.4.14.1 states:
I've made two changes to the test in V3Number:
The changes I made to V3Width were based on examples from the visit functions for AstConcat and AstReplicate (hence the concat code). Is it actually necessary for me to call @nodep->dtypeSetLogicSized(...)@ within the visit function? (What would the AstNodeStream's dtype be prior to this call in V3Width?) I've pushed the remaining changes except for the for loop optimization. I'm thinking a little more about that. |
Original Redmine Comment Try the attached, it'll fail a weird way. You'll need to copy t_EXAMPLE.pl to t_stream2.pl. |
Original Redmine Comment Good news is that I know what the problem is. Of course it's in the one area that I thought was so simple that I didn't need a test :( The problem occurs right now when the right stream operator @{ >> { } }@ appears on the right-hand side of the assignment statement. In this case, the stream operator is a no-op and the thing inside the inner pair of braces should be assigned to the variable on the LHS of the assignment statement. The correct thing for me to do is to remove the AstStreamR operation in this case, which I'll go and do. One thing I'm a little surprised at is that I thought my emitC function in AstStreamR would take care of this. Right now it simply returns the string "%li" -- I had expected this would result in the lhs of the AstStreamR node being spat out in the C code (which would have been on the RHS of a C assignment statement). What would the correct emitC string be if I simply wanted to return a node's child unmodified? |
Original Redmine Comment Alright, I pushed a modified version of my test that tests this case and a fix. (Although I'd still like your advice about what to put in the AstStreamR emitC function.) |
Original Redmine Comment %li should work fine, I use that other places. It's still much better in V3Const to remove the operator, as this then enables further constant propagation and other operations. |
Original Redmine Comment P.S. Feel free to extend the test I sent to add operator-on-left outputs. You'll need to update the signature; I assume you have another simulator to check against. |
Original Redmine Comment I I found out why my %li isn't working. In V3Emit.cpp, the following code exists inside the @visit@ function for AstNodeAssign objects:
In the case where we're assigning to a wide variable, this branch of the if statement evaluates to true. As a result, it's assuming that the emitC outputs a function that writes into directly into the destination wide variable. My problem case was effectively the following:
So when the STREAMR emits simply %lh, the if statement snippet doesn't wrap it up in a VL_ASSIGN_W. This means that I'm going to have to output one of the VL_ASSIGN_xxx functions myself from my emitC function. |
Original Redmine Comment I've pushed several more fixes, although I'd appreciate your input on two commits:
|
Original Redmine Comment A couple more bugs have been fixed and I've updated the test with more test cases. |
Original Redmine Comment If you could look at optimizing at least the power-of-2-bit-flip cases, then I'll merge. You may need some additional tests to make sure each are covered. For example, roughly and untested: switch rhs all of these are 32 bits, copy and modify appropriately for the 64 bit version. |
Original Redmine Comment P.S. no reason to optimize V3Number, just verilated.h. |
Original Redmine Comment I've pushed a patch with handles the power of 2 case. I chose to split the "fast" code into a separate function -- the emitC function decides whether to use the fast or slow versions of the function. I also added a couple of tests to ensure the fast and slow cases are exercised. There is one performance issue that I'd like to fix if at all possible. In the C code that is output, verilator is eliminating variable assignments and is instead issuing multiple calls to VL_STREAML_xxx functions. This problem is seen in the attached t_stream4.v file. The relevant parts are:
Verilator is eliminating dout_1 and is instead placing two calls to VL_STREAM_III in my code: one for the $display, and one for the comparison. I know that dout_1 is being eliminated in V3Gate but I haven't yet had a chance to thoroughly review and understand the logic in this file. Is this a problem that you can put your finger on quickly? |
Original Redmine Comment Great. At present there isn't a way to eliminate pushing functions of one variable (in V3Gate). The "right thing" in this case is not to keep the wire, but make a temporary before those statements, but this isn't something it does at present because the compilers are generally pretty good at eliminating common subexpressions, but cannot push functions across wires (obviously), so that's something Verilator should do. Ready for merge? |
Original Redmine Comment There's one additional thing I should correct before merge. A sentence in 11.4.14 reads:
The attached files should fail compilation but they pass. I'm struggling as to where to place this check. AstCast and AstCastSize objects may be removed during a pass through V3Width. Therefore, the test for a cast must be performed before such nodes are removed. I don't think any of the stages before the first V3Width pass are appropriate for this test. I could try to put it as part of the V3Width checks, however the problem is that AstCast/AstCastSize objects are removed during tree iteration, making it impossible for me to know when visiting an AstNodeStream object if an AstCast/CastSize used to be a parent. Any suggestions would be appreciated. |
Original Redmine Comment
In other words, operating on an unpacked array is illegal. In "if (vup->c()->final())" (i.e. data types are finalized), you can simply check for this with !nodep->dtypepSkipRefp()->castBasicDTypep(). You don't need to worry about casts; they just affect propagating the data types down to your V3Width visitor, then are safely removed as the dtype is now corrected. |
Original Redmine Comment I've pushed a couple more minor commits and I'm now happy to merge. I wasn't able to get checking using dtypeSkipRefp to work (or even dtypep()->skipRefp()), but I don't think it's too critical. |
Original Redmine Comment Two nits, then done. If you merge master you'll see I added some EXTEND_ types to fix a signed bug. Rather than special case NodeAssign in widthCheck, please make a new EXTEND_OFF, use EXTEND_OFF in what nos is the "if (!assignp) ... fixWidthExtend" then pass EXTEND_OFF as appropriate from "visit (AstNodeAssign". If you prefer I'll just make the edit. I think
can be
and similar in the _III version. The main advantage being the elimination of the modulus operator; which in theory should get constant propagated and strength reduced, but I always try to stomp them out anyways just in case. All tests pass with this, so if this isn't correct perhaps a new test is needed. BTW if you have any large regressions I'd appreciate running master against it, as other changes I made could use more runtime as they fix errors only when WIDTH warnings are present and turned off, which is not the case with most of our code. Thanks. |
Original Redmine Comment I've rebased my changes on top of master and pushed to github. The strange shift-merge code I had at the end of the STREAM_FAST functions is necessary, although it was incorrect. (Operator precedence caused the subtract to execute before the shift in finalMask. The ~finalMask should have been ~(rd-1) in the remShift line). The reason for this strange looking shift-or operation is as follows. For the sake of illustration, assume we're working with an 8-bit data type, instead of 32/64 bit values. Assume I have a 5-bit variable inside the 8-bit data type which is set to the value @5'b11111@, and assume @rd = 2@ (the slice size). Here's what happens after the bit shuffle:
To create the correct output value, I want to right shift everything by (8 - lbits), except for the partial slice which needs to be shifted by (8 - rd * ceil(lbits / rd)):
I changed V3Width to implement what I think you were asking. Let me know if it's not correct or feel free to correct it yourself. Unfortunately I don't have any large regressions to test my code against at the moment. |
Original Redmine Comment I thought about your testing comment and wrote the attached. It core-dumped another simulator, but turned up what I think is an issue; running that single test elsewhere showed the expected value seems correct. Also, if you could please change verilated.h to use VL_UL(1) or VL_ULL(1) where you have "1 <<" for portability. Also pass $clog2(rd) into the fast version as rd_log2 instead of rd; then the case statement can use that (case 4 becomes case 2) and the modulus can become just lbits & VL_MASK_I(rd_log2). Then good to go. |
Original Redmine Comment How do I pass the output of $clog2(rd) into my fast function? Is it possible to modify a value as it's being emitted as C? The emitC function currently contains this:
I could replace the @%ri@ parameter with @VL_CLOG2_I(%ri)@ but this will cause the log to be calculated at runtime instead of compile time. |
Original Redmine Comment You could edit the string to put a constant value inside it, e.g. (string)"VL_STREAML_FAST..."+cvtToStr(int_value_to_print)+");" I usually try to keep the class based code small though, so the nicer alternative is to create a visitor in V3EmitC.cpp, and you can print whatever you want with all flexibility including calling the function that decodes the %l style strings. Note right now it is using V3EmitC's visit AstNodeBiop code, by inheritance. |
Original Redmine Comment Ah yes, I should have thought of creating a custom visit function in V3EmitC. I've done this now and tested it against my t_stream.v and your t_stream3.v and it passes. (Pushed to repo.) By the way, the strange-looking shift-merge at the end of the STREAML_FAST functions can actually be replaced by pre-shifting the MSBs in the input data word. To do so, I'd put this code before the switch statement:
Then all you'd need to do at the end is the right shift by 32-lbits. Do you think this code is clearer/easier to understand or should I stick with what I've got? |
Original Redmine Comment Seems nice to me, but either way. BTW your 1 << needs to be VL_UL(1) or VL_ULL(1) for the quad version. |
Original Redmine Comment I decided the pre-shifting is easier to understand (and to explain). I've pushed this into both the III and QII versions. |
Original Redmine Comment Excellent. Made a few trivial style changes, and fixed a compiler warning. Fixed in git towards 3.857. |
Original Redmine Comment In 3.860. |
Author Name: Krzysztof Jankowski
Original Redmine Issue: 649 from https://www.veripool.org
Original Date: 2013-05-21
Original Assignee: Glen Gibb
This is more a feature request rather than bug. Today SV's {<<{...}} and {>>{...}} operators generate syntax error, see attached example. More info about streaming operators can be found here http://testbench.in/SV_20_OPERATORS_2.html.
The text was updated successfully, but these errors were encountered: