Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support for streaming operators #649

Closed
veripoolbot opened this issue May 21, 2013 · 30 comments
Closed

support for streaming operators #649

veripoolbot opened this issue May 21, 2013 · 30 comments
Labels
resolution: fixed Closed; fixed type: feature-IEEE Request to add new feature, described in IEEE 1800

Comments

@veripoolbot
Copy link
Contributor


Author Name: Krzysztof Jankowski
Original Redmine Issue: 649 from https://www.veripool.org
Original Date: 2013-05-21
Original Assignee: Glen Gibb


This is more a feature request rather than bug. Today SV's {<<{...}} and {>>{...}} operators generate syntax error, see attached example. More info about streaming operators can be found here http://testbench.in/SV_20_OPERATORS_2.html.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2013-05-22T00:31:27Z


I don't personally have a huge interest in this as it's uncommon syntax, but if you or someone else would like to take it on the patch would of course be welcome.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Glen Gibb
Original Date: 2014-03-31T23:35:51Z


I've been wanting stream operator support for a little while to reverse bit sequences (and array sequences). I've gone ahead and added support for these operators myself. The code is currently available in my github repo (https://github.com/grg/verilator) in branch #�

Please merge this into the main Verilator repo if you are happy with the changes.

Two questions I do have regarding my implementation:

  1. I've created three new node types: AstStream (an abstract base class), AstStreamL (stream left operator) and AstStreamR (stream right operator). Right now I've placed AstStream (the abstract base class) in V3AstNodes.h -- is it preferred that this be in V3Ast.h since it is not a node itself?
  2. Stream operators allow the specification of a slice size (e.g., @{ << 2 { 4'b1101}}@) where 2 is the slice size. Should a warning be generated if the slice size exceeds the size of the thing being streamed?

If changes need to be made in response to these questions, should I make the changes or is someone on the Verilator team happy to make them?

By the way, a specification of stream operator behavior is found in section 11.4.14 of the IEEE 1800-2009 standard.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2014-04-01T02:58:19Z


You made tests, too! This makes me so happy :)

This is very good for such a large patch. Only minor things pop up:

+struct AstStream : public AstNodeBiop {

This should be a AstNodeStream, and as you guessed in V3Ast.h.

+++ src/V3Number.cpp
+    if (!lhs.sized() || !rhs.sized()) {
+       m_fileline->v3warn(WIDTHCONCAT,"Unsized numbers/parameters not allowed in concatenations.");
+    }
+++ b/src/V3Width.cpp
+       if (vup->c()->final()) {
+           if (!nodep->dtypep()->widthSized()) {
+               // See also error in V3Number
+               nodep->v3warn(WIDTHCONCAT,"Unsized numbers/parameters not allowed in streams.");
+           }
+       }

This rule is specific to concats, so unless IEEE says that unsized is not
allowed for streams, then these lines should be removed. (And if it is not allowed
the first warning text should say streams.)

+++ b/src/V3AstNodes.h
+    virtual string emitVerilog() { return "%f{ << %l %k%r}"; }
+    virtual string emitVerilog() { return "%f{ >> %l %k%r}"; }

I think the %l and %r here are swapped, if I'm reading your lhs/rhs code right.

Also are there more checks needed in the code for unsupported parts of the streaming operators?

BTW I'm deep in a couple of other changes, but hope to release later this week, if you want to get it in by then.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2014-04-01T11:08:45Z


I looked closer at the verilated.h code:

IData mask = (2 << (rd - 1)) - 1;
QData mask = (2 << (rd - 1)) - 1;

Should be

  IData mask = VL_MASK_I(rd);
  QData mask = VL_MASK_Q(rd);

As otherwise you will get wrong values with 32/64 as an input.

Also having a for() loop there pains me as this is very expensive for more than a few bits wide. There are recursive algorithms that would be faster (swap adjacent pairs, then double-pairs, then quad-pairs, etc. Just need to be careful about wrap-around.)

Additionally at a minimum, I think we may want to special case the swap-bits (stream-1) case as it's likely the most common; there are much faster algorithms similar to how VL_REDXOR is done, for example:

http://www.hackersdelight.org/hdcodetxt/reverse.c.txt

(BTW Hacker's Delight is a great book.)

Note you need an extra shift right at the end as the result will be at the MSBs, but we may want only say a 5-bit result.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Glen Gibb
Original Date: 2014-04-01T21:12:08Z


Hi Wilson,

In regards to the unsized test, Section 11.4.14.1 states:

Each stream_expression within the stream_concatenation, starting with the leftmost and proceeding from left to right through the comma-separated list of stream_expressions, is converted to a bit-stream and appended to a packed array (stream) of bits...
The text on the streaming operator doesn't explicitly disallow unsized values. However, since it performs a concatenation, I'm assuming that each element should be sized.

I've made two changes to the test in V3Number:

  1. I've removed the check on the rhs expression (this is the slice size)
  2. I updated the error message to refer to streaming not concatenation.

The changes I made to V3Width were based on examples from the visit functions for AstConcat and AstReplicate (hence the concat code). Is it actually necessary for me to call @nodep->dtypeSetLogicSized(...)@ within the visit function? (What would the AstNodeStream's dtype be prior to this call in V3Width?)

I've pushed the remaining changes except for the for loop optimization. I'm thinking a little more about that.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2014-04-01T23:56:54Z


Try the attached, it'll fail a weird way. You'll need to copy t_EXAMPLE.pl to t_stream2.pl.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Glen Gibb
Original Date: 2014-04-02T02:41:02Z


Good news is that I know what the problem is. Of course it's in the one area that I thought was so simple that I didn't need a test :(

The problem occurs right now when the right stream operator @{ >> { } }@ appears on the right-hand side of the assignment statement. In this case, the stream operator is a no-op and the thing inside the inner pair of braces should be assigned to the variable on the LHS of the assignment statement. The correct thing for me to do is to remove the AstStreamR operation in this case, which I'll go and do.

One thing I'm a little surprised at is that I thought my emitC function in AstStreamR would take care of this. Right now it simply returns the string "%li" -- I had expected this would result in the lhs of the AstStreamR node being spat out in the C code (which would have been on the RHS of a C assignment statement). What would the correct emitC string be if I simply wanted to return a node's child unmodified?

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Glen Gibb
Original Date: 2014-04-02T02:49:04Z


Alright, I pushed a modified version of my test that tests this case and a fix. (Although I'd still like your advice about what to put in the AstStreamR emitC function.)

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2014-04-02T03:04:13Z


%li should work fine, I use that other places. It's still much better in V3Const to remove the operator, as this then enables further constant propagation and other operations.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2014-04-02T03:05:46Z


P.S. Feel free to extend the test I sent to add operator-on-left outputs. You'll need to update the signature; I assume you have another simulator to check against.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Glen Gibb
Original Date: 2014-04-02T03:45:14Z


I I found out why my %li isn't working. In V3Emit.cpp, the following code exists inside the @visit@ function for AstNodeAssign objects:

	} else if (nodep->isWide()
		   && nodep->lhsp()->castVarRef()
		   && !nodep->rhsp()->castCMath()
		   && !nodep->rhsp()->castVarRef()
		   && !nodep->rhsp()->castArraySel()) {
	    // Wide functions assign into the array directly, don't need separate assign statement
	    m_wideTempRefp = nodep->lhsp()->castVarRef();
	    paren = false;

In the case where we're assigning to a wide variable, this branch of the if statement evaluates to true. As a result, it's assuming that the emitC outputs a function that writes into directly into the destination wide variable.

My problem case was effectively the following:

NODEASSIGN
  lhs -> VARREF, wide
  rhs -> STREAMR

So when the STREAMR emits simply %lh, the if statement snippet doesn't wrap it up in a VL_ASSIGN_W. This means that I'm going to have to output one of the VL_ASSIGN_xxx functions myself from my emitC function.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Glen Gibb
Original Date: 2014-04-02T06:32:41Z


I've pushed several more fixes, although I'd appreciate your input on two commits:

  • f94f299: this commit moves the { << { } } streaming operator from the LHS to the RHS of an assignment. Would you agree that it shouldn't matter if you perform the bit swapping before the assignment or as you're performing the assignment?
  • 10e14d3: Section 11.4.14.3 states "If the source expression contains more bits than are needed, the appropriate number of bits shall be consumed from its left (most significant) end." This disables the width adjustment performed in V3Width.cpp for the special case of a NodeAssign with a NodeStream on LHS. The width adjustment is delayed until optimizations in V3Const.cpp. Is this the best place to disable the width adjustment?

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Glen Gibb
Original Date: 2014-04-02T21:41:35Z


A couple more bugs have been fixed and I've updated the test with more test cases.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2014-04-04T02:20:02Z


If you could look at optimizing at least the power-of-2-bit-flip cases, then I'll merge. You may need some additional tests to make sure each are covered.

For example, roughly and untested:

switch rhs
case 1:
x = (((x & 0xaaaaaaaa) >> 1) | ((x & 0x55555555) << 1)); // FALLTHRU
case 2:
x = (((x & 0xcccccccc) >> 2) | ((x & 0x33333333) << 2)); // FALLTHRU
case 4:
x = (((x & 0xf0f0f0f0) >> 4) | ((x & 0x0f0f0f0f) << 4)); // FALLTHRU
case 8:
x = (((x & 0xff00ff00) >> 8) | ((x & 0x00ff00ff) << 8)); // FALLTHRU
case 16:
x = ((x >> 16) | (x << 16));
return x >> (32-width)
default:
your loop.

all of these are 32 bits, copy and modify appropriately for the 64 bit version.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2014-04-04T02:20:53Z


P.S. no reason to optimize V3Number, just verilated.h.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Glen Gibb
Original Date: 2014-04-04T06:46:45Z


I've pushed a patch with handles the power of 2 case. I chose to split the "fast" code into a separate function -- the emitC function decides whether to use the fast or slow versions of the function. I also added a couple of tests to ensure the fast and slow cases are exercised.

There is one performance issue that I'd like to fix if at all possible. In the C code that is output, verilator is eliminating variable assignments and is instead issuing multiple calls to VL_STREAML_xxx functions. This problem is seen in the attached t_stream4.v file. The relevant parts are:

always @*
begin
     dout_1 = { << { din_1}};
     // assignments to dout_2 and dout_3
end

always @(posedge clk)
begin
     // some code
     $display("%x", dout_1); if (dout_1 != ...
     // more code
end

Verilator is eliminating dout_1 and is instead placing two calls to VL_STREAM_III in my code: one for the $display, and one for the comparison. I know that dout_1 is being eliminated in V3Gate but I haven't yet had a chance to thoroughly review and understand the logic in this file. Is this a problem that you can put your finger on quickly?

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2014-04-04T10:51:57Z


Great. At present there isn't a way to eliminate pushing functions of one variable (in V3Gate). The "right thing" in this case is not to keep the wire, but make a temporary before those statements, but this isn't something it does at present because the compilers are generally pretty good at eliminating common subexpressions, but cannot push functions across wires (obviously), so that's something Verilator should do.

Ready for merge?

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Glen Gibb
Original Date: 2014-04-04T21:50:33Z


There's one additional thing I should correct before merge. A sentence in 11.4.14 reads:

It shall be an error to use a streaming_concatenation as an operand in an expression without first casting it to a bit-stream type.

The attached files should fail compilation but they pass.

I'm struggling as to where to place this check. AstCast and AstCastSize objects may be removed during a pass through V3Width. Therefore, the test for a cast must be performed before such nodes are removed. I don't think any of the stages before the first V3Width pass are appropriate for this test. I could try to put it as part of the V3Width checks, however the problem is that AstCast/AstCastSize objects are removed during tree iteration, making it impossible for me to know when visiting an AstNodeStream object if an AstCast/CastSize used to be a parent. Any suggestions would be appreciated.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2014-04-04T22:28:34Z


It shall be an error to use a streaming_concatenation as an operand in an expression without first casting it to a bit-stream type.

In other words, operating on an unpacked array is illegal. In "if (vup->c()->final())" (i.e. data types are finalized), you can simply check for this with !nodep->dtypepSkipRefp()->castBasicDTypep().

You don't need to worry about casts; they just affect propagating the data types down to your V3Width visitor, then are safely removed as the dtype is now corrected.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Glen Gibb
Original Date: 2014-04-07T21:14:57Z


I've pushed a couple more minor commits and I'm now happy to merge.

I wasn't able to get checking using dtypeSkipRefp to work (or even dtypep()->skipRefp()), but I don't think it's too critical.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2014-04-08T02:30:18Z


Two nits, then done.

If you merge master you'll see I added some EXTEND_ types to fix a signed bug. Rather than special case NodeAssign in widthCheck, please make a new EXTEND_OFF, use EXTEND_OFF in what nos is the "if (!assignp) ... fixWidthExtend" then pass EXTEND_OFF as appropriate from "visit (AstNodeAssign". If you prefer I'll just make the edit.

I think

 QData finalMask = 1ULL << ((lbits % rd) - 1);
 uint32_t remShift = ((sizeof(QData) * 8) - lbits) & ~finalMask;
 return (ret >> ((sizeof(QData) * 8) - lbits)) | ((ret >> remShift) & finalMask);

can be

 return (ret >> (64-lbits)) & VL_MASK_Q(lbits);

and similar in the _III version. The main advantage being the elimination of the modulus operator; which in theory should get constant propagated and strength reduced, but I always try to stomp them out anyways just in case. All tests pass with this, so if this isn't correct perhaps a new test is needed.

BTW if you have any large regressions I'd appreciate running master against it, as other changes I made could use more runtime as they fix errors only when WIDTH warnings are present and turned off, which is not the case with most of our code.

Thanks.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Glen Gibb
Original Date: 2014-04-08T06:14:14Z


I've rebased my changes on top of master and pushed to github.

The strange shift-merge code I had at the end of the STREAM_FAST functions is necessary, although it was incorrect. (Operator precedence caused the subtract to execute before the shift in finalMask. The ~finalMask should have been ~(rd-1) in the remShift line). The reason for this strange looking shift-or operation is as follows.

For the sake of illustration, assume we're working with an 8-bit data type, instead of 32/64 bit values. Assume I have a 5-bit variable inside the 8-bit data type which is set to the value @5'b11111@, and assume @rd = 2@ (the slice size). Here's what happens after the bit shuffle:

                          orig: 8'b_0001_1111
            ret (post-shuffle): 8'b_1111_0100

To create the correct output value, I want to right shift everything by (8 - lbits), except for the partial slice which needs to be shifted by (8 - rd * ceil(lbits / rd)):

            ret >> (8 - lbits): 8'b_0001_1110
(ret >> remShift) & finalMask: 8'b_0000_0001

I changed V3Width to implement what I think you were asking. Let me know if it's not correct or feel free to correct it yourself.

Unfortunately I don't have any large regressions to test my code against at the moment.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2014-04-09T00:23:18Z


I thought about your testing comment and wrote the attached. It core-dumped another simulator, but turned up what I think is an issue; running that single test elsewhere showed the expected value seems correct.

Also, if you could please change verilated.h to use VL_UL(1) or VL_ULL(1) where you have "1 <<" for portability. Also pass $clog2(rd) into the fast version as rd_log2 instead of rd; then the case statement can use that (case 4 becomes case 2) and the modulus can become just lbits & VL_MASK_I(rd_log2).

Then good to go.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Glen Gibb
Original Date: 2014-04-09T00:49:14Z


How do I pass the output of $clog2(rd) into my fast function? Is it possible to modify a value as it's being emitted as C? The emitC function currently contains this:

	// Attempt to use a "fast" stream function for slice size = power of 2
	if (!isWide()) {
	    uint32_t isPow2 = rhsp()->castConst()->num().countOnes() == 1;
	    uint32_t sliceSize = rhsp()->castConst()->toUInt();
	    if (isPow2 && sliceSize <= (isQuad() ? sizeof(uint64_t) : sizeof(uint32_t))) {
		return "VL_STREAML_FAST_%nq%lq%rq(%nw,%lw,%rw, %P, %li, %ri)";
	    }
	}
	return "VL_STREAML_%nq%lq%rq(%nw,%lw,%rw, %P, %li, %ri)";

I could replace the @%ri@ parameter with @VL_CLOG2_I(%ri)@ but this will cause the log to be calculated at runtime instead of compile time.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2014-04-09T01:43:23Z


You could edit the string to put a constant value inside it, e.g.

(string)"VL_STREAML_FAST..."+cvtToStr(int_value_to_print)+");"

I usually try to keep the class based code small though, so the nicer alternative is to create a visitor in V3EmitC.cpp, and you can print whatever you want with all flexibility including calling the function that decodes the %l style strings. Note right now it is using V3EmitC's visit AstNodeBiop code, by inheritance.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Glen Gibb
Original Date: 2014-04-09T21:11:58Z


Ah yes, I should have thought of creating a custom visit function in V3EmitC. I've done this now and tested it against my t_stream.v and your t_stream3.v and it passes. (Pushed to repo.)

By the way, the strange-looking shift-merge at the end of the STREAML_FAST functions can actually be replaced by pre-shifting the MSBs in the input data word. To do so, I'd put this code before the switch statement:

     if (rd_log2) {
         uint32_t lbitsFloor = lbits & ~VL_MASK_I(rd_log2); // Largest integer multiple of rd <= lbits
         uint32_t lbitsRem = lbits - lbitsFloor;
         uint32_t msbMask = VL_MASK_I(lbitsRem) << lbitsFloor;
         ret = (ret & ~msbMask) | ((ret & msbMask) << ((1 << rd_log2) - lbitsRem));
     }

Then all you'd need to do at the end is the right shift by 32-lbits. Do you think this code is clearer/easier to understand or should I stick with what I've got?

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2014-04-09T21:20:16Z


Seems nice to me, but either way.

BTW your 1 << needs to be VL_UL(1) or VL_ULL(1) for the quad version.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Glen Gibb
Original Date: 2014-04-09T21:39:25Z


I decided the pre-shifting is easier to understand (and to explain). I've pushed this into both the III and QII versions.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2014-04-10T00:30:41Z


Excellent. Made a few trivial style changes, and fixed a compiler warning.

Fixed in git towards 3.857.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2014-05-11T21:09:49Z


In 3.860.

@veripoolbot veripoolbot added resolution: fixed Closed; fixed type: feature-IEEE Request to add new feature, described in IEEE 1800 labels Dec 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
resolution: fixed Closed; fixed type: feature-IEEE Request to add new feature, described in IEEE 1800
Projects
None yet
Development

No branches or pull requests

1 participant