Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Writing to one structure element clobbers another #803

Closed
veripoolbot opened this issue Jul 13, 2014 · 14 comments
Closed

Writing to one structure element clobbers another #803

veripoolbot opened this issue Jul 13, 2014 · 14 comments
Labels
area: wrong runtime result Issue involves an incorrect runtine result from Verilated model resolution: fixed Closed; fixed

Comments

@veripoolbot
Copy link
Contributor


Author Name: Jeff Bush (@jbush001)
Original Redmine Issue: 803 from https://www.veripool.org
Original Date: 2014-07-13
Original Assignee: Jeff Bush (@jbush001)


I found an issue where I do a nonblocking assignment to one structure member, followed by another (the former is multiple bits and the latter is only one bit), and the first member I assigned is clobbered. This seems to be due to a value larger than 32 being passed into the left shift operator (The _Vdlyvllsb variable is larger than 32 and is not masked). If I manually edit the generated code to mask this value (bitwise and with 31), the code works correctly.

The issue appears to occur where the nonblocking store is applied. Here is the generated code:

     if (__Vdlyvset__v__DOT__core0__DOT__l2_cache_interface__DOT__store_buffer__DOT__pending_stores__v69) {
	vlTOPp->v__DOT__core0__DOT__l2_cache_interface__DOT__store_buffer__DOT__pending_stores[0U][(__Vdlyvlsb__v__DOT__core0__DOT__l2_cache_interface__DOT__store_buffer__DOT__pending_stores__v69 
										>> 5U)] 
	    = (vlTOPp->v__DOT__core0__DOT__l2_cache_interface__DOT__store_buffer__DOT__pending_stores
	       [0U][(__Vdlyvlsb__v__DOT__core0__DOT__l2_cache_interface__DOT__store_buffer__DOT__pending_stores__v69 
		     >> 5U)] | ((IData)(1U) << __Vdlyvlsb__v__DOT__core0__DOT__l2_cache_interface__DOT__store_buffer__DOT__pending_stores__v69));

Note that __Vdlyvlsb__v__DOT__core0__DOT__l2_cache_interface__DOT__store_buffer__DOT__pending_stores__v69 in this case is 611. At the very end of the last expression, it is the shift amount to create the one bit value. Technically, I think the C spec says the result is undefined, but I'd expect it to mask off the value. I printed the value of that array element before and after this assignment and multiple bits have been set.

Just for yucks, I added a printf at the top of this clause:

     if (__Vdlyvset__v__DOT__core0__DOT__l2_cache_interface__DOT__store_buffer__DOT__pending_stores__v69) {
printf("shifted value is %x shift amount %d", 
	((IData)(1U) << __Vdlyvlsb__v__DOT__core0__DOT__l2_cache_interface__DOT__store_buffer__DOT__pending_stores__v69),
	__Vdlyvlsb__v__DOT__core0__DOT__l2_cache_interface__DOT__store_buffer__DOT__pending_stores__v69);

It prints this out:

shifted value is 389c00 shift amount 611

Which seems totally bizarre to me. So, it appears the processor or compiler is doing something very odd with this. I'm using MacOS 10.9.3 with Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn).

At any rate, it would probably be more proper to mask the shift value when generating structure assignments.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Jeff Bush (@jbush001)
Original Date: 2014-07-13T02:35:28Z


I'm on a 64 bit machine, but it looks like IData is a 32 bit value, so that shouldn't matter.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2014-07-13T13:38:36Z


This might be #�, please try the version in git.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Jeff Bush (@jbush001)
Original Date: 2014-07-13T17:45:40Z


  • I synced to master (with the fix for bug 800) and tried. The bug still occurs.
  • I tried an experiment where I compiled the following program:
const int foo = 611;

int main()
{
     printf("%x\n", (__uint32_t) 1U << foo);
}

The result is:

> ./a.out 
5f7d4c30

It I make the shift amount non-const, it prints 8. Using the -S flag to compile both forms to assembly, I can see that my compiler skips emitting the shift instruction and just leaves an uninitialized result when it knows the shift amount is too big.

So, I still think it is necessary to mask the shift amount within a word. This:

	vlTOPp->v__DOT__core0__DOT__l2_cache_interface__DOT__store_buffer__DOT__pending_stores[0U][(__Vdlyvlsb__v__DOT__core0__DOT__l2_cache_interface__DOT__store_buffer__DOT__pending_stores__v69 
										>> 5U)] 
	    = (vlTOPp->v__DOT__core0__DOT__l2_cache_interface__DOT__store_buffer__DOT__pending_stores
	       [0U][(__Vdlyvlsb__v__DOT__core0__DOT__l2_cache_interface__DOT__store_buffer__DOT__pending_stores__v69 
		     >> 5U)] | ((IData)(1U) << __Vdlyvlsb__v__DOT__core0__DOT__l2_cache_interface__DOT__store_buffer__DOT__pending_stores__v69));

Should be:

	vlTOPp->v__DOT__core0__DOT__l2_cache_interface__DOT__store_buffer__DOT__pending_stores[0U][(__Vdlyvlsb__v__DOT__core0__DOT__l2_cache_interface__DOT__store_buffer__DOT__pending_stores__v69 
										>> 5U)] 
	    = (vlTOPp->v__DOT__core0__DOT__l2_cache_interface__DOT__store_buffer__DOT__pending_stores
	       [0U][(__Vdlyvlsb__v__DOT__core0__DOT__l2_cache_interface__DOT__store_buffer__DOT__pending_stores__v69 
		     >> 5U)] | ((IData)(1U) << (__Vdlyvlsb__v__DOT__core0__DOT__l2_cache_interface__DOT__store_buffer__DOT__pending_stores__v69 & 31)));

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Jeff Bush (@jbush001)
Original Date: 2014-07-13T20:59:40Z


The following change fixes the problem for me. However, this area of the code is fairly complex and I don't fully understand it, so I suspect this may not handle every case correctly.

diff --git a/src/V3Expand.cpp b/src/V3Expand.cpp
index f2a7fbf..f1b8c52 100644
--- a/src/V3Expand.cpp
+++ b/src/V3Expand.cpp
@@ -583,11 +583,13 @@ private:
 								     newSelBitBit(lhsp->lsbp()),
 								     VL_WORDSIZE)),
 					  oldvalp);
+					  
+		AstNode* maskedshift = new AstAnd(lhsp->fileline(), lhsp->lsbp()->cloneTree(true), new AstConst (nodep->fileline(), 31));
 		AstNode* newp = new AstOr (lhsp->fileline(),
 					   oldvalp,
 					   new AstShiftL (lhsp->fileline(),
 							  rhsp,
-							  lhsp->lsbp()->cloneTree(true),
+							  maskedshift,
 							  VL_WORDSIZE));
 		newp = new AstAssign (nodep->fileline(),
 				      new AstWordSel (nodep->fileline(),

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2014-07-14T02:49:16Z


V3Premit should be taking care of the overshifting, the V3Expand fix might be incomplete and in other cases redundant speed wise. Can you provide a verilog complete verilog example that breaks?

BTW as to your C example that's how x86 does shifts, the upper bits are ignored, unlike PowerPC for example.

Thanks

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Jeff Bush (@jbush001)
Original Date: 2014-07-14T04:39:52Z


Sorry, I don't think I've described the issue very well. :) Unfortunately, all my attempts to reduce my program to a simple test case usually mask the issue, because of the way the compiler (LLVM) generates code. This is definitely compiler specific.

I didn't quite understand your last comment, but, at the risk of repeating something that may already be clear to you, I'll try to give a more detail. As described earlier, I augmented the generated code that handles the delayed assignment to add a printf as follows. It is shifting a single bit the amount described by __Vdlyvlsb__v__DOT__core0__DOT__l2_cache_interface__DOT__store_buffer__DOT__pending_stores__v69 (which is 611).

     if (__Vdlyvset__v__DOT__core0__DOT__l2_cache_interface__DOT__store_buffer__DOT__pending_stores__v69) {
	printf("inserted value is %x shift amount %d\n", ((IData)(1U) << __Vdlyvlsb__v__DOT__core0__DOT__l2_cache_interface__DOT__store_buffer__DOT__pending_stores__v69),
		__Vdlyvlsb__v__DOT__core0__DOT__l2_cache_interface__DOT__store_buffer__DOT__pending_stores__v69);

When I run this program, it prints:

inserted value is 5f1d1f28 shift amount 611

Right away, it's clear that something is wrong because shifting a single bit shoudn't ever produce 0x5f1d1f28. If I compile this to assembly, I see the following:

	je	LBB15_1895
## BB#1894:
	leaq	L_.str95(%rip), %rdi
	movl	$611, %edx              ## imm = 0x263
	xorl	%eax, %eax
	callq	_printf

L_.str95:                               ## @.str95
	.asciz	"inserted value is %x shift amount %d\n"

A few things to note here:

  • The compiler has inferred that __Vdlyvlsb__v__DOT__core0__DOT__l2_cache_interface__DOT__store_buffer__DOT__pending_stores__v69 is const. You can see it directly encodes the constant 611 in the movl instruction instead of dereferencing the global variable.
  • esi (which should contain the value of the shifted 1 bit) is never initialized and there are no shift instructions at all. You're just seeing whatever happened to be in esi already.

Now I modify the right side of the shift statement to bitwise AND with 31 so it looks like this:

	printf("inserted value is %x shift amount %d\n", ((IData)(1U) << (31 & __Vdlyvlsb__v__DOT__core0__DOT__l2_cache_interface__DOT__store_buffer__DOT__pending_stores__v69)),
		__Vdlyvlsb__v__DOT__core0__DOT__l2_cache_interface__DOT__store_buffer__DOT__pending_stores__v69);

...And I get the following code. Note that esi is now properly initialized to 8:

	je	LBB15_1895
## BB#1894:
	leaq	L_.str95(%rip), %rdi
	movl	$8, %esi
	movl	$611, %edx              ## imm = 0x263
	xorl	%eax, %eax
	callq	_printf

The LLVM blog does explicitly state (in the section 'oversize shift amounts') that they treat this as undefined, which hints that they have coded it this way explicitly (which is what I was also demonstrating in my previous C program):

http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Jeff Bush (@jbush001)
Original Date: 2014-07-16T17:07:08Z


It's not clear to me how this could be fixed in V3Premit. The V3Premit pass occurs before V3Expand, and the latter appears to be where the AstSel node is converted to shifts/array/logical operations. If the value were masked in V3Premit, the array index would be wrong.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Jeff Bush (@jbush001)
Original Date: 2014-07-17T03:58:38Z


Okay, I managed to create a minimal program that reproduces the issue:

module bug(input clk);
	struct packed {
		logic flag;
		logic[130:0] data;
	} foo[1];

	always_ff @(posedge clk)
	begin
		foo[0].data <= 0;
		foo[0].flag <= !foo[0].flag;
		$display("data %x flag %d", foo[0].data, foo[0].flag);
	end
endmodule

Output:

data 000000000000000000000000000000000 flag 0
data 700000000000000000000000000000000 flag 1
data 700000000000000000000000000000000 flag 1
data 700000000000000000000000000000000 flag 1
data 700000000000000000000000000000000 flag 1

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Jeff Bush (@jbush001)
Original Date: 2014-07-17T05:19:06Z


Attached proper regression test cases (for the test_regress/t) that reproduce the issue.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Jeff Bush (@jbush001)
Original Date: 2014-07-26T15:37:53Z


Attached a patch for this issue with regression test cases. After studying the code for a while, I believe this is the right way to fix the problem. You had some objections about this previously, so I will attempt to clarify why I think this is correct.

Before the expansion, the problematic area is a legal node ASSIGNSEL(varlsb,wide,1bit), with a non-constant shift amount. As the ASSIGNSEL is legal, it is not possible to address the issue before the expand pass. The expand function converts it into a combination of WORDSEL/OR/LEFTSHIFT. At that point, we know the shift amount (lsb) can be more than 32. We also know that using the left shift operator with a number greater than 32 is undefined, so we should mask it. Note that the patch only changes the case where:

  • The value to be assigned is a one bit value
  • The shift amount is a variable can cannot be resolved statically
  • The value is being assigned to a structure that is stored in array form

After the expandpass, the context is lost and it just looks like a leftshift node, so the issue cannot be remedied after that.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Jeff Bush (@jbush001)
Original Date: 2014-07-26T16:02:00Z


One other note: when I run the regression tests on master@fe5bf01b, there are six failures on my machine. I re-ran all tests with this patch and confirmed it didn't add any new ones.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2014-07-28T11:31:39Z


I'm convinced. Thanks much for the research and patch.

BTW I'd be curious what the other regression failures are since they all pass for me, if you would be willing to file another bug and debug them that would help the next person, thanks.

Fixed in git towards 3.863.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Jeff Bush (@jbush001)
Original Date: 2014-07-31T03:45:05Z


I'm on a mac and they appear to be platform issues (a few can't find gdb, one not handling LD flags correctly).

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2014-09-21T13:11:04Z


In 3.864.

@veripoolbot veripoolbot added area: wrong runtime result Issue involves an incorrect runtine result from Verilated model resolution: fixed Closed; fixed labels Dec 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: wrong runtime result Issue involves an incorrect runtine result from Verilated model resolution: fixed Closed; fixed
Projects
None yet
Development

No branches or pull requests

1 participant