Project

General

Profile

[logo] 
 
Home
News
Activity
About/Contact
Major Tools
  Dinotrace
  Verilator
  Verilog-mode
  Verilog-Perl
Other Tools
  BugVise
  CovVise
  Force-Gate-Sim
  Gspice
  IPC::Locker
  Rsvn
  SVN::S4
  Voneline
  WFH
General Info
  Papers

Issue #780

Preprocessor incorrectly parses define arguments

Added by Derek Lockhart over 5 years ago. Updated over 5 years ago.

Status:
Closed
Priority:
High
Category:
Parser
% Done:

0%


Description

Calling define macros that take multiple arguments will be parsed incorrectly depending on the indentation. The following code shows the error:

`define TESTA(cond, msg, arg1, arg2) \
  $display("HELLO %s %d %d", msg, arg1, arg2 )

`define TESTB(con, msg, arg1, arg2) \
  $display("HELLO %s %d %d", msg, arg1, arg2 )

module vc_RDFF_pf #( parameter W = 1, parameter RESET_VALUE = 0 ) 
(
  input              clk,      // Clock input
  input              reset_p,  // Sync reset input (sampled on rising edge)
  input      [W-1:0] d_p,      // Data input (sampled on rising clk edge)
  output reg [W-1:0] q_np      // Data output
);

  always @( posedge clk )
  begin
    q_np <= reset_p ? RESET_VALUE : d_p;

    // PASSES
    `TESTA( q_np,
            "some_msg", d_p, RESET_VALUE);

    // FAILS
    `TESTA( q_np,
             "some_msg", d_p, RESET_VALUE);

    // PASSES
    //`TESTB( q_np,
    //         "some_msg", d_p, RESET_VALUE);
  end 

endmodule

Note that the difference between the first call to TESTA and the second call to TESTA differ only by a single space on the second line. Also note that TESTB when called also passes, and differs to TESTA only by the name of the first variable, which is shorter by one character. This is a very difficult bug to catch!

This single space difference between the two calls to TESTA results in the following error:

%Error: TEST.v:29: Define passed too many arguments: TESTA
%Error: TEST.v:31: Expecting ( to begin argument list for define reference `TESTA
%Error: TEST.v:39: EOF in define argument list
%Error: TEST.v:40: EOF in define argument list
%Error: TEST.v:40: syntax error, unexpected $end
%Error: Cannot continue

Adding printfs to V3PreProc.cpp to display refp->name() and refp->args() generates the following output:

line 553 refp->name():   TESTA
line 553 refp->args() [0]  q_np
line 553 refp->args() [1]             "some_msg" 
line 553 refp->args() [2]  d_p
line 553 refp->args() [3]  RESET_VALUE
line 553 refp->name():   TESTA
line 553 refp->args() [0]  q_np
line 553 refp->args() [1]              "some_msg" 
line 553 refp->args() [2]  d_p
line 553 refp->args() [3]  RESET_VALUE
FAIL!!!
line 624 refp->name():   TESTA
line 624 refp->args():     4
line 624 numArgs:          2

output.txt View - Output of "bin/verilator -E file.v --debug --debugi-V3PreShell 9" (21.1 KB) Derek Lockhart, 06/06/2014 01:15 PM

issue-780-patch.diff View (2.86 KB) Derek Lockhart, 06/08/2014 02:58 PM

History

#1 Updated by Wilson Snyder over 5 years ago

This works for me. This smells like a flex bug as spacing is pretty opaque to the Verilator code.

What OS are you on? What does flex --version print?

Can you attach the output of

bin/verilator -E file.v --debug --debugi-V3PreShell 9

#2 Updated by Wilson Snyder over 5 years ago

  • Status changed from New to AskedReporter

#3 Updated by Derek Lockhart over 5 years ago

Wilson Snyder wrote:

This works for me. This smells like a flex bug as spacing is pretty opaque to the Verilator code.

What OS are you on? What does flex --version print?

Can you attach the output of

bin/verilator -E file.v --debug --debugi-V3PreShell 9

I'm am on OSX Mavericks, using the builtin version of flex and clang:

$ flex --version
  flex 2.5.35 Apple(flex-31)

$ gcc --version                 
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1                    
Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)                  
Target: x86_64-apple-darwin13.1.0 

Note that I had previously tried to compile with a newer version of flex installed via MacPorts, but Verilator would not compile with that due to the error described here:

- https://bugs.launchpad.net/zorba/+bug/1034582

Output of "bin/verilator -E file.v --debug --debugi-V3PreShell 9" is attached.

#4 Updated by Wilson Snyder over 5 years ago

 - V3PreProc.cpp:552:  defineSubstIn  `TESTA (cond, msg, arg1, arg2)
 ...
 - V3PreProc.cpp:580:      Got Arg=0  argName='cond'  default=''
 - V3PreProc.cpp:580:      Got Arg=1  argName='msg'  default=''
 %Error: TEST.v:29: Define passed too many arguments: TESTA

Compare this to the text above which passed and this didn't print "Got Arg"=2/3. If you go to line V3PreProc.cpp you'll see this is just looping over the text that was printed in defineSubstIn which does have 4 arguments. So it's some sort of corruption issue. You'll need to add prints or use valgrind or similar to figure it out at your site since I don't see it.

#5 Updated by Derek Lockhart over 5 years ago

It seems the first visit to defineSubst parses the arguments of the definition fine, but the second visit to defineSubst only parses the first two arguments of the definition correctly. The third argument it is getting the argument value provided in the call, not the argument variable in the definition!

defineSubstIn  `TESTA (cond, msg, arg1, arg2)
defineArg[0] = ' q_np'
defineArg[1] = '            "some_msg"'
defineArg[2] = ' d_p'
defineArg[3] = ' RESET_VALUE'
defineValue    '\\n  $display("HELLO %s %d %d", msg, arg1, arg2 )'
THE TEXT:
(cond, msg, arg1, arg2)
   Parse  Paren=1  Arg=0  token=''  Parse=cond, msg, arg1, arg2)
   Parse  Paren=1  Arg=0  token='c'  Parse=ond, msg, arg1, arg2)
   Parse  Paren=1  Arg=0  token='co'  Parse=nd, msg, arg1, arg2)
   Parse  Paren=1  Arg=0  token='con'  Parse=d, msg, arg1, arg2)
   Parse  Paren=1  Arg=0  token='cond'  Parse=, msg, arg1, arg2)
    Got Arg=0  argName='cond'  default=''
   Parse  Paren=1  Arg=1  token=''  Parse= msg, arg1, arg2)
   Parse  Paren=1  Arg=1  token=' '  Parse=msg, arg1, arg2)
   Parse  Paren=1  Arg=1  token=' m'  Parse=sg, arg1, arg2)
   Parse  Paren=1  Arg=1  token=' ms'  Parse=g, arg1, arg2)
   Parse  Paren=1  Arg=1  token=' msg'  Parse=, arg1, arg2)
    Got Arg=1  argName='msg'  default=''
   Parse  Paren=1  Arg=2  token=''  Parse= arg1, arg2)
   Parse  Paren=1  Arg=2  token=' '  Parse=arg1, arg2)
   Parse  Paren=1  Arg=2  token=' a'  Parse=rg1, arg2)
   Parse  Paren=1  Arg=2  token=' ar'  Parse=g1, arg2)
   Parse  Paren=1  Arg=2  token=' arg'  Parse=1, arg2)
   Parse  Paren=1  Arg=2  token=' arg1'  Parse=, arg2)
    Got Arg=2  argName='arg1'  default=''
   Parse  Paren=1  Arg=3  token=''  Parse= arg2)
   Parse  Paren=1  Arg=3  token=' '  Parse=arg2)
   Parse  Paren=1  Arg=3  token=' a'  Parse=rg2)
   Parse  Paren=1  Arg=3  token=' ar'  Parse=g2)
   Parse  Paren=1  Arg=3  token=' arg'  Parse=2)
   Parse  Paren=1  Arg=3  token=' arg2'  Parse=)
    Got Arg=3  argName='arg2'  default=''
[] Checking sizes: 4 4 
defineSubstIn  `TESTA (cond, msg, arg1, arg2)
defineArg[0] = ' q_np'
defineArg[1] = '             "some_msg"'
defineArg[2] = ' d_p'
defineArg[3] = ' RESET_VALUE'
defineValue    '\\n  $display("HELLO %s %d %d", msg, arg1, arg2 )'
THE TEXT:
(cond, msg, arg1, arg2)
   Parse  Paren=1  Arg=0  token=''  Parse=cond, msg, arg1, arg2)
   Parse  Paren=1  Arg=0  token='c'  Parse=ond, msg, arg1, arg2)
   Parse  Paren=1  Arg=0  token='co'  Parse=nd, msg, arg1, arg2)
   Parse  Paren=1  Arg=0  token='con'  Parse=d, msg, arg1, arg2)
   Parse  Paren=1  Arg=0  token='cond'  Parse=, msg, arg1, arg2)
    Got Arg=0  argName='cond'  default=''
   Parse  Paren=1  Arg=1  token=''  Parse= msg, arg1, arg2)
   Parse  Paren=1  Arg=1  token=' '  Parse=msg, arg1, arg2)
   Parse  Paren=1  Arg=1  token=' m'  Parse=sg, arg1, arg2)
   Parse  Paren=1  Arg=1  token=' ms'  Parse=g, arg1, arg2)
   Parse  Paren=1  Arg=1  token=' msg'  Parse=, arg1, arg2)
    Got Arg=1  argName='msg'  default=''
   Parse  Paren=1  Arg=2  token=''  Parse=  "some_msg" 
   Parse  Paren=1  Arg=2  token=' '  Parse= "some_msg" 
   Parse  Paren=1  Arg=2  token='  '  Parse="some_msg" 
   Parse  Paren=1  Arg=2  token='  "'  Parse=some_msg" 
   Parse  Paren=1  Arg=2  token='  "s'  Parse=ome_msg" 
   Parse  Paren=1  Arg=2  token='  "so'  Parse=me_msg" 
   Parse  Paren=1  Arg=2  token='  "som'  Parse=e_msg" 
   Parse  Paren=1  Arg=2  token='  "some'  Parse=_msg" 
   Parse  Paren=1  Arg=2  token='  "some_'  Parse=msg" 
   Parse  Paren=1  Arg=2  token='  "some_m'  Parse=sg" 
   Parse  Paren=1  Arg=2  token='  "some_ms'  Parse=g" 
   Parse  Paren=1  Arg=2  token='  "some_msg'  Parse=" 
[] Checking sizes: 4 2 

#6 Updated by Derek Lockhart over 5 years ago

It looks like the problem is in the trimWhitespace function in V3PreProc

 592           printf("    before:  %s\n", cp );
 593           string arg = trimWhitespace(refp->args()[numArgs], true);
 594           printf("    after:  %s\n", cp );
This outputs
    before:  , arg1, arg2)
    after: 

Why should the version of flex would affect this?

#7 Updated by Wilson Snyder over 5 years ago

Why should the version of flex would affect this?

It shouldn't, that's was just a guess.

Obviously you still haven't shown why it's messed up - presumably it worked correctly in the earlier call to that function.

#8 Updated by Derek Lockhart over 5 years ago

As a quick aside, I tried recompiling Verilator on OSX Mavericks using macports gcc and macports flex and I no longer encountered this bug on the provided test case.

$ gcc --version
gcc (MacPorts gcc48 4.8.2_0) 4.8.2

$ flex --version
flex 2.5.37
Note that macports gcc won't compile without the macports flex, it throws a Flex header error several minutes into the compile:
V3Lexer_pregen.yy.cpp:384:23: fatal error: FlexLexer.h: No such file or directory
 #include <FlexLexer.h>

This does seem to strongly hint the problem is in flex, but I will poke into trimWhitespace a bit more.

#9 Updated by Derek Lockhart over 5 years ago

Inside of V3PreProcImp::trimWhitespace() there is a dynamic allocation of a string variable named out. During the processing of the third parameter of the second call to macro `TESTA , the memory that variable out is allocated overlaps the memory pointed to by pointer cp in V3PreProcImp::defineSubst(). This overwrites the contents of the cp variable, which is supposed to contain the macro definition. Instead, the memory pointed to by cp now contains the out string, which happens to be the third parameter provided during the macro call.

defineSubstIn  `TESTA (cond, msg, arg1, arg2)
defineArg[0] = ' q_np'
defineArg[1] = '             "some_msg"'
defineArg[2] = ' d_p'
defineArg[3] = ' RESET_VALUE'
defineValue    '\\n  $display("HELLO %s %d %d", msg, arg1, arg2 )'
THE TEXT:
(cond, msg, arg1, arg2)
   Parse  Paren=1  Arg=0  token=''  Parse=cond, msg, arg1, arg2) <>  cp addr[ 0x7fe2ab406591 ]
   Parse  Paren=1  Arg=0  token='c'  Parse=ond, msg, arg1, arg2) <>  cp addr[ 0x7fe2ab406592 ]
   Parse  Paren=1  Arg=0  token='co'  Parse=nd, msg, arg1, arg2) <>  cp addr[ 0x7fe2ab406593 ]
   Parse  Paren=1  Arg=0  token='con'  Parse=d, msg, arg1, arg2) <>  cp addr[ 0x7fe2ab406594 ]
   Parse  Paren=1  Arg=0  token='cond'  Parse=, msg, arg1, arg2) <>  cp addr[ 0x7fe2ab406595 ]
       trimWhitespace: out addr[ 0x7fff52e6b8e9 to 0x7fff52e6b8ec ]
    Got Arg=0  argName='cond'  default=''
       trimWhitespace: out addr[ 0x7fff52e6b8b9 to 0x7fff52e6b8bd ]
   Parse  Paren=1  Arg=1  token=''  Parse= msg, arg1, arg2) <>  cp addr[ 0x7fe2ab406596 ]
   Parse  Paren=1  Arg=1  token=' '  Parse=msg, arg1, arg2) <>  cp addr[ 0x7fe2ab406597 ]
   Parse  Paren=1  Arg=1  token=' m'  Parse=sg, arg1, arg2) <>  cp addr[ 0x7fe2ab406598 ]
   Parse  Paren=1  Arg=1  token=' ms'  Parse=g, arg1, arg2) <>  cp addr[ 0x7fe2ab406599 ]
   Parse  Paren=1  Arg=1  token=' msg'  Parse=, arg1, arg2) <>  cp addr[ 0x7fe2ab40659a ]
       trimWhitespace: out addr[ 0x7fff52e6b8e9 to 0x7fff52e6b8ec ]
    Got Arg=1  argName='msg'  default=''
       trimWhitespace: out addr[ 0x7fe2ab406590 to 0x7fe2ab4065a6 ]         <------ RANGE OVERLAPS cp addr !!!
   Parse  Paren=1  Arg=2  token=''  Parse=  "some_msg" <>  cp addr[ 0x7fe2ab40659b ]
   Parse  Paren=1  Arg=2  token=' '  Parse= "some_msg" <>  cp addr[ 0x7fe2ab40659c ]
   Parse  Paren=1  Arg=2  token='  '  Parse="some_msg" <>  cp addr[ 0x7fe2ab40659d ]
   Parse  Paren=1  Arg=2  token='  "'  Parse=some_msg" <>  cp addr[ 0x7fe2ab40659e ]
   Parse  Paren=1  Arg=2  token='  "s'  Parse=ome_msg" <>  cp addr[ 0x7fe2ab40659f ]
   Parse  Paren=1  Arg=2  token='  "so'  Parse=me_msg" <>  cp addr[ 0x7fe2ab4065a0 ]
   Parse  Paren=1  Arg=2  token='  "som'  Parse=e_msg" <>  cp addr[ 0x7fe2ab4065a1 ]
   Parse  Paren=1  Arg=2  token='  "some'  Parse=_msg" <>  cp addr[ 0x7fe2ab4065a2 ]
   Parse  Paren=1  Arg=2  token='  "some_'  Parse=msg" <>  cp addr[ 0x7fe2ab4065a3 ]
   Parse  Paren=1  Arg=2  token='  "some_m'  Parse=sg" <>  cp addr[ 0x7fe2ab4065a4 ]
   Parse  Paren=1  Arg=2  token='  "some_ms'  Parse=g" <>  cp addr[ 0x7fe2ab4065a5 ]
   Parse  Paren=1  Arg=2  token='  "some_msg'  Parse=" <>  cp addr[ 0x7fe2ab4065a6 ]

I think the problem is line 596 of V3PreProc.cpp where cp is declared:
 569   const char* cp=refp->params().c_str();

The pointer cp is never allocated any memory, instead it's simply given the reference returned by string::c_str(). The C++ docs indicate that this pointer could potentially be invalidated by other functions that modify the string. A couple posts on StackOverflow seem to indicate that its best to always make a copy of c_str() and not rely on the pointer it returns.

  char * cstr = new char [str.length()+1];
  std::strcpy (cstr, str.c_str());
  • http://www.cplusplus.com/reference/string/string/c_str/
  • http://stackoverflow.com/a/6456408
  • http://stackoverflow.com/a/17403222
  • http://stackoverflow.com/a/8843627

Applying the following fix did indeed solve the problem:

 571   char* cp= new char[refp->params().size()+1];
 572   std::strcpy(cp, refp->params().c_str() );
 573   const char* cp_start = cp;
 574 
 575   if (*cp == '(') cp++;
 576   for (; *cp; cp++) {
  ...
 627   }   
 628   delete[] cp_start;
There is another reference to a c_str() later in the function that should probably be fixed too:
 643   // Note we go through the loop once more at the NULL end-of-string
 644   for (const char* cp=value.c_str(); (*cp) || argName!=""; cp=(*cp?cp+1:cp)) {
...

All that being said, I couldn't find any modifications to refp->params() inside of V3PreProcImp::defineSubst() that would cause it to invalidate the c_str() pointer. In addition, trying to print the pointer location returned by refp->params().c_str() suddenly made the error disappear; as did adding a memory watchpoint in LLDB.

I'm curious if the problem here is actually LLVM; I'm not familiar enough with the C++ spec to know if what is going on here is "reasonable" behavior or a compiler bug.

#10 Updated by Derek Lockhart over 5 years ago

I brought up this issue with developers on #llvm to figure out if this was a compiler bug or not; they noted that in this context cp will certainly reference freed memory.

refp->params() returns a std::string object, which is a copy of the m_params object within V3Define. This means for the following statement:

569   const char* cp=refp->params().c_str(); 

refp->params() returns a std::string object that is destroyed as soon as the c_str() call completes. This means cp will always point to freed memory!

It was mentioned that the correct way to handle this is to keep params alive:
    std::string params = refp->params()
    const char* cp = params.c_str(); 

#11 Updated by Wilson Snyder over 5 years ago

Thanks for debugging! I think a nicer fix would be to use string iterators/ functions. Might you attempt a patch like that too?

Meanwhile I will need to audit the code to make sure I do not have similar issues elsewhere.

#12 Updated by Derek Lockhart over 5 years ago

Wilson Snyder wrote:

Thanks for debugging! I think a nicer fix would be to use string iterators/ functions. Might you attempt a patch like that too?

Meanwhile I will need to audit the code to make sure I do not have similar issues elsewhere.

I've attached a minimal patch that replaces c_str() with a string iterator in defineSubst. I otherwise did not modify the logic, and split some of my changes into two lines if they exceeded the 80 character limit.

I did a quick audit of the rest of the file and also replaced three other instances of "some_object->method_call().c_str()" that were in debug statements.

(Note that diff was created inside of the src directory, so it probably needs to be applied from there?)

#13 Updated by Wilson Snyder over 5 years ago

  • Status changed from AskedReporter to Resolved

In git towards 3.861.

Audited all of the code, and also Verilog-Perl and SystemPerl. Only a few other spots that are potentially problems, but overfixed to never call a function then c_str() anywhere and added a lint check for the future.

Thanks much for debugging this.

#14 Updated by Derek Lockhart over 5 years ago

Wilson Snyder wrote:

In git towards 3.861.

Audited all of the code, and also Verilog-Perl and SystemPerl. Only a few other spots that are potentially problems, but overfixed to never call a function then c_str() anywhere and added a lint check for the future.

Thanks much for debugging this.

No problem! We rely on Verilator pretty heavily, I'm happy to be able to contribute something useful.

#15 Updated by Wilson Snyder over 5 years ago

  • Status changed from Resolved to Closed
  • Assignee set to Derek Lockhart

In 3.862.

Also available in: Atom