Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preprocessor incorrectly parses define arguments #780

Closed
veripoolbot opened this issue Jun 6, 2014 · 14 comments
Closed

Preprocessor incorrectly parses define arguments #780

veripoolbot opened this issue Jun 6, 2014 · 14 comments
Labels
area: parser Issue involves SystemVerilog parsing resolution: fixed Closed; fixed

Comments

@veripoolbot
Copy link
Contributor


Author Name: Derek Lockhart
Original Redmine Issue: 780 from https://www.veripool.org
Original Date: 2014-06-06
Original Assignee: Derek Lockhart


Calling define macros that take multiple arguments will be parsed incorrectly depending on the indentation. The following code shows the error:

`define TESTA(cond, msg, arg1, arg2) \
  $display("HELLO %s %d %d", msg, arg1, arg2 )

`define TESTB(con, msg, arg1, arg2) \
  $display("HELLO %s %d %d", msg, arg1, arg2 )


module vc_RDFF_pf #( parameter W = 1, parameter RESET_VALUE = 0 ) 
(
  input              clk,      // Clock input
  input              reset_p,  // Sync reset input (sampled on rising edge)
  input      [W-1:0] d_p,      // Data input (sampled on rising clk edge)
  output reg [W-1:0] q_np      // Data output
);

  always @( posedge clk )
  begin
     q_np <= reset_p ? RESET_VALUE : d_p;

     // PASSES
     `TESTA( q_np,
             "some_msg", d_p, RESET_VALUE);

     // FAILS
     `TESTA( q_np,
              "some_msg", d_p, RESET_VALUE);

     // PASSES
     //`TESTB( q_np,
     //         "some_msg", d_p, RESET_VALUE);
  end 


endmodule

Note that the difference between the first call to TESTA and the second call to TESTA differ only by a single space on the second line. Also note that TESTB when called also passes, and differs to TESTA only by the name of the first variable, which is shorter by one character. This is a very difficult bug to catch!

This single space difference between the two calls to TESTA results in the following error:

%Error: TEST.v:29: Define passed too many arguments: TESTA
%Error: TEST.v:31: Expecting ( to begin argument list for define reference `TESTA
%Error: TEST.v:39: EOF in define argument list
%Error: TEST.v:40: EOF in define argument list
%Error: TEST.v:40: syntax error, unexpected $end
%Error: Cannot continue

Adding printfs to V3PreProc.cpp to display refp->name() and refp->args() generates the following output:

line 553 refp->name():   TESTA
line 553 refp->args() [0]  q_np
line 553 refp->args() [1]             "some_msg"
line 553 refp->args() [2]  d_p
line 553 refp->args() [3]  RESET_VALUE
line 553 refp->name():   TESTA
line 553 refp->args() [0]  q_np
line 553 refp->args() [1]              "some_msg"
line 553 refp->args() [2]  d_p
line 553 refp->args() [3]  RESET_VALUE
FAIL!!!
line 624 refp->name():   TESTA
line 624 refp->args():     4
line 624 numArgs:          2

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2014-06-06T12:28:18Z


This works for me. This smells like a flex bug as spacing is pretty opaque to the Verilator code.

What OS are you on? What does flex --version print?

Can you attach the output of

bin/verilator -E file.v --debug --debugi-V3PreShell 9

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Derek Lockhart
Original Date: 2014-06-06T13:15:34Z


Wilson Snyder wrote:

This works for me. This smells like a flex bug as spacing is pretty opaque to the Verilator code.

What OS are you on? What does flex --version print?

Can you attach the output of

bin/verilator -E file.v --debug --debugi-V3PreShell 9

I'm am on OSX Mavericks, using the builtin version of flex and clang:

$ flex --version
  flex 2.5.35 Apple(flex-31)

$ gcc --version                 
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1                    
Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)                  
Target: x86_64-apple-darwin13.1.0 

Note that I had previously tried to compile with a newer version of flex installed via MacPorts, but Verilator would not compile with that due to the error described here:

Output of "bin/verilator -E file.v --debug --debugi-V3PreShell 9" is attached.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2014-06-06T13:32:31Z


 - V3PreProc.cpp:552:  defineSubstIn  `TESTA (cond, msg, arg1, arg2)
 ...
 - V3PreProc.cpp:580:      Got Arg=0  argName='cond'  default=''
 - V3PreProc.cpp:580:      Got Arg=1  argName='msg'  default=''
 %Error: TEST.v:29: Define passed too many arguments: TESTA

Compare this to the text above which passed and this didn't print "Got Arg"=2/3. If you go to line V3PreProc.cpp you'll see this is just looping over the text that was printed in defineSubstIn which does have 4 arguments. So it's some sort of corruption issue. You'll need to add prints or use valgrind or similar to figure it out at your site since I don't see it.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Derek Lockhart
Original Date: 2014-06-06T14:49:04Z


It seems the first visit to defineSubst parses the arguments of the definition fine, but the second visit to defineSubst only parses the first two arguments of the definition correctly. The third argument it is getting the argument value provided in the call, not the argument variable in the definition!

defineSubstIn  `TESTA (cond, msg, arg1, arg2)
defineArg[0] = ' q_np'
defineArg[1] = '            "some_msg"'
defineArg[2] = ' d_p'
defineArg[3] = ' RESET_VALUE'
defineValue    '\\n  $display("HELLO %s %d %d", msg, arg1, arg2 )'
THE TEXT:
(cond, msg, arg1, arg2)
    Parse  Paren=1  Arg=0  token=''  Parse=cond, msg, arg1, arg2)
    Parse  Paren=1  Arg=0  token='c'  Parse=ond, msg, arg1, arg2)
    Parse  Paren=1  Arg=0  token='co'  Parse=nd, msg, arg1, arg2)
    Parse  Paren=1  Arg=0  token='con'  Parse=d, msg, arg1, arg2)
    Parse  Paren=1  Arg=0  token='cond'  Parse=, msg, arg1, arg2)
     Got Arg=0  argName='cond'  default=''
    Parse  Paren=1  Arg=1  token=''  Parse= msg, arg1, arg2)
    Parse  Paren=1  Arg=1  token=' '  Parse=msg, arg1, arg2)
    Parse  Paren=1  Arg=1  token=' m'  Parse=sg, arg1, arg2)
    Parse  Paren=1  Arg=1  token=' ms'  Parse=g, arg1, arg2)
    Parse  Paren=1  Arg=1  token=' msg'  Parse=, arg1, arg2)
     Got Arg=1  argName='msg'  default=''
    Parse  Paren=1  Arg=2  token=''  Parse= arg1, arg2)
    Parse  Paren=1  Arg=2  token=' '  Parse=arg1, arg2)
    Parse  Paren=1  Arg=2  token=' a'  Parse=rg1, arg2)
    Parse  Paren=1  Arg=2  token=' ar'  Parse=g1, arg2)
    Parse  Paren=1  Arg=2  token=' arg'  Parse=1, arg2)
    Parse  Paren=1  Arg=2  token=' arg1'  Parse=, arg2)
     Got Arg=2  argName='arg1'  default=''
    Parse  Paren=1  Arg=3  token=''  Parse= arg2)
    Parse  Paren=1  Arg=3  token=' '  Parse=arg2)
    Parse  Paren=1  Arg=3  token=' a'  Parse=rg2)
    Parse  Paren=1  Arg=3  token=' ar'  Parse=g2)
    Parse  Paren=1  Arg=3  token=' arg'  Parse=2)
    Parse  Paren=1  Arg=3  token=' arg2'  Parse=)
     Got Arg=3  argName='arg2'  default=''
[] Checking sizes: 4 4 
defineSubstIn  `TESTA (cond, msg, arg1, arg2)
defineArg[0] = ' q_np'
defineArg[1] = '             "some_msg"'
defineArg[2] = ' d_p'
defineArg[3] = ' RESET_VALUE'
defineValue    '\\n  $display("HELLO %s %d %d", msg, arg1, arg2 )'
THE TEXT:
(cond, msg, arg1, arg2)
    Parse  Paren=1  Arg=0  token=''  Parse=cond, msg, arg1, arg2)
    Parse  Paren=1  Arg=0  token='c'  Parse=ond, msg, arg1, arg2)
    Parse  Paren=1  Arg=0  token='co'  Parse=nd, msg, arg1, arg2)
    Parse  Paren=1  Arg=0  token='con'  Parse=d, msg, arg1, arg2)
    Parse  Paren=1  Arg=0  token='cond'  Parse=, msg, arg1, arg2)
     Got Arg=0  argName='cond'  default=''
    Parse  Paren=1  Arg=1  token=''  Parse= msg, arg1, arg2)
    Parse  Paren=1  Arg=1  token=' '  Parse=msg, arg1, arg2)
    Parse  Paren=1  Arg=1  token=' m'  Parse=sg, arg1, arg2)
    Parse  Paren=1  Arg=1  token=' ms'  Parse=g, arg1, arg2)
    Parse  Paren=1  Arg=1  token=' msg'  Parse=, arg1, arg2)
     Got Arg=1  argName='msg'  default=''
    Parse  Paren=1  Arg=2  token=''  Parse=  "some_msg"
    Parse  Paren=1  Arg=2  token=' '  Parse= "some_msg"
    Parse  Paren=1  Arg=2  token='  '  Parse="some_msg"
    Parse  Paren=1  Arg=2  token='  "'  Parse=some_msg"
    Parse  Paren=1  Arg=2  token='  "s'  Parse=ome_msg"
    Parse  Paren=1  Arg=2  token='  "so'  Parse=me_msg"
    Parse  Paren=1  Arg=2  token='  "som'  Parse=e_msg"
    Parse  Paren=1  Arg=2  token='  "some'  Parse=_msg"
    Parse  Paren=1  Arg=2  token='  "some_'  Parse=msg"
    Parse  Paren=1  Arg=2  token='  "some_m'  Parse=sg"
    Parse  Paren=1  Arg=2  token='  "some_ms'  Parse=g"
    Parse  Paren=1  Arg=2  token='  "some_msg'  Parse="
[] Checking sizes: 4 2 

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Derek Lockhart
Original Date: 2014-06-06T15:21:35Z


It looks like the problem is in the trimWhitespace function in V3PreProc

 592           printf("    before:  %s\n", cp );
 593           string arg = trimWhitespace(refp->args()[numArgs], true);
 594           printf("    after:  %s\n", cp );

This outputs

     before:  , arg1, arg2)
     after: 

Why should the version of flex would affect this?

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2014-06-06T15:26:49Z


Why should the version of flex would affect this?

It shouldn't, that's was just a guess.

Obviously you still haven't shown why it's messed up - presumably it worked correctly in the earlier call to that function.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Derek Lockhart
Original Date: 2014-06-06T15:46:08Z


As a quick aside, I tried recompiling Verilator on OSX Mavericks using macports gcc and macports flex and I no longer encountered this bug on the provided test case.

$ gcc --version
gcc (MacPorts gcc48 4.8.2_0) 4.8.2

$ flex --version
flex 2.5.37

Note that macports gcc won't compile without the macports flex, it throws a Flex header error several minutes into the compile:

V3Lexer_pregen.yy.cpp:384:23: fatal error: FlexLexer.h: No such file or directory
 #include <FlexLexer.h>

This does seem to strongly hint the problem is in flex, but I will poke into trimWhitespace a bit more.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Derek Lockhart
Original Date: 2014-06-07T16:27:06Z


Inside of V3PreProcImp::trimWhitespace() there is a dynamic allocation of a string variable named out. During the processing of the third parameter of the second call to macro `TESTA , the memory that variable out is allocated overlaps the memory pointed to by pointer cp in V3PreProcImp::defineSubst(). This overwrites the contents of the cp variable, which is supposed to contain the macro definition. Instead, the memory pointed to by cp now contains the out string, which happens to be the third parameter provided during the macro call.

defineSubstIn  `TESTA (cond, msg, arg1, arg2)
defineArg[0] = ' q_np'
defineArg[1] = '             "some_msg"'
defineArg[2] = ' d_p'
defineArg[3] = ' RESET_VALUE'
defineValue    '\\n  $display("HELLO %s %d %d", msg, arg1, arg2 )'
THE TEXT:
(cond, msg, arg1, arg2)
    Parse  Paren=1  Arg=0  token=''  Parse=cond, msg, arg1, arg2) <>  cp addr[ 0x7fe2ab406591 ]
    Parse  Paren=1  Arg=0  token='c'  Parse=ond, msg, arg1, arg2) <>  cp addr[ 0x7fe2ab406592 ]
    Parse  Paren=1  Arg=0  token='co'  Parse=nd, msg, arg1, arg2) <>  cp addr[ 0x7fe2ab406593 ]
    Parse  Paren=1  Arg=0  token='con'  Parse=d, msg, arg1, arg2) <>  cp addr[ 0x7fe2ab406594 ]
    Parse  Paren=1  Arg=0  token='cond'  Parse=, msg, arg1, arg2) <>  cp addr[ 0x7fe2ab406595 ]
        trimWhitespace: out addr[ 0x7fff52e6b8e9 to 0x7fff52e6b8ec ]
     Got Arg=0  argName='cond'  default=''
        trimWhitespace: out addr[ 0x7fff52e6b8b9 to 0x7fff52e6b8bd ]
    Parse  Paren=1  Arg=1  token=''  Parse= msg, arg1, arg2) <>  cp addr[ 0x7fe2ab406596 ]
    Parse  Paren=1  Arg=1  token=' '  Parse=msg, arg1, arg2) <>  cp addr[ 0x7fe2ab406597 ]
    Parse  Paren=1  Arg=1  token=' m'  Parse=sg, arg1, arg2) <>  cp addr[ 0x7fe2ab406598 ]
    Parse  Paren=1  Arg=1  token=' ms'  Parse=g, arg1, arg2) <>  cp addr[ 0x7fe2ab406599 ]
    Parse  Paren=1  Arg=1  token=' msg'  Parse=, arg1, arg2) <>  cp addr[ 0x7fe2ab40659a ]
        trimWhitespace: out addr[ 0x7fff52e6b8e9 to 0x7fff52e6b8ec ]
     Got Arg=1  argName='msg'  default=''
        trimWhitespace: out addr[ 0x7fe2ab406590 to 0x7fe2ab4065a6 ]         <------ RANGE OVERLAPS cp addr !!!
    Parse  Paren=1  Arg=2  token=''  Parse=  "some_msg" <>  cp addr[ 0x7fe2ab40659b ]
    Parse  Paren=1  Arg=2  token=' '  Parse= "some_msg" <>  cp addr[ 0x7fe2ab40659c ]
    Parse  Paren=1  Arg=2  token='  '  Parse="some_msg" <>  cp addr[ 0x7fe2ab40659d ]
    Parse  Paren=1  Arg=2  token='  "'  Parse=some_msg" <>  cp addr[ 0x7fe2ab40659e ]
    Parse  Paren=1  Arg=2  token='  "s'  Parse=ome_msg" <>  cp addr[ 0x7fe2ab40659f ]
    Parse  Paren=1  Arg=2  token='  "so'  Parse=me_msg" <>  cp addr[ 0x7fe2ab4065a0 ]
    Parse  Paren=1  Arg=2  token='  "som'  Parse=e_msg" <>  cp addr[ 0x7fe2ab4065a1 ]
    Parse  Paren=1  Arg=2  token='  "some'  Parse=_msg" <>  cp addr[ 0x7fe2ab4065a2 ]
    Parse  Paren=1  Arg=2  token='  "some_'  Parse=msg" <>  cp addr[ 0x7fe2ab4065a3 ]
    Parse  Paren=1  Arg=2  token='  "some_m'  Parse=sg" <>  cp addr[ 0x7fe2ab4065a4 ]
    Parse  Paren=1  Arg=2  token='  "some_ms'  Parse=g" <>  cp addr[ 0x7fe2ab4065a5 ]
    Parse  Paren=1  Arg=2  token='  "some_msg'  Parse=" <>  cp addr[ 0x7fe2ab4065a6 ]


I think the problem is line 596 of V3PreProc.cpp where cp is declared:

 569   const char* cp=refp->params().c_str();

The pointer cp is never allocated any memory, instead it's simply given the reference returned by string::c_str(). The C++ docs indicate that this pointer could potentially be invalidated by other functions that modify the string. A couple posts on StackOverflow seem to indicate that its best to always make a copy of c_str() and not rely on the pointer it returns.

  char * cstr = new char [str.length()+1];
  std::strcpy (cstr, str.c_str());

Applying the following fix did indeed solve the problem:

 571   char* cp= new char[refp->params().size()+1];
 572   std::strcpy(cp, refp->params().c_str() );
 573   const char* cp_start = cp;
 574 
 575   if (*cp == '(') cp++;
 576   for (; *cp; cp++) {
  ...
 627   }   
 628   delete[] cp_start;

There is another reference to a c_str() later in the function that should probably be fixed too:

 643   // Note we go through the loop once more at the NULL end-of-string
 644   for (const char* cp=value.c_str(); (*cp) || argName!=""; cp=(*cp?cp+1:cp)) {
...

All that being said, I couldn't find any modifications to refp->params() inside of V3PreProcImp::defineSubst() that would cause it to invalidate the c_str() pointer. In addition, trying to print the pointer location returned by refp->params().c_str() suddenly made the error disappear; as did adding a memory watchpoint in LLDB.

I'm curious if the problem here is actually LLVM; I'm not familiar enough with the C++ spec to know if what is going on here is "reasonable" behavior or a compiler bug.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Derek Lockhart
Original Date: 2014-06-07T17:01:26Z


I brought up this issue with developers on #llvm to figure out if this was a compiler bug or not; they noted that in this context cp will certainly reference freed memory.

refp->params() returns a std::string object, which is a +copy+ of the m_params object within V3Define. This means for the following statement:

569   const char* cp=refp->params().c_str(); 

refp->params() returns a std::string object that is destroyed as soon as the c_str() call completes. This means cp will always point to freed memory!

It was mentioned that the correct way to handle this is to keep params alive:

     std::string params = refp->params()
     const char* cp = params.c_str(); 

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2014-06-07T17:12:38Z


Thanks for debugging! I think a nicer fix would be to use string iterators/ functions. Might you attempt a patch like that too?

Meanwhile I will need to audit the code to make sure I do not have similar issues elsewhere.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Derek Lockhart
Original Date: 2014-06-08T14:59:22Z


Wilson Snyder wrote:

Thanks for debugging! I think a nicer fix would be to use string iterators/ functions. Might you attempt a patch like that too?

Meanwhile I will need to audit the code to make sure I do not have similar issues elsewhere.

I've attached a minimal patch that replaces c_str() with a string iterator in defineSubst. I otherwise did not modify the logic, and split some of my changes into two lines if they exceeded the 80 character limit.

I did a quick audit of the rest of the file and also replaced three other instances of "some_object->method_call().c_str()" that were in debug statements.

(Note that diff was created inside of the src directory, so it probably needs to be applied from there?)

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2014-06-09T01:39:40Z


In git towards 3.861.

Audited all of the code, and also Verilog-Perl and SystemPerl. Only a few other spots that are potentially problems, but overfixed to never call a function then c_str() anywhere and added a lint check for the future.

Thanks much for debugging this.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Derek Lockhart
Original Date: 2014-06-09T16:57:18Z


Wilson Snyder wrote:

In git towards 3.861.

Audited all of the code, and also Verilog-Perl and SystemPerl. Only a few other spots that are potentially problems, but overfixed to never call a function then c_str() anywhere and added a lint check for the future.

Thanks much for debugging this.

No problem! We rely on Verilator pretty heavily, I'm happy to be able to contribute something useful.

@veripoolbot
Copy link
Contributor Author


Original Redmine Comment
Author Name: Wilson Snyder (@wsnyder)
Original Date: 2014-06-11T00:59:13Z


In 3.862.

@veripoolbot veripoolbot added area: parser Issue involves SystemVerilog parsing resolution: fixed Closed; fixed labels Dec 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: parser Issue involves SystemVerilog parsing resolution: fixed Closed; fixed
Projects
None yet
Development

No branches or pull requests

1 participant