Project

General

Profile

[logo] 
 
Home
News
Activity
About/Contact
Major Tools
  Dinotrace
  Verilator
  Verilog-mode
  Verilog-Perl
Other Tools
  BugVise
  CovVise
  Force-Gate-Sim
  Gspice
  IPC::Locker
  Rsvn
  SVN::S4
  Voneline
  WFH
General Info
  Papers

Issue #1468

Add Continuous Integration

Added by Todd Strader 3 months ago. Updated about 1 month ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Tests
% Done:

0%


Description

This builds Verilator and runs the entire test suite within Travis-CI: https://github.com/toddstrader/verilator-dev/tree/travis

I've just been working with the free-for-open-source service via GitHub. Here's the build that corresponds to HEAD of this branch: https://travis-ci.com/toddstrader/verilator-dev/builds/115871166

I surveyed the free CI options available on GitHub. Travis seems widely used, it is easy to set up and as far as I can tell there are no build/CPU-hours/etc. limits for open source projects. It's pretty simple to get going for your personal fork on GitHub. Based on experimentation, you seem to be able to launch three workers in parallel for this tier of service. I'm not currently doing anything with that (see below). The workers seem to have two CPUs, which I do leverage in the .yml.

The .yml is a bit hairy with the make version upgrade and cpan installation. I think it might be better to add a ci/ directory to hold helper scripts, but I don't think that's necessary at this point. I'm open to suggestions on this.

I'm also open to explore (a limited number of) other services if people have suggestions. But this one seems to work fine.

Future work would include:
  • Building for multiple OSes (use parallel workers here)
  • Getting the skipped tests to work
  • Probably other stuff

I can open issues for these items if we proceed.

Also (and I don't mean to steal any thunder here), I noticed this today: https://github.com/verilator/verilator

Does this mean that Verilator is going to be mirrored or hosted here? I was planning to actively sync from veripool.org to my repo to get CI going. But if this is going to be constantly up-to-date one way or the other, we can easily set it up there as well. The biggest advantage with that is that incoming PRs will kick off a CI build. So whenever people get around to looking at PRs, hopefully the tests are already done.

On a separate note, I got to thinking about this because of the recent Ariane breakage. I think it would be helpful to build well maintained public Verilog projects head-to-head with Verilator. That's probably outside the scope of Verilator itself, but would give us more thorough coverage if it can be set up. I'm not quite sure how that should work, but I'll post something on the forums when I figure something out so people can tune in if they are interested.

History

#1 Updated by Todd Strader 3 months ago

Aaaand, I just noticed: https://github.com/verilator/verilator_ext_tests

That's pretty timely. I had originally thought it would be easier to piggyback on external projects' existing CI testing (e.g. add a head-to-head build in addition to using a fixed Verilator version). But doing it this way would certainly require less (read: no) buy-in from others.

Once we settle on what to do with the main repo I can kick things around with CI on verilator_ext_tests.

So, adding to the future work list:
  • External tests
  • Multiple compilers (apropos of nothing)

#2 Updated by Wilson Snyder 3 months ago

  • Category set to Tests
  • Status changed from New to Feature

https://github.com/verilator/verilator can be now used instead of git.veripool.org/git/verilator, they are identical. Eventually github might become the documented one.

BTW I setup ext_tests as having verilator separate as that was better for my workflow, if this is painful for travis, verilator could be an submodule underneath it.

As to the patch:

I'm still using 3.81 on one system, so think it's unintentional if later than 3.81 is needed, what broke?

+script: ./configure && make -j 2 && make test && cd test_regress && ./driver.pl --j 0

I think this should use the Ubuntu maintainer flag and extended tests, configure with --enable-maintainer-mode --enable-long-tests.

Any reason not to use just "make -j 3", usually one more than number of CPUs works better, as IO bound.

As you noted we can have more systems do testing. Perhaps add to makefile a test_regress_dist/_vlt/_vltmt that just pass the flags "--dist", "--vlt", and "--vltmt" to the driver respectively as these are very crude thirds.

Please add ccache.

#3 Updated by Todd Strader 3 months ago

https://github.com/verilator/verilator can be now used instead of git.veripool.org/git/verilator, they are identical. Eventually github might become the documented one.

Are you now/will you in the future accept pull requests on GitHub?

BTW I setup ext_tests as having verilator separate as that was better for my workflow, if this is painful for travis, verilator could be an submodule underneath it.

It too early to say. I'm still experimenting.

I'm still using 3.81 on one system, so think it's unintentional if later than 3.81 is needed, what broke?

t_flag_csplit was the only affected test. See: https://travis-ci.com/toddstrader/verilator-dev/builds/115586758 It was mad that "Slow" files were being compiled with -O2. I'm not 100% sure what was happening here, but it seems that this bit of verilated.mk was not working correctly:
%.o: %.cpp
        $(OBJCACHE) $(CXX) $(CXXFLAGS) $(CPPFLAGS) $(OPT_FAST) -c -o $@ $<

%__Slow.o: %__Slow.cpp
        $(OBJCACHE) $(CXX) $(CXXFLAGS) $(CPPFLAGS) $(OPT_SLOW) -c -o $@ $<

It works locally on my machine and on newer Travis images, both which have make 4.1. And when I upgraded the Trusty image's make this went away there as well. That's as far as I debugged this one.

I think this should use the Ubuntu maintainer flag and extended tests, configure with --enable-maintainer-mode --enable-long-tests.

Cool, yeah let's turn everything on.

Any reason not to use just "make -j 3", usually one more than number of CPUs works better, as IO bound.

I have an irrational fear of large -j numbers due to crashing more than one machine with "make -j". But -j 3 sounds good.

As you noted we can have more systems do testing. Perhaps add to makefile a test_regress_dist/_vlt/_vltmt that just pass the flags "--dist", "--vlt", and "--vltmt" to the driver respectively as these are very crude thirds.

I don't think we'd even need to touch the makefiles for this. Parallel environment variables can be expressed in the .yml. See: https://travis-ci.com/toddstrader/verilator-dev/builds/114805142

However, the issue here is that while you can make stages with dependencies in Travis, the disk doesn't persist between stages. At least not in the open source service. That means that if we split up the tests like this, we'll have to rebuild Verilator three times as much as necessary. That's fine-ish if we're only running one OS + compiler. But it would definitely be worse if we build additional OSes/compilers. I'd suggest using the limited parallelism for that long-term. But it's pretty simple, so I can do the crude test split for the short-term.

Please add ccache.

Relates to the last point. Will ccache do anything for us if we have to start from scratch every time? I haven't found any free-for-open-source CI services which give you persistent storage. But that would be pretty helpful if it were out there.

#4 Updated by Wilson Snyder 3 months ago

I would hope that the ccache is persistent across their farm, or at least it should, otherwise I don't see why they would support it at all (versus users just enabling it outside their travis config file).

I disabled the t_flag_csplit test when an old make is present.

#5 Updated by Stefan Wallentowitz 3 months ago

Hi,

I just wanted to point out our ongoing activities of LibreCores CI. It is a jenkins-based CI system we are continuously (but a bit slowly) working on specifically for silicon projects.

In case you see problems with Travis, we are happy to help!

Cheers, Stefan

#6 Updated by Todd Strader 3 months ago

Woah. I assumed that you couldn't share state between instances because: https://docs.travis-ci.com/user/build-stages/#data-persistence-between-stages-and-jobs

But I guess not: https://docs.travis-ci.com/user/caching/

This could change the parallelism calculus. I wonder if we can force different compilers etc. to start in parallel and then have subsequent test slices run after the build is cached. Something to explore . . .

I'm still trying to get the cache to work in Travis. I see ccache build the cache and Travis uploads it somewhere, but subsequent runs don't seem to pull it down. Still investigating this.

I also think I misunderstood your comment about Makefile changes for the test slices. I'm guessing you'd like to avoid cd'ing to test_regress and just "make test" instead. That's in there now.

What is --enable-maintainer-mode supposed to do? When I run "./configure --help" I see:
  --enable-maintainer-mode
                          ignored

And when I ./configure both with and without this option, the resulting Makefiles are the same. I wonder if I don't have something set up correctly.

--vltmt is a bit of an issue. It's currently hardcoded for three threads, but we only have two cores. I modified driver.pl to set this to the max of three and the machine's cores (i.e. two under Travis). First off, I don't know if one less thread is going to break any coverage that's currently in there. But beyond that, a number of the tests break now saying:
%Warning-UNOPTTHREADS: Thread scheduler is unable to provide requested parallelism; consider asking for fewer threads.
which is sort of weird because I decreased the thread count. I can look into this, but any suggestions you have would be appreciated. Travis log here: https://travis-ci.com/toddstrader/verilator-dev/jobs/209590886

The current state of the world is here: https://github.com/toddstrader/verilator-dev/tree/travis and here: https://travis-ci.com/toddstrader/verilator-dev/branches

#7 Updated by Todd Strader 3 months ago

I guess I just wasn't patient enough. I was looking for the cache to be shared between the test slices (I had one run after the first one completed). But the key they store the caches on must include the environment because it is different between the three test slices. However, when I run the same test slice again, it has the same key and it picks up the cache. The time for build + --dist tests went from 8:40 to 3:00.

For --vlt and --vltmt it looks like we're getting close to the maximum cache size they'll give us:
cache size                         425.7 MB
max cache size                     500.0 MB
I'm not sure how much this will grow if we combine all the tests, but this may be reason enough to keep them split up.

#8 Updated by Todd Strader 3 months ago

OK, so I'm not sure what to do with --vltml. Based on reading the man page and this issue: https://www.veripool.org/boards/2/topics/2727-Verilator-threaded-compile-error-message it seems that avoiding the UNOPTTHREADS warning can be a bit finicky. Again, what is odd is that things were happy compiling with three threads, but then Verilator told me I was asking for too many threads when I downgraded to two threads.

I'm guessing that based on the number of tests that skip the vltmt scenario (95 by my count) some pruning already happened when multithreading was initially added. Based on that, my next step here is to skip these tests under --vltmt if the detected core count is <= 2. Please let me know if you think we should be doing something else.

There are also two other tests that fail with different error messages:
    #vltmt/t_dpi_threads_collide: %Error: Exec of obj_vltmt/t_dpi_threads_collide/Vt_dpi_threads_collide ok, but expected to fail
        make -j && test_regress/t/t_dpi_threads_collide.pl  --vltmt
    #vltmt/t_gantt: %Error: wrong number of gantt lines
        make -j && test_regress/t/t_gantt.pl  --vltmt
but based on the error messages, it sounds like variations on the same theme.

#9 Updated by Todd Strader 3 months ago

As advertised, I am now skipping these failing vltmt tests if the core count is < 3.

I'm guessing that the large number of tests with
scenarios(vlt => 1);
are tests that can't run under vltmt. I just replicated that behavior. However, if I'm interpreting this correctly, doesn't that mean that these tests won't run under other simulators? It seems like we'd rather want something like "vltmt => 0", but that doesn't appear to work like I imagined.

I believe this is ready to pull now: https://github.com/toddstrader/verilator-dev/tree/travis https://travis-ci.com/toddstrader/verilator-dev/builds/116795545

On my travis-ci.com dashboard there is a button to request access for GitHub organizations I'm in. Once we land that, I'll click that and we'll see what happens.

#10 Updated by Wilson Snyder 3 months ago

UNOPTTHREADS is a bit finicky, it's based on the parallelism it can find divided by threads. Can you see if we can get some of the tests you are disabling back if we just disable this warning with low thread counts?

Tests with vlt => 1 generally are that way because while they could run with vltmt there is no point, it would only test the same logic. For example many of the frontend tests which just check for errors, likewise these don't run on other simulators as would not fail the same way. linter=>1 is used when a test is a generic lint test that should fail on other linters.

+scenarios(::no_vltmt_for_few_cores());

Generally "scenarios" has indicated where a test case should be run. If something about the system etc prevents the test from passing then it should be a skip. This helps debug as when you manually run a test it is then obvious if the test has run and passed, or for some reason hasn't really passed due to some config issue. So I'd suggest have too_few_cores not warn just return a bool, and not changing the scenarios lines, and instead approx:

if ($Self->too_few_cores) {
    $Self->skip("Skipping due to too few cores");
} else {
    ...

(Generally anythign called by a t/ file should use $Self instead of as :: as it allows a future more complicated function to use per-test information if need be.)

On my travis-ci.com dashboard there is a button to request access for GitHub organizations I'm in. Once we land that, I'll click that and we'll see what happens.

If that doesn't work or maybe after should we make a verilator organization account on travis-ci.com instead?

diff --git a/.travis.yml b/.travis.yml
+before_install:
+# Trusty has an old version of make
+# t_flag_csplit needs a newer version of make than 3.81

This test was modified in trunk to skip with old make, so installing make should no longer be needed.

+    ./configure --enable-maintainer-mode --enable-longtests &&

To answer earlier question --enable-maintainer-mode does nothing at present, but it is set by Ubuntu builds so is good to test that it does nothing, and if does something in the future we'll want it (presumably).

diff --git a/test_regress/driver.pl b/test_regress/driver.pl
@ -58,6 +58,7 @ autoflush STDERR 1;
+our $vltmt_threads = 3;

Nit, $Vltmt_Threads as in perl scripts the convention is to Capitalize global variable names.

diff --git a/test_regress/t/t_gantt.pl b/test_regress/t/t_gantt.pl
 use IO::File;
+if (!::too_few_cores()) {

I'd like to have this test especially working in travis, as is only check of a good deal of code and the verilator_gantt utility program, perhaps we always pass --threads 2 to it and modify checks appropriately?

#11 Updated by Todd Strader 3 months ago

On my travis-ci.com dashboard there is a button to request access for GitHub organizations I'm in. Once we land that, I'll click that and we'll see what happens.

If that doesn't work or maybe after should we make a verilator organization account on travis-ci.com instead?

Oh yeah, I don't want this to run under my account. I just think this button I see is the way to request that the organization (presumably authorized by you) allow Travis to connect to its GitHub account. I've never done this before, so we'll see how it goes. But the end state should be this happening in Travis as the Verilator GitHub organization.

I believe I've incorporated all of the feedback and now the only tests that are failing are t_gantt and t_inst_tree_inl0_pub1. The latter also appears to be specifically testing for three threads. I'll see what I can do about adapting these two tests. https://travis-ci.com/toddstrader/verilator-dev/jobs/211091572

#12 Updated by Wilson Snyder 3 months ago

Sorry, what I meant for t_gantt was just pass --threads 2 always then fix the test. I just pushed that change.

I think your deltas are fine, you can push your branch to verilator/verilator's master yourself when you feel ready.

As you're a verilator project maintainer hopefully that's enough for travis to let you make a verilator project account; otherwise if I need to do something instead let me know.

#13 Updated by Todd Strader 3 months ago

OK, I believe everything is push-able now: https://github.com/toddstrader/verilator-dev/tree/travis https://travis-ci.com/toddstrader/verilator-dev/builds/117165094

All the tests are green and I think I've got everything else cleaned up.

I think your deltas are fine, you can push your branch to verilator/verilator's master yourself when you feel ready.

Wow, that's weighty. I tried this, but GitHub said no:
$ git push upstream
remote: Permission to verilator/verilator.git denied to toddstrader.
fatal: unable to access 'https://github.com/verilator/verilator.git/': The requested URL returned error: 403

Also, I'd like to discuss how you see this working mechanically, e.g. should I update Changes before pushing. We can take that offline if you'd like.

As you're a verilator project maintainer hopefully that's enough for travis to let you make a verilator project account; otherwise if I need to do something instead let me know.

I think we need something else. I clicked the button but it's now waiting on confirmation from the organization owner, which I believe is you. No rush, but I expect you have an email or something.

#14 Updated by Wilson Snyder 3 months ago

Fixed a mis-setting, you should be able to push. If other problems might want to use git rather than http as a transport.

Yes, when pushing generally please update Changes. This stuff doesn't really impact users, so wouldn't update it.

Found the auth email and approved it.

#15 Updated by Todd Strader 3 months ago

OK, the push works now and I see the authorization for Travis. I'm not sure how to tell Travis to start watching the organization's repos. I'll be offline through the weekend, but will figure this out next week.

#16 Updated by Wilson Snyder 3 months ago

FYI Travis caught I botched t_flag_csplit on old makes, fixed (I think).

#17 Updated by Wilson Snyder 3 months ago

FYI got a false failure. I added automatic retry of failures (long overdue). Also turned down the verbosity.

#18 Updated by Wilson Snyder 3 months ago

1. t_leak seems to fail, can you debug?

2. Have the vltmt & vlt section only do a test_regress, not a test, as this will speed it up a bit. (We're doing the examples three times so only dist could do it.)

3. are more environments "free" in that they run in parallel and won't just make the results be later? Here's what I think would be good:

Latest Ubuntu LLVM
Latest Ubuntu GCC
Latest OSX
Older Ubuntu (14.04) GCC
we could do Ubuntu 16.04 but probably not likely to catch much the others miss

#19 Updated by Todd Strader 3 months ago

t_leak seems to fail, can you debug?

Sure.

Have the vltmt & vlt section only do a test_regress, not a test

Got it. I'm starting the next batch of changes here: https://github.com/toddstrader/verilator-dev/tree/more-travis

are more environments "free" in that they run in parallel and won't just make the results be later?

Parallelism seemed to be limited to three cores when setting different environment variables, so I would assume this will behave the same. But I'll mess around with it and see what I can find.

#20 Updated by Todd Strader 3 months ago

Yeah, we only have three workers to play with regardless of OS, compiler, etc. I've set it up so that we can get the ~30 minute builds we've been seeing usually: https://travis-ci.com/toddstrader/verilator-dev/builds/117804066 and then a much longer daily cron build for different OS/compiler variants: https://travis-ci.com/toddstrader/verilator-dev/builds/117804239

Please let me know what you think of this strategy.

YAML is here: https://github.com/toddstrader/verilator-dev/tree/more-travis

I also added a Travis badge to the README. It doesn't show up when you run perldoc, but GitHub renders it.

Still TODO:
  • Investigate t_leak
  • Get the new OS/compiler variants to work

#21 Updated by Todd Strader 3 months ago

Couple more notes:

I'm pretty sure that t_leak is not flaky but just failing under Ubuntu 14.04. It probably looked flaky because as we are running with "os: linux" Travis is free to pick whichever version it wants. It looks like the Vt_leak binary finishes with a bad return code and without printing anything. I'll have to look into testing this in a container or VM because debugging through Travis does not sound fun.

If this is true, do you want to turn off --rerun? I don't think I've ever seen a test flake out and I think it would be good to know if that is happening.

Also, you mentioned:

we could do Ubuntu 16.04 but probably not likely to catch much the others miss

However, please note that the latest Ubuntu version Travis has is currently 16.04: https://docs.travis-ci.com/user/reference/overview/

I'm letting Travis pick the Ubuntu version on pushes (presumably this will trend toward the latest version they have) and explicit version in the cron'ed jobs.

Also, that GitHub branch above is very much a moving target because I have to push so much to test with Travis. If you want to take a look at the point in time I was talking about there, please see: https://github.com/toddstrader/verilator-dev/tree/29fffc8f07bd6a96feeafa7e2c6c1b25f5cc8d54

#22 Updated by Wilson Snyder 3 months ago

Feel free to push the badged README.pod. The more_travis branch currently seems to have hackage to debug t_leak so don't push that.

I started a VM with Ubuntu 16.04 and the leak test passed. If we have to skip it on travis we can.

I like having the rerun, as now print almost nothing the first time, then verbose on the rerun, so the travis logs look reasonable.

#23 Updated by Wilson Snyder 3 months ago

FYI I disabled t_leak.pl for now when under travis.

#24 Updated by Todd Strader 3 months ago

Ok, cool. I'm pretty sure it's only happening under 14.04 in Travis (and only --vltmt). I tried to reproduce under docker and a VM but could not. I'm checking for that specific distro under Travis in my branch and skipping. So if you want, I can put that in so we get the test under 16.04.

My cron/push YAML split seems to work except I bumped into a 50 minute runtime limit. So we still need the three way test split for the crib jobs. I'm trying to find a way to do this without a copious amount of YAML.

So I believe gcc under 16.04 and 14.04 is working now. I still need to look at osx and clang.

And yeah, I really like the quiet output especially for Travis. I'm just worried that --rerun will allow currently solid tests to become flaky. But since Travis isn't the only place these tests are run maybe it doesn't really matter and preventing false positives is more worthwhile.

#25 Updated by Todd Strader 2 months ago

I don't know if it's the only clang problem, but there is a systemic problem when using clang under Travis which is caused by -faligned-new. I read the discussion in bug1231 and see that you added a canary to configure to test for this issue and remove the compiler flag. However, in this setup we're passing the test:
checking whether clang++ accepts -faligned-new... yes
but still failing when we try to compile Verilated code:
clang++    -g Vt_interface_gen8__ALLboth.o    -o Vt_interface_gen8 -lm -lstdc++ 2>&1
Vt_interface_gen8__ALLboth.o: In function `main':
Vt_interface_gen8__ALLboth.cpp:(.text+0x83): undefined reference to `operator new(unsigned long, std::align_val_t)'
But when I just pull the flag out of verilated.mk things work: https://travis-ci.com/toddstrader/verilator-dev/builds/118748018 I tried to install a different clang (6.0 was the latest I could find even though 7 is what on there by default?) but that didn't do anything different.

I'm guessing there is some environment problem here, but I also don't understand why the sanity check in configure doesn't fail. Any thoughts you have on this would be appreciated.

#26 Updated by Wilson Snyder 2 months ago

I was thinking that configure compiles but doesn't link, but it does seem to try a link, so I'm at a loss.

#27 Updated by Todd Strader 2 months ago

This is ready to push/discuss next steps: https://github.com/toddstrader/verilator-dev/tree/more-travis

Here is the corresponding build: https://travis-ci.com/toddstrader/verilator-dev/builds/119542645

And cron build: https://travis-ci.com/toddstrader/verilator-dev/builds/119542959

Some notes:

I believe t_leak.pl only fails under Ubuntu 14.04 with --vltmt. I changed the test to skip only that scenario so we can get more coverage.

t_sys_sformat_noopt.pl was skipping clang versions between 3.8 and 5.0, but I'm seeing it fail both in Travis and locally with clang versions 7 and 8. I've disabled this for clang in general.

I've run into two issues so far with osx (sudo cpan isn't working for me and config_rev.pl is failing). I haven't seriously used an Apple since my IIc+. I'm going to leave this for someone who has access to one of Mr. Jobs's computers.

The YAML is less terse than I would like. I checked with the Travis support folks and this seems to be the best way to:
  1. avoid the full cross product of configurations
  2. differentiate between push initiated and cron jobs

Currently cron tests the configurations you suggested (minus osx) and push initiated jobs will kick off a random linux distribution. The latter gives us the advantage of picking up later Ubuntu distributions for free when Travis upgrades. Once I push this change, I'll turn on daily a cron job for the main Verilator project like I have on my fork.

As is, I would plan to open a couple follow-on issues:
  • CI for osx
  • Figure out why configure doesn't detect the -faligned-new issue when running clang under Travis

#28 Updated by Wilson Snyder 2 months ago

Great, again. "more-travis" looks fine please squash merge it to trunk (or if get it to one commit just normal pull to trunk).

I have an idea who I can sign up for OSX, as I also don't use it, which was part of the motivation to get it regressing, let me ask him.

#29 Updated by Todd Strader 2 months ago

Squashed, pushed and the tests are green. Also for reference, bug1477 and bug1478 were opened as discussed above.

#30 Updated by Wilson Snyder 2 months ago

  • Status changed from Feature to Closed

Awesome work!

#31 Updated by Wilson Snyder about 1 month ago

P.S. Note that github just announced free open source CI via Actions. From what I can tell it's two active actions each with two cores, so Travis is still better throughput for now.

Also available in: Atom