A second look at SCons performance

UPDATE: In response to comments here and elsewhere, I’ve done another series of SCons builds using the tips on the SCons ‘GoFastButton’ wiki page. You can view the results here


A few months ago, I took a look at the scalability of SCons, a popular Python-based build tool. The results were disappointing, to say the least. That post stirred up a lot of comments, both here and in other forums. Several people pointed out that a comparison with other build tools would be helpful. Some suggested that SCons’ forte is really incremental builds, rather than the full builds I used for my test. I think those are valid points, so I decided to revisit this topic. This time around, I’ve got head-to-head comparisons between SCons and GNU make, the venerable old workhorse of build tools, as I use each tool to perform full, incremental, and clean builds. Read on for the gory details — and lots of graphs. Spoiler alert: SCons still looks pretty bad.

The setup

As before, my test system is a dual 2.4 GHz Intel Xeon, with 2 GB RAM. But since the previous set of tests, my system has been upgraded to RHEL4; there’s a new version of Python; and there’s even a new version of SCons — the big 2.0. So the setup is now:

  • RedHat Enterprise Linux 4 (update 8, kernel version 2.6.9-89.ELsmp)
  • Dual 2.4 GHz Intel Xeon, with hyperthreading enabled
  • 2 GB RAM
  • Python 2.7
  • SCons v2.0.0.final.0.r5023
  • GNU make 3.81

As previously, the test builds consists of a bunch of compiles and links: N C files, each with a unique associated header, spread across N/500 directories (to ensure there are no filesystem scalability effects), are compiled, then bundled into a standard archive library. Every 20th object is linked into an executable along with the archive. The build tree is generated using a Perl script, which generates both SConstruct files and Makefiles for building. One difference between the two builds: when using GMake, I added the -MMD flag to gcc, to generate additional dependency information; each timed full build was then preceded by a full build to generate all the dependencies and a clean build to nuke all the generated output. I felt that this gave a more realistic comparison to SCons, which employs elaborate dependency analysis logic to ensure accurate incremental builds.

Round 1: Full builds

Again, my interest is primarily full, from scratch builds:

As expected, SCons’ performance is pretty miserable: by the time we get to a build with several thousand source files, the build time is already over 15 minutes, and with the same n2 growth we saw last time, the times race past one hour, then two, finally hitting nearly 5 hours for a build containing just 50,000 source files.

GNU make shows a much more sedate, linear growth — in fact it neatly keeps pace with a simple shell script that does all the same compiles and links as the regular build, but without any of the overhead of a build tool. Yes, I know that a shell script is no substitute for a proper build tool. It just serves to give us an idea of how long it takes just to do the work in the build — a lower bound on the build time. No build tool running serially can run the build any faster than this (obviously we can do better, by a constant factor, if we use parallel build features).

Using that lower bound, we can compute the amount of overhead introduced by the build tool. This is all the time that the tool spends doing things other than actually running the commands needed to execute the build: parsing SConstruct files or makefiles, building the dependency graph, traversing that graph, computing command-lines, etc. Viewing this overhead as a percentage of the total build time gives us an easier-to-digest comparison between SCons and GMake:

Even with relatively few files — around 2,000 — over 50% of the total build time is wasted by overhead with SCons. In comparison, GMake has barely any overhead on a build of that size. At the other end of the range, SCons overhead accounts for nearly 90% of the total build time. GMake overhead has increased as a percentage of the total time too, but only to a modest 20% — still significantly less than SCons. In fact, with 50,000 files, GMake overhead is less than half of SCons overhead with just 2,000 files!

OK, just a couple more things to touch on here before we look at incremental build times: first, you probably noticed the sudden hook in the graph for SCons full build times, between 45,000 and 50,000 files. That means that at that level, on my system, some other factor has kicked in to influence the times. I believe this is because the system has started to page heavily, as SCons’ memory footprint has grown to consume all available RAM. Second, if you compare the SCons times in this article with those in the previous article, you’ll see that this time around, the SCons times are a bit better than last time — about 30% on average. Keep in mind that there are several differences between this setup and the previous: a new OS, a new version of Python, a new version of SCons. Even a new version of the compiler and other tools used during the build. I did try to pin down exactly which factors contributed to this improvement by testing various combinations of versions of Python and SCons (for example, Python 2.6.2 with SCons 2.0; or Python 2.7 with SCons 1.2.0); none of these tests produced any significant change in performance, so I presume that the performance difference is most likely due to the new operating system and tools. I choose not to pursue the matter further than that.

Round 2: Incremental builds

A lot of people claimed that full builds “don’t really matter; developers only really do incremental builds”. So let’s take a look at incremental build performance:

For these builds, I ran a full build, then patched one C file and its associated header. This caused a rebuild of one object file, followed by a rebuild of the archive and the executable that the object feeds into. The actual time to execute just those commands is a paltry one-tenth of a second, so with either tool, the incremental time is dominated by the overhead added by the tool itself. Even so, it’s obvious that SCons adds considerably more overhead than GMake. Even with a small build containing just 2,000 files, SCons burns about 35 seconds to do one-tenth of a second of work. GMake does the same build in about 3 seconds. For the 50,000 file build, SCons ran for about 25 minutes, again just to do one-tenth of a second of actual work; GMake ran for about 9 minutes.

One thing I find especially interesting about this graph is that supposedly the problem with full builds is that they force SCons to constantly rescan the dependency graph. That shouldn’t be necessary in an incremental build, and yet we still see what looks like an O(n2) growth in build time.

Round 3: Clean builds

The last build comparison was a clean build: scons -c versus gmake clean:

At last we have found a build variant where SCons performance is in the same ballpark as GMake! Of course, according to the SCons docs, you’ll never need to do a clean build with SCons (because with SCons your dependencies are perfectly accurate), so maybe this data point is not actually interesting to SCons users.

Sudden death overtime: Memory usage

One last comparison: memory usage. This metric is of particular interest because it puts a hard limit on the maximum size of the build that the tool can handle. I’ve learned the importance of memory efficiency the hard way — long nights in “firefighting” mode to improve memory usage to accomodate this or that customer’s enormous build. I’d hoped that the new versions of Python and SCons used in this test would prove beneficial for SCons memory footprint. Unfortunately, the opposite is true: memory usage is about 8% worse now than it was the last time I ran these tests, and as you can see here, SCons uses about 4 1/2 times as much memory as GMake:

With 50,0000 files, SCons uses just a hair less than 2GB of memory, enough to cause my test system (with 2GB of RAM) to start swapping. In comparison, GMake needs just 440MB of memory for 50,000 files. At that rate, GMake won’t start to thrash my system until the build has grown to more than 225,000 files. Unfortunately, it seems likely that there’s not much that the SCons developers can do to fix this: because SCons is implemented in Python, they are at the mercy of the Python runtime implementation. There’s just not enough control over low-level details like memory allocation when you’re using an interpreted language.

Conclusions

We can clearly see now that it is not simply “the nature of the beast” that SCons performance is so bad. GMake scales much more gracefully, both in terms of elapsed time, and in memory usage, on a variety of scenarios: full, one-touch incremental, and clean builds. From the discussions that the previous post sparked, we know that the primary problem is an inefficient O(n2) algorithm in the SCons implementation. It seems that SCons is academically interesting, but ultimately not practical for any non-trivial build (as some of my customers are now finding out the hard way).

A lot of SCons fans, in defense of SCons, say things like “Well, the SCons developers made a concious decision to focus on correctness rather than performance.” But what good is a correct system that is so slow it’s unusable? For me, the choice is a no-brainer: my time is too precious to waste on a slow build tool.

 

29 Comments

  1. Diwaker says:

    Eric, thanks for the analysis. Can you share your test harness? I’m curious to see how waf would perform on your benchmark.

    • Eric Melski says:

      @Diwaker: I’m working on cleaning up the build generator, it’s not quite in a state that I’m comfortable sharing yet. Should be ready soon.

    • Eric Melski says:

      @Diwaker: I finally got the test harness up here. I invoked it as “./genscons.pl -d N -l 2 -f 500″, where d varied from 3 to 99; the script generates a tree with N subdirectories, and (N + 1) * 500 total C files.

  2. Josh says:

    I think most people use SCons for its correctness guarantee and not its speed.

    I would like to see how SCons performs on incremental builds when you use the –interactive option. I would also be interested to see how changing the SCons files change identification method from md5 to timestamp affects performance.

    • Eric Melski says:

      @Josh: on correctness-vs-performance, are you willing to pay an extra 4 hours on a full build for a small additional guarantee of correctness? As for –interactive, I tried it out, but I honestly don’t see how you expect that to help here. There are thousands of targets in the build, surely you don’t think I’m going to type “build f00001.o” for every one of those?

  3. Saphyx says:

    How would SCons compare to CMake, in a similar test harness? (e.g. KDE chose SCons by committee, then CMake by action.) I guess CMake would be somewhere in between SCons and CMake, but it isn’t obvious (to me at least) how it would position itself.

    I might even be interested in chipping in.

    • Eric Melski says:

      @Saphyx: I haven’t used CMake, so it would be tough for me to modify the test harness to generate cmake files. Once I publish the generator, maybe you can see about adding CMake support.

  4. Sohail says:

    +1 Josh

  5. Gary says:

    I’d also be interested to know how this test is affected by the techniques in http://www.scons.org/wiki/GoFastButton, all of which can give significant speedups compared to the default setup.

  6. Anon says:

    But which build system to go for? Makefiles are tortuous to debug when they go wrong. Surely there has to be some better way (without switching to another languge)? waf? Jam? CMake? rake? ant?

  7. Kai Backman says:

    I’d be interesting in seeing data of Jam in this context. It tends to beat out almost all other systems in terms of raw speed.

    • Eric Melski says:

      @Kai: I haven’t used Jam, but like I told Saphyx, once I publish the generator script maybe you’d be willing to extend it to support Jam.

  8. Anon says:

    Eric:
    Thanks for the reply (PS: only by reading your other blog entries did I realise you actually sell and improved make!)

  9. Matt Doar says:

    Yes, SCons is slower than gmake. I’d still choose SCons for medium sized projects because it is more likely to remain correct over time, IMO.

  10. Electric says:

    Hopefully SCons will be improve when it comes to speed performance but Correctness is what important but It’s understandable to have the accurate result it will take time which result to slow speed performance.

  11. Nikita Manovich says:

    Could you please update your article? I have added CMake support into your script. Please let me know in case of any problems with the modified script (for me it works).

  12. Nikita Manovich says:

    — genbuild.pl.orig 2010-12-27 23:56:42.128384318 +0300
    +++ genbuild.pl 2010-12-28 00:00:55.792748177 +0300
    @@ -124,7 +124,7 @@
    print CFILE $buff;

    if ($group == 1) {
    - print CFILE “main (int argc, char * argv[]) {n”
    + print CFILE “int main (int argc, char * argv[]) {n”
    . “tint i, mb_out;n”
    . “tprintf (“I am $basedir/%sn”, “$filname[$idx]“”
    . “);n”
    @@ -138,7 +138,7 @@
    elsif ( ($fil[$idx] % $group) == 0) {
    $idx2 = $fil[$idx] + 1;
    print CFILE “extern int printr_$file[$idx2] (char * fname);n”
    - . “main (int argc, char * argv[]) {n”
    + . “int main (int argc, char * argv[]) {n”
    . “tint i, mb_out;n”;

    print CFILE “tprintr_$file[$idx2] (“$filname[$idx]“);n”
    @@ -151,7 +151,7 @@
    . “ttprintf (“%07d 9a123456789b123456789c12345″
    . “6789d123456789e123456789f12345678n”, i);n”
    . “t}n”
    - . “texit (0);n}n”;
    + . “treturn (0);n}n”;
    }
    else {
    $idx2 = $fil[$idx] + 1;
    @@ -199,15 +199,19 @@
    $cdstr = “.”;
    for ($cdidx = 1; $cdidx $thiscm[$idx]“)
    + || die “Cannot open $thiscm[$idx] for output: $!”;
    open (SC, “> $thissc[$idx]“)
    || die “Cannot open $thissc[$idx] for output: $!”;
    open (MK, “> $thismk[$idx]“)
    || die “Cannot open $thismk[$idx] for output: $!”;

    + print CM “cmake_minimum_required(VERSION 2.8)n”;
    print SC “import osn”
    . “env = Environment(ENV = os.environ)nn”;

    @@ -231,6 +235,7 @@
    }

    for (my $lupsIdx = 0; $lupsIdx < $nlups; $lupsIdx++) {
    + printf CM ("include_directories(./lup%03d$fileSuffix)n", $lupsIdx);
    printf SC ("env.Append (CPPPATH = ['./lup%03d$fileSuffix'])n",
    $lupsIdx);
    if ($lupsIdx == 0) {
    @@ -241,6 +246,7 @@
    printf MK ("${foo}%%.o: CPPFLAGS $eq -I${foo}lup%03d$fileSuffixn",
    $lupsIdx);
    }
    + print CM "link_directories(${CMAKE_CURRENT_SOURCE_DIR}/$basedir)nn";
    print SC "env.Append (LIBPATH = ['.'])nn";
    print MK "${foo}%: LDFLAGS = -L$libdirnn";

    @@ -265,6 +271,7 @@
    print MK "CC=gccnn";
    print MK "all:nt@ps -eo vsz,rss,comm | fgrep makenn";
    }
    + print CM "n";
    print SC "n";
    print MK "n";

    @@ -275,6 +282,8 @@
    $totald++;
    }

    + $cmfcc = "";
    + $cmfar = "";
    $scfcc = "";
    $scfar = "";
    $mkfcc = "";
    @@ -297,7 +306,9 @@
    # Even when there are no groups, pre-compiled headers
    # still apply.
    #
    - print SC "env.Program ('$filname[$idx].c')nn";
    + print CM "add_executable($filname[$idx]_exe $filname[$idx].c)nn";
    + print SC "env.Program ('$filname[$idx]_exe',"
    + . " '$filname[$idx].c')nn";
    } # end of $group == 1
    #
    # Compile source files in groups.
    @@ -306,7 +317,10 @@
    else {
    if ( ($fil[$idx] % $group) == 0) {
    if ("$scfcc" ne "") {
    + print CM "$cmfccn$cmfarnn";
    print SC "$scfccn$scfarnn";
    + $cmfcc = "";
    + $cmfar = "";
    $scfcc = "";
    $scfar = "";
    }
    @@ -319,14 +333,16 @@
    $groupFilename = "$filname[$idx]";
    $nextnum = substr ($filname[$idx], 1, 5);

    - $scfcc = "env.Program('$filname[$idx]',n"
    - . "tLIBS=['$filname[$idx]'])n";
    + $cmfcc = "add_executable($filname[$idx]_exe $filname[$idx].c)n"
    + . "target_link_libraries($filname[$idx]_exe $filname[$idx]_lib)n";
    + $scfcc = "env.Program('$filname[$idx]_exe', '$filname[$idx].c',n"
    + . "tLIBS=['$filname[$idx]_lib'])n";

    - $scfar = "env.Library ('$filname[$idx]',n"
    - . "t['$filname[$idx].c'";
    + $cmfar = "add_library($filname[$idx]_lib STATIC ";
    + $scfar = "env.Library ('$filname[$idx]_lib', [";

    - $mkfcc = "TARGETS += ${foo}$filname[$idx]n${foo}$filname[$idx]: ${foo}$filname[$idx].o ${foo}lib$filname[$idx].an";
    - $mkfar = "${foo}lib$filname[$idx].a: ${foo}$filname[$idx].o";
    + $mkfcc = "TARGETS += ${foo}$filname[$idx]_exen${foo}$filname[$idx]_exe: ${foo}$filname[$idx].o ${foo}lib$filname[$idx]_lib.an";
    + $mkfar = "${foo}lib$filname[$idx]_lib.a: ";

    print MK "SRCS += ${foo}$filname[$idx].cn";
    $tmpfnam = $filname[$idx];
    @@ -334,10 +350,12 @@
    $tmpfnum = sprintf ("%05d", $nextnum + $filei);
    substr ($tmpfnam, 1, 5) = $tmpfnum;

    - $scfar .= ",nt '$tmpfnam.c'";
    + $cmfar .= "nt $tmpfnam.c";
    + $scfar .= ($filei != 1? "," : "") . "nt '$tmpfnam.c'";
    $mkfar .= " ${foo}$tmpfnam.o";
    print MK "SRCS += ${foo}$tmpfnam.cn";
    }
    + $cmfar .= ")nn";
    $scfar .= "])nn";
    $mkfar .= "n";
    }
    @@ -360,10 +378,12 @@
    #
    # create makefile commands for the leftover files
    #
    + print CM "$cmfccn$cmfarnn";
    print SC "$scfccn$scfarnn";
    print MK "$mkfccn$mkfarnn";
    }

    + close (CM);
    close (SC);
    close (MK);

    @@ -413,6 +433,8 @@
    }

    if ($thisLvl > $thiscm[$idx]“)
    + || die “Cannot open $thiscm[$idx] for append: $!”;
    open (SC, “>> $thissc[$idx]“)
    || die “Cannot open $thissc[$idx] for append: $!”;
    open (MK, “>> $thismk[$idx]“)
    @@ -423,28 +445,34 @@
    if (index ($dirs1[$idx], ” “) > 0) {
    @subdirs = split (/ /, $dirs1[$idx]);
    for ($i = 0; $i 0) {
    @subdirs = split (/ /, $dirs2[$idx]);
    for ($i = 0; $i < $#subdirs; $i++) {
    + print CM "add_subdirectory($subdirs[$i])n";
    print SC "'$subdirs[$i]/SConstruct',nt";
    print MK "include $subdirs[$i]/Makefilen";
    }
    + print CM "add_subdirectory($subdirs[$#subdirs])n";
    print SC "'$subdirs[$#subdirs]/SConstruct'";
    print MK "include $subdirs[$#subdirs]/Makefilen";
    }
    else {
    + print CM "add_subdirectory($dirs2[$idx])n";
    print SC "'$dirs2[$idx]/SConstruct'";
    print MK "include $dirs2[$idx]/Makefilen";
    }
    print SC "])nn";
    + print CM "n";
    print SC "n";
    print MK "NUL=n";
    print MK "SPACE=$(NUL) $(NUL)n";
    @@ -456,6 +484,7 @@
    print MK "trm -f *.o ; rm -f *.ann";
    print MK "-include $(SRCS:.c=.d)";

    + close (CM);
    close (SC);
    close (MK);
    }

  13. Anatol says:

    Hi,

    could you please also test your build with Tup tool? Tup tool is designed to fast incremental builds. I mean REALLY fast incremental build. In fact the time of build equals to O(changes) not O(projectsize)

    Check out this comparison http://gittup.org/tup/make_vs_tup.html

  14. James Hartlow says:

    You are comparing a tool that computes md5 sums of the source files to check for real changes with a tool that only reads file timestamps : what were you expecting ?

    The results could have been interesting but you (the author) seems biased from the top on. As you wrote it, it seems you have never used it for real production work. On the other hand, you seem to have a great mastering of make : pretty unfair.

    No doubt SCons will be slower ! And perhaps some other tool will achieve the same results in a safer/faster/smarter way. But comparing make and scons is rather pointless : if anything, they should be compared in terms of functionalites/correctness first.

  15. Would be good to know if the comparison to make was done with scons Decider function = timestamp-newer

    http://www.scons.org/doc/2.0.0.final.0/HTML/scons-man.html
    Yeah it wasn’t clear

    • Eric Melski says:

      I used the default SCons settings, which I assume means it was doing md5 checksums during the up-to-date check. However, the test files were all trivial — less than a couple hundred bytes each. I don’t believe the md5 cost was a significant part of the problem, although I suppose it is possible.

  16. Chris BeHanna says:

    Eric, whenever I’ve seen scons performance go pear-shaped, it’s because someone did something like

    srcs = [big list o' files]
    targets = [big list o' files]

    and then naively did

    env.SomeBuilder(targets, sources)

    SCons makes every target in a builder invocation depend upon every source, so rather than the dependency graph gaining N edges, it gains N^2 edges (assuming, for example, the sources are C files and the targets are their corresponding .o files).

    The shortcuts of either inserting a proxy dependency (one of the tricks from the “GoFastButton” page) or using iteration or a list comprehension to invoke the builder once per source file/target file pair both result in DRAMATIC improvements in performance.

    I have not yet looked at your benchmark, but I can say, having converted a code base of about 2M lines of code, that scons is actually faster than GNU make in our build, when building (or incrementally rebuilding) the whole tree, and in particular when using a cache for rebuilds (10x speedup there).

    Furthermore, one cannot understate the improvement that comes from using an MD5 hash to decide when to rebuild: the ripple effects of making a change are MUCH better contained, as you no longer necessarily have to rebuild EVERYTHING that’s downstream of your change.

    I hope to have time to look at your benchmark in detail; thus far, all of the benchmarks that I’ve seen have used toy sources that are pathological for scons: sure, the overhead can be huge when the compile time for each source file is on the order of a fraction of a second. On a real code base, however, the compile times aren’t necessarily small (template-heavy C++ code built with -O2, for example, can take awhile), and the overall speedup you get from limiting downstream ripple effects from a change, coupled with the ability to pull stuff from the cache, ends up being a HUGE win. IOW, the overhead from SCons on nontrivial code bases, although still observable, fades to something less than ten percent, IME.

    Thanks for the article.

    • Eric Melski says:

      @Chris: that’s not how my test was structured, but please do take a look for yourself. I did originally have a structure like that, and that gave even more horrifically bad performance, as you can imagine — something closer to N4, naturally.

      Regarding the use of ‘toy’ sources — obviously that’s what I had to do for these tests. There aren’t many real projects of the magnitude I was testing (tens of thousands of files); at least, there aren’t many that are publicly accessible. Even if such a project existed, it’s not worth my time to convert the build system of a large real project from one tool to another simply for the sake of benchmarks like this.

      Finally, when you’re talking about a substantial code base, 10% is still a significant amount of actual time. On a two-hour long build, that’s almost 15 minutes. Not huge, but not chump change either. Given my druthers, I’ll take the faster build system and get out of the office 15 minutes before you every day :) .

  17. Jakub Narębski says:

    Could you please redo plots in loglog scale (logarithmic scale on both axes), and do linear fit to find k in O(n ** k) behavior (of least squares fit to O(n ** k) behavior)?

1 Trackbacks

Leave a comment