<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Electric Cloud Blog &#187; gmake</title>
	<atom:link href="http://www.electric-cloud.com/blog/tag/gmake/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.electric-cloud.com/blog</link>
	<description>This is your source for private development cloud best practices and technical tips and tricks for Electric Cloud solutions</description>
	<lastBuildDate>Thu, 02 Feb 2012 22:32:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1</generator>
		<item>
		<title>The last word on SCons performance</title>
		<link>http://www.electric-cloud.com/blog/2010/08/11/the-last-word-on-scons-performance/</link>
		<comments>http://www.electric-cloud.com/blog/2010/08/11/the-last-word-on-scons-performance/#comments</comments>
		<pubDate>Wed, 11 Aug 2010 18:33:44 +0000</pubDate>
		<dc:creator>Eric Melski</dc:creator>
				<category><![CDATA[Software Development]]></category>
		<category><![CDATA[gmake]]></category>
		<category><![CDATA[incremental build]]></category>
		<category><![CDATA[parallel builds]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[scons]]></category>

		<guid isPermaLink="false">http://blog.electric-cloud.com/?p=690</guid>
		<description><![CDATA[My previous look at SCons performance compared SCons and gmake on a variety of build scenarios &#8212; full, incremental, and clean. A few people suggested that I try the tips given on the SCons &#8216;GoFastButton&#8217; wiki page, which are said to significantly improve SCons performance (at the cost of some accuracy, of course). Naturally, I [...]]]></description>
			<content:encoded><![CDATA[<p>My <a href="http://blog.electric-cloud.com/2010/07/21/a-second-look-at-scons-performance/">previous look</a> at SCons performance compared SCons and gmake on a variety of build scenarios &mdash; full, incremental, and clean.  A few people suggested that I try the tips given on <a href="http://www.scons.org/wiki/GoFastButton">the SCons &#8216;GoFastButton&#8217; wiki page</a>, which are said to significantly improve SCons performance (at the cost of some accuracy, of course).  Naturally, I felt that I had to do one last follow-up exploring this avenue.  And since that meant I would already be running a bunch of builds, I figured I&#8217;d try out SCons&#8217; parallel build features too.  My findings follow.<br />
<span id="more-690"></span></p>
<p><h3>Can SCons &#8220;GoFast&#8221;?</h3>
<p>You can read all about the setup in the previous post, so I&#8217;ll just jump straight to the results.  According to the &#8220;GoFastButton&#8221; recommendations, <span style="background:#eeeeee;"><font face="Courier New">--max-drift=1 --implicit-deps-unchanged</font></span> will &#8220;run your build as fast as possible&#8221;, so that&#8217;s what I used.  In all cases, I did an initial, untimed from-scratch build first, to generate the initial dependency graph, then I ran a second timed build.  After that timed run, I ran a clean build and finally ran the full build again, this time with <span style="background:#eeeeee;"><font face="Courier New">-j 2</font></span>, to evaluate the impact of SCons&#8217; parallel build features.
</p>
<p>
Contrary to my expectations, the GoFast settings had relatively little impact over much of the test range &mdash; only about 5-10% faster than without those flags.  Only the very largest build showed any significant impact, with a 25% improvement.  Unfortunately that impressive result is more likely because SCons uses less memory with GoFast settings enabled.  If you recall from the <a href="http://blog.electric-cloud.com/2010/07/21/a-second-look-at-scons-performance/">previous tests</a>, with 50,000 source files, SCons&#8217; memory footprint was a hefty 2,023 MB &mdash; enough to force my test machine to start swapping.  With the GoFast settings, SCons used &#8220;only&#8221; 1,838 MB &mdash; still an awful lot of memory, but just small enough to avoid thrashing the system, with the end result being a substantially improved build time.
</p>
<p>
Building in parallel had a more substantial impact &mdash; reducing build times about 30% on the largest build (compared to a serial SCons build with GoFast settings enabled).  That&#8217;s not as good as I had hoped for (on a large, relatively &#8220;flat&#8221; build such as this, I expected the build to parallelize very well), but it&#8217;s not terrible.  Here are the complete results:
</p>
<p>
<a href="http://www.electric-cloud.com/blog/wp-content/uploads/2010/08/scons_full1.png"><img src="http://www.electric-cloud.com/blog/wp-content/uploads/2010/08/scons_full1.png?w=300" alt="SCons full build performance, click for full size" title="scons_full" width="300" height="180" class="aligncenter size-medium wp-image-692" /></a>
</p>
<p>
So, GoFast seems to be a bust on full builds.  It&#8217;s definitely better than vanilla SCons, but still nowhere near as fast as gmake.
</p>
<p>
Things look a little better on &#8220;one-touch&#8221; incremental builds though, where GoFast settings cut build times by about 30% across the board:
</p>
<p>
<a href="http://www.electric-cloud.com/blog/wp-content/uploads/2010/08/scons_incr1.png"><img src="http://www.electric-cloud.com/blog/wp-content/uploads/2010/08/scons_incr1.png?w=300" alt="SCons incremental build performance, click for full size" title="scons_incr" width="300" height="180" class="aligncenter size-medium wp-image-693" /></a>
</p>
<p>
Ironically, the most impressive results are on clean builds (<span style="background:#eeeeee;"><font face="Courier New">scons -c</font></span>).  GoFast settings cut build times by about 40% at the low end of the test range, and by more than 50% at the high end of the test range:
</p>
<p>
<a href="http://www.electric-cloud.com/blog/wp-content/uploads/2010/08/scons_clean1.png"><img src="http://www.electric-cloud.com/blog/wp-content/uploads/2010/08/scons_clean1.png?w=300" alt="SCons clean build performance, click for full size" title="scons_clean" width="300" height="180" class="aligncenter size-medium wp-image-694" /></a>
</p>
<p>
To my amazement, SCons with GoFast settings actually beats gmake on clean builds.  My guess is that this is because SCons handles file deletion in-process, while gmake must invoke a separate process (<span style="background:#eeeeee;"><font face="Courier New">rm</font></span>).
</p>
<p><h3>That&#8217;s all folks!</h3>
</p>
<p>
That&#8217;s it for my analysis of SCons performance.  Thanks to everybody who offered ideas for improving my benchmarks!  I hope you found this series of posts interesting.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.electric-cloud.com/blog/2010/08/11/the-last-word-on-scons-performance/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>What&#039;s new in GNU make 3.82</title>
		<link>http://www.electric-cloud.com/blog/2010/08/03/gnu-make-3-82-is-out/</link>
		<comments>http://www.electric-cloud.com/blog/2010/08/03/gnu-make-3-82-is-out/#comments</comments>
		<pubDate>Tue, 03 Aug 2010 13:29:54 +0000</pubDate>
		<dc:creator>Eric Melski</dc:creator>
				<category><![CDATA[Build-Test-Deploy Best Practices]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[gmake]]></category>
		<category><![CDATA[gnu make]]></category>

		<guid isPermaLink="false">http://blog.electric-cloud.com/?p=685</guid>
		<description><![CDATA[GNU make 3.82 hit the streets last week, the first new release of the workhorse build tool in over four years. Why so long between releases? To me the answer is obvious: the tool Just Works &#8482;, so there&#8217;s no need to churn out new releases chasing the latest development fad. But as this release [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.gnu.org/software/make/">GNU make 3.82</a> hit the streets last week, the first new release of the workhorse build tool in over four years.  Why so long between releases?  To me the answer is obvious:  the tool Just Works &#8482;, so there&#8217;s no need to churn out new releases chasing the latest development fad.  But as this release shows, there is still room to innovate, without compromising on the points that make the tool so great.  The two improvements I find most interesting are <font face="Courier New">.ONESHELL</font>, and changes to pattern-search behavior:<br />
<span id="more-685"></span></p>
<p><h3>.ONESHELL</h3>
</p>
<p>
Normally, gmake executes each line of a rule body or <i>recipe</i> using a separate invocation of the shell.  With gmake 3.82, you can add the special target <font face="Courier New">.ONESHELL</font> to the makefile, which will tell gmake to run the entire recipe using a single shell invocation.  For example, if your makefile contains the following:
</p>
<p><pre>
<div style="background:#ffffce;border:solid thin;width:60ex;margin-left:auto;margin-right:auto;padding:10px;"><font face="Courier New">all:
	@export FOO=1
	@echo FOO is -$$FOO-
</font></div>
</pre>
<p>
Without <font face="Courier New">.ONESHELL</font>, gmake will invoke the shell twice, as follows:
</p>
<p><pre>
<div style="background:#ffffce;border:solid thin;width:60ex;margin-left:auto;margin-right:auto;padding:10px;"><font face="Courier New">sh -c 'export FOO=1'
sh -c 'echo FOO is -$FOO-'</font></div>
</pre>
<p>
Naturally, that will not produce the output you actually want (i.e., &#8220;FOO is -1-&#8221;).  With <font face="Courier New">.ONESHELL</font>, gmake instead invokes the shell just once:
</p>
<p><pre>
<div style="background:#ffffce;border:solid thin;width:60ex;margin-left:auto;margin-right:auto;padding:10px;"><font face="Courier New">sh -c 'export FOO=1
    echo FOO is -$FOO-'</font></div>
</pre>
<p>
In addition to making it easier to write multi-line rule bodies, this feature makes it much more palatable to use alternative shells.  For example, you could imagine using Perl as the shell:
</p>
<p><pre>
<div style="background:#ffffce;border:solid thin;width:60ex;margin-left:auto;margin-right:auto;padding:10px;"><font face="Courier New">SHELL=perl
.SHELLFLAGS=-e
.ONESHELL:

all:
	@my $$foo = "1";
	print "FOO is -$$foo-\n";
</font></div>
</pre>
<p>
(Note the use of another 3.82 feature, <font face="Courier New">.SHELLFLAGS</font>, which allows us to control the command-line flags used with the shell; in this case I&#8217;ve set those to &#8220;-e&#8221;, the Perl flag for executing a script from the command line.)
</p>
<p><h3>Pattern search changes</h3>
</p>
<p>
Prior to version 3.82, when gmake finds multiple matches during a pattern search, it prefers patterns declared earlier in the makefile over patterns declared later.  As of 3.82, gmake instead prefers the pattern that results in the shortest stem.  That sounds a bit confusing thanks to the jargon, but I think this will actually cause gmake to better adhere to the <a href="http://en.wikipedia.org/wiki/Principle_of_least_astonishment">principle of least astonishment</a>.  Here&#8217;s an example:
</p>
<p><pre>
<div style="background:#ffffce;border:solid thin;width:60ex;margin-left:auto;margin-right:auto;padding:10px;"><font face="Courier New">all: sub/foo.x

%.x:
	@echo "Prefer first match (stem is $*)."

sub/%.x:
	@echo "Prefer most specific match (stem is $*)."
</font></div>
</pre>
<p>
Compare the output from gmake 3.81 and 3.82:
</p>
<p><div style="background:#eeeeee;border:solid thin;width:60ex;margin-left:auto;margin-right:auto;">
<div style="border-bottom-width:1px;border-bottom-style:solid;"><b>gmake 3.81</b></div>
<div style="background:#ffffce;padding:10px;">
<pre><font face="Courier New">Prefer first match (stem is sub/foo).</font></pre>
</div>
</div>
<p><div style="background:#eeeeee;border:solid thin;margin-left:auto;margin-right:auto;width:60ex;">
<div style="border-bottom-width:1px;border-bottom-style:solid;"><b>gmake 3.82</b></div>
<div style="background:#ffffce;padding:10px;">
<pre><font face="Courier New">Prefer most specific match (stem is foo).</font></pre>
</div>
</div>
<p>
gmake 3.82 prefers the second pattern because it is a more specific match than the first.  Note that this is a significant backwards-incompatibility compared with previous versions of gmake!
</p>
<p><h3>Other changes</h3>
</p>
<p>
Besides those big changes, there are several smaller features, such as:
</p>
<dl>
<dt><b>.RECIPEPREFIX</b></dt>
<dd>This special variable allows you to change the character used to mark the beginning of a command in a recipe from the default TAB character.  I can only assume that the developers added this feature to quell the semi-regular complaints about make&#8217;s sensitivity to whitespace.</dd>
<dt><b>private</b> variable modifier</dt>
<dd>Normally, gmake propagates target-specific variable assignments to the prereqs of the target.  With the <font face="Courier New">private</font> modifier, you can restrict the scope of a target-specific variable assignment, so that it is not inherited by the prereqs.</dd>
<dt><b>undefine</b> directive</dt>
<dd>The inverse of the familiar <font face="Courier New">define</font> directive, <font face="Courier New">undefine</font> lets you completely remove a variable definition.</dd>
<dt><b>define</b> improvements</dt>
<dd>The <font face="Courier New">define</font> directive now supports the same assignment operators that regular variable assignment allows:  <font face="Courier New">:=</font>, <font face="Courier New">?=</font> and <font face="Courier New">+=</font>, for simple, conditional and appending assignments.</dd>
</dl>
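<p>
The smaller features above can be sketched in a single makefile.  This is only an illustration, assuming GNU make 3.82 or later; the variable names (<font face="Courier New">GREETING</font>, <font face="Courier New">SCRATCH</font>, <font face="Courier New">NOTE</font>) and the targets are hypothetical, chosen just for the demonstration:
</p>

```make
# Illustrative sketch only; assumes GNU make 3.82 or later.
# GREETING, SCRATCH, NOTE and the targets are hypothetical names.

# Use '>' instead of TAB to introduce recipe lines.
.RECIPEPREFIX = >

# define with an explicit assignment operator (simple expansion).
define GREETING :=
hello from define
endef

# undefine removes a variable definition entirely.
SCRATCH = temporary
undefine SCRATCH

# NOTE is a target-specific variable for 'all'.  Because it is
# marked private, the prerequisite 'dep' should not inherit it.
all: private NOTE = only-for-all
all: dep
> @echo "all: note=[$(NOTE)] greeting=[$(GREETING)]"
> @echo "all: origin of SCRATCH is $(origin SCRATCH)"

dep:
> @echo "dep: note=[$(NOTE)]"
```

<p>
With this sketch, the <font face="Courier New">dep</font> recipe should see an empty <font face="Courier New">NOTE</font>, while <font face="Courier New">all</font> sees the value assigned to it, and <font face="Courier New">$(origin SCRATCH)</font> should report the variable as undefined.
</p>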
<p>
If you want to see the full list, you can find it in the <a href="http://cvs.savannah.gnu.org/viewvc/make/NEWS?revision=2.109&amp;root=make&amp;view=markup">NEWS</a> file in the gmake source tree.
</p>
<p><h3>&#8220;Rumors of my death have been greatly exaggerated&#8230;&#8221;</h3>
</p>
<p>
Some naysayers claim make is outdated, but it&#8217;s clear that make is still alive and kicking (and if a new gmake release isn&#8217;t proof enough, take a look at some of the innovations we&#8217;ve put into <a href="http://www.electric-cloud.com/products/electricaccelerator.php">Electric Make</a>).  A hearty congratulations to everybody who contributed, and especially to Paul Smith for driving the development and release effort.  Keep up the good work!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.electric-cloud.com/blog/2010/08/03/gnu-make-3-82-is-out/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>A second look at SCons performance</title>
		<link>http://www.electric-cloud.com/blog/2010/07/21/a-second-look-at-scons-performance/</link>
		<comments>http://www.electric-cloud.com/blog/2010/07/21/a-second-look-at-scons-performance/#comments</comments>
		<pubDate>Wed, 21 Jul 2010 21:23:25 +0000</pubDate>
		<dc:creator>Eric Melski</dc:creator>
				<category><![CDATA[Build-Test-Deploy Best Practices]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[gmake]]></category>
		<category><![CDATA[incremental build]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[scons]]></category>

		<guid isPermaLink="false">http://blog.electric-cloud.com/?p=655</guid>
		<description><![CDATA[UPDATE: In response to comments here and elsewhere, I&#8217;ve done another series of SCons builds using the tips on the SCons &#8216;GoFastButton&#8217; wiki page. You can view the results here A few months ago, I took a look at the scalability of SCons, a popular Python-based build tool. The results were disappointing, to say the [...]]]></description>
			<content:encoded><![CDATA[<p><strong>UPDATE:</strong> In response to comments here and elsewhere, I&#8217;ve done another series of SCons builds using the tips on <a href="http://www.scons.org/wiki/GoFastButton">the SCons &#8216;GoFastButton&#8217; wiki page</a>.  You can view the results <a href="http://blog.electric-cloud.com/2010/08/11/the-last-word-on-scons-performance/">here</a>.</p>
<hr />
<p>
A few months ago, I took a look at the scalability of <a href="http://www.scons.org/">SCons</a>, a popular Python-based build tool.  <a href="http://blog.electric-cloud.com/2010/03/08/how-scalable-is-scons/">The results were disappointing</a>, to say the least.  That post stirred up a lot of comments, both <a href="http://blog.electric-cloud.com/2010/03/08/how-scalable-is-scons/#comments">here</a> and in <a href="http://www.reddit.com/r/programming/comments/barcc/how_scalable_is_scons/">other</a> <a href="http://scons.tigris.org/ds/viewMessage.do?dsForumId=1268&amp;dsMessageId=2456717">forums</a>.  Several people pointed out that a comparison with other build tools would be helpful.  Some suggested that SCons&#8217; forte is really <i>incremental</i> builds, rather than the full builds I used for my test.  I think those are valid points, so I decided to revisit this topic.  This time around, I&#8217;ve got head-to-head comparisons between SCons and GNU make, the venerable old workhorse of build tools, as I use each tool to perform full, incremental, and clean builds.  Read on for the gory details &#8212; and lots of graphs.  <b>Spoiler alert:</b> SCons still looks pretty bad.<br />
<span id="more-713"></span>
</p>
<p><h3>The setup</h3>
</p>
<p>
As before, my test system is a dual 2.4 GHz Intel Xeon, with 2 GB RAM.  But since the previous set of tests, my system has been upgraded to RHEL4; there&#8217;s a new version of Python; and there&#8217;s even a new version of SCons &mdash; the big 2.0.  So the setup is now:
</p>
<ul>
<li>RedHat Enterprise Linux 4 (update 8, kernel version 2.6.9-89.ELsmp)</li>
<li>Dual 2.4 GHz Intel Xeon, with hyperthreading enabled</li>
<li>2 GB RAM</li>
<li>Python 2.7</li>
<li>SCons v2.0.0.final.0.r5023</li>
<li>GNU make 3.81</li>
</ul>
<p>
As previously, the test build consists of a bunch of compiles and links:  <i>N</i> C files, each with a unique associated header, spread across <i>N/500</i> directories (to ensure there are no filesystem scalability effects), are compiled, then bundled into a standard archive library.  Every 20th object is linked into an executable along with the archive.  The build tree is generated using a Perl script, which generates both SConstruct files and Makefiles for building.  One difference between the two builds:  when using GMake, I added the <font face="Courier New">-MMD</font> flag to gcc, to generate additional dependency information; each timed full build was then preceded by a full build to generate all the dependencies and a clean build to nuke all the generated output.  I felt that this gave a more realistic comparison to SCons, which employs elaborate dependency analysis logic to ensure accurate incremental builds.
</p>
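<p>
For readers unfamiliar with <font face="Courier New">-MMD</font>, here is a minimal sketch of how a makefile can consume the dependency files that gcc writes.  It is not the generated makefile used in these tests; the file names (<font face="Courier New">foo.o</font>, <font face="Courier New">bar.o</font>, <font face="Courier New">libdemo.a</font>) and the flat layout are assumptions made for illustration:
</p>

```make
# Hypothetical sketch, not the benchmark's generated makefile.
# File names and layout are assumptions for illustration.
CC := gcc
# -MMD asks gcc to write foo.d (user-header dependencies) next to foo.o.
CFLAGS := -MMD
OBJS := foo.o bar.o

libdemo.a: $(OBJS)
	ar rcs $@ $(OBJS)

%.o: %.c
	$(CC) $(CFLAGS) -c $*.c -o $@

# Read the generated dependency files if they exist; the leading '-'
# keeps make quiet when they are missing (e.g. on the very first build).
-include $(OBJS:.o=.d)

# Removes build output but leaves the .d files in place, so a later
# full build starts with the dependency information already available.
clean:
	rm -f $(OBJS) libdemo.a
```

<p>
The design point is that the first build needs no header dependencies at all (everything is built anyway), and every later build picks them up from the <font face="Courier New">.d</font> files.
</p>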
<p><h3>Round 1: Full builds</h3>
</p>
<p>
Again, my primary interest is full, from-scratch builds:
</p>
<p>
<a href="http://www.electric-cloud.com/blog/wp-content/uploads/2010/07/scons_full1.png"><img src="http://www.electric-cloud.com/blog/wp-content/uploads/2010/07/scons_full1.png?w=300" alt="" title="scons_full" width="300" height="180" class="aligncenter size-medium wp-image-659" /></a>
</p>
<p>
As expected, SCons&#8217; performance is pretty miserable:  by the time we get to a build with several thousand source files, the build time is already over 15 minutes, and with the same n<sup>2</sup> growth we saw last time, the times race past one hour, then two, finally hitting <b>nearly 5 hours</b> for a build containing just 50,000 source files.
</p>
<p>
GNU make shows a much more sedate, linear growth &mdash; in fact it neatly keeps pace with a simple shell script that does all the same compiles and links as the regular build, but without any of the overhead of a build tool.  Yes, I know that a shell script is no substitute for a proper build tool.  It just serves to give us an idea of how long it takes just to do the work in the build &mdash; a lower bound on the build time.  No build tool running serially can run the build any faster than this (obviously we can do better, by a constant factor, if we use parallel build features).
</p>
<p>
Using that lower bound, we can compute the amount of <i>overhead</i> introduced by the build tool.  This is all the time that the tool spends doing things <i>other than</i> actually running the commands needed to execute the build:  parsing SConstruct files or makefiles, building the dependency graph, traversing that graph, computing command-lines, etc.  Viewing this overhead as a percentage of the total build time gives us an easier-to-digest comparison between SCons and GMake:
</p>
<p>
<a href="http://www.electric-cloud.com/blog/wp-content/uploads/2010/07/scons_overhead1.png"><img src="http://www.electric-cloud.com/blog/wp-content/uploads/2010/07/scons_overhead1.png?w=300" alt="" title="scons_overhead" width="300" height="180" class="aligncenter size-medium wp-image-657" /></a>
</p>
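<p>
Spelled out, the overhead percentage plotted above can be written as follows.  The symbols are labels introduced here for illustration: <i>T</i><sub>tool</sub> is the elapsed time of the build under SCons or GMake, and <i>T</i><sub>script</sub> is the elapsed time of the plain shell script that serves as the lower bound:
</p>

```latex
\text{overhead} \;=\; \frac{T_{\text{tool}} - T_{\text{script}}}{T_{\text{tool}}} \times 100\%
```

<p>
As a made-up example (not a measured data point): a build that takes 100 seconds under the tool and 40 seconds as a bare script has (100 - 40) / 100 = 60% overhead.
</p>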
<p>
Even with relatively few files &mdash; around 2,000 &mdash; <b>over 50% of the total build time is wasted by overhead with SCons</b>.  In comparison, GMake has barely any overhead on a build of that size.  At the other end of the range, <b>SCons overhead accounts for nearly 90% of the total build time</b>.  GMake overhead has increased as a percentage of the total time too, but only to a modest 20% &mdash; still significantly less than SCons.  In fact, with 50,000 files, GMake overhead is less than half of SCons overhead with just 2,000 files!
</p>
<p>
OK, just a couple more things to touch on here before we look at incremental build times:  first, you probably noticed the sudden hook in the graph for SCons full build times, between 45,000 and 50,000 files.  That means that at that level, on my system, some other factor has kicked in to influence the times.  I believe this is because the system has started to page heavily, as SCons&#8217; memory footprint has grown to consume all available RAM.  Second, if you compare the SCons times in this article with those in the previous article, you&#8217;ll see that this time around, the SCons times are a bit better than last time &mdash; about 30% on average.  Keep in mind that there are several differences between this setup and the previous:  a new OS, a new version of Python, a new version of SCons.  Even a new version of the compiler and other tools used during the build.  I did try to pin down exactly which factors contributed to this improvement by testing various combinations of versions of Python and SCons (for example, Python 2.6.2 with SCons 2.0; or Python 2.7 with SCons 1.2.0); none of these tests produced any significant change in performance, so I presume that the performance difference is most likely due to the new operating system and tools.  I chose not to pursue the matter further than that.
</p>
<p><h3>Round 2: Incremental builds</h3>
</p>
<p>
A lot of people claimed that full builds <i>&#8220;don&#8217;t really matter; developers only really do incremental builds&#8221;</i>.  So let&#8217;s take a look at incremental build performance:
</p>
<p>
<a href="http://www.electric-cloud.com/blog/wp-content/uploads/2010/07/scons_incr1.png"><img src="http://www.electric-cloud.com/blog/wp-content/uploads/2010/07/scons_incr1.png?w=300" alt="" title="scons_incr" width="300" height="180" class="aligncenter size-medium wp-image-660" /></a>
</p>
<p>
For these builds, I ran a full build, then patched one C file and its associated header.  This caused a rebuild of one object file, followed by a rebuild of the archive and the executable that the object feeds into.  The actual time to execute just those commands is a paltry one-tenth of a second, so with either tool, the incremental time is dominated by the overhead added by the tool itself.  Even so, it&#8217;s obvious that SCons adds considerably more overhead than GMake.  Even with a small build containing just 2,000 files, <b>SCons burns about 35 seconds to do one-tenth of a second of work</b>.  GMake does the same build in about 3 seconds.  For the 50,000 file build, SCons ran for about 25 minutes, again just to do one-tenth of a second of actual work; GMake ran for about 9 minutes.
</p>
<p>
One thing I find especially interesting about this graph is that supposedly the problem with full builds is that they force SCons to constantly rescan the dependency graph.  That shouldn&#8217;t be necessary in an incremental build, and yet we still see what looks like an O(n<sup>2</sup>) growth in build time.
</p>
<p><h3>Round 3: Clean builds</h3>
</p>
<p>
The last build comparison was a clean build:  <font face="Courier new">scons -c</font> versus <font face="Courier New">gmake clean</font>:
</p>
<p>
<a href="http://www.electric-cloud.com/blog/wp-content/uploads/2010/07/scons_clean1.png"><img src="http://www.electric-cloud.com/blog/wp-content/uploads/2010/07/scons_clean1.png?w=300" alt="" title="scons_clean" width="300" height="180" class="aligncenter size-medium wp-image-658" /></a>
</p>
<p>
At last we have found a build variant where SCons performance is in the same ballpark as GMake!  Of course, according to the SCons docs, you&#8217;ll never need to do a clean build with SCons (because with SCons your dependencies are perfectly accurate), so maybe this data point is not actually interesting to SCons users.
</p>
<p><h3>Sudden death overtime: Memory usage</h3>
</p>
<p>
One last comparison:  memory usage.  This metric is of particular interest because it puts a hard limit on the maximum size of the build that the tool can handle.  I&#8217;ve learned the importance of memory efficiency the hard way &mdash; long nights in &#8220;firefighting&#8221; mode to improve memory usage to accommodate this or that customer&#8217;s enormous build.  I&#8217;d hoped that the new versions of Python and SCons used in this test would prove beneficial for SCons memory footprint.  Unfortunately, the opposite is true:  memory usage is about 8% worse now than it was the last time I ran these tests, and as you can see here, SCons uses about 4 1/2 times as much memory as GMake:
</p>
<p>
<a href="http://www.electric-cloud.com/blog/wp-content/uploads/2010/07/scons_mem1.png"><img src="http://www.electric-cloud.com/blog/wp-content/uploads/2010/07/scons_mem1.png?w=300" alt="" title="scons_mem" width="300" height="180" class="aligncenter size-medium wp-image-661" /></a>
</p>
<p>
With 50,000 files, SCons uses just a hair less than 2GB of memory, enough to cause my test system (with 2GB of RAM) to start swapping.  In comparison, GMake needs just 440MB of memory for 50,000 files.  At that rate, GMake won&#8217;t start to thrash my system until the build has grown to more than 225,000 files.  Unfortunately, it seems likely that there&#8217;s not much that the SCons developers can do to fix this:  because SCons is implemented in Python, they are at the mercy of the Python runtime implementation.  There&#8217;s just not enough control over low-level details like memory allocation when you&#8217;re using an interpreted language.
</p>
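<p>
The 225,000-file figure follows from a simple linear extrapolation.  As a rough check, assuming that GMake&#8217;s memory use grows linearly with the number of files and that 2GB means 2,048MB:
</p>

```latex
\frac{440\ \text{MB}}{50{,}000\ \text{files}} \approx 0.0088\ \text{MB per file},
\qquad
\frac{2{,}048\ \text{MB}}{0.0088\ \text{MB per file}} \approx 232{,}700\ \text{files}
```

<p>
That is consistent with &#8220;more than 225,000 files&#8221;; in practice the threshold would come a little earlier, since the operating system and the compiler need some of that RAM too.
</p>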
<p><h3>Conclusions</h3>
</p>
<p>
We can clearly see now that it is not simply &#8220;the nature of the beast&#8221; that SCons performance is so bad.  GMake scales much more gracefully, both in terms of elapsed time, and in memory usage, on a variety of scenarios:  full, one-touch incremental, and clean builds.  From the discussions that the previous post sparked, we know that the primary problem is an inefficient O(n<sup>2</sup>) algorithm in the SCons implementation.  It seems that SCons is academically interesting, but ultimately not practical for any non-trivial build (as some of my customers are now finding out the hard way).
</p>
<p>
A lot of SCons fans, in defense of SCons, say things like &#8220;Well, the SCons developers made a conscious decision to focus on <i>correctness</i> rather than <i>performance</i>.&#8221;  But what good is a correct system that is so slow it&#8217;s unusable?  For me, the choice is a no-brainer:  my time is too precious to waste on a slow build tool.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.electric-cloud.com/blog/2010/07/21/a-second-look-at-scons-performance/feed/</wfw:commentRss>
		<slash:comments>29</slash:comments>
		</item>
		<item>
		<title>Designing for high performance</title>
		<link>http://www.electric-cloud.com/blog/2010/07/12/designing-for-high-performance/</link>
		<comments>http://www.electric-cloud.com/blog/2010/07/12/designing-for-high-performance/#comments</comments>
		<pubDate>Mon, 12 Jul 2010 19:13:37 +0000</pubDate>
		<dc:creator>Eric Melski</dc:creator>
				<category><![CDATA[Electric Cloud Solutions]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[ElectricAccelerator]]></category>
		<category><![CDATA[gmake]]></category>
		<category><![CDATA[gnu make]]></category>
		<category><![CDATA[parallel builds]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://blog.electric-cloud.com/?p=648</guid>
		<description><![CDATA[Here&#8217;s the thing about high performance: you can&#8217;t just bolt it on at the end. It&#8217;s got to be baked in from day one. No doubt those of you who are experienced developers are now invoking the venerable Donald Knuth, who once said, &#8220;Premature optimization is the root of all evil.&#8221; But look at it [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s the thing about high performance:  you can&#8217;t just bolt it on at the end.  It&#8217;s got to be baked in from day one.  No doubt those of you who are experienced developers are now invoking the venerable Donald Knuth, who once said, &#8220;Premature optimization is the root of all evil.&#8221;  But look at it this way:  with <a href="http://jalopnik.com/5497042/how-a-500-craigslist-car-beat-400k-rally-racers">very rare exceptions</a>, no amount of performance tuning will turn an average system into a world class competitor.</p>
<p>
Of course, high performance is the entire raison d&#8217;être for ElectricAccelerator.  We knew from the start that parallelism would be the primary means of achieving our performance goals (although it&#8217;s not the <a href="http://blog.electric-cloud.com/2009/03/11/measuring-electricaccelerator-cache-efficiency/">only</a> <a href="http://blog.electric-cloud.com/2009/04/13/makefile-performance-pattern-specific-variables/">trick</a> we used).  Thanks to <a href="http://en.wikipedia.org/wiki/Amdahl's_law">Amdahl&#8217;s law</a>, we know that in order to accelerate a build by 100x, the serialized portion cannot be more than 1% of the baseline time.  Thus it&#8217;s critical that <i>absolutely everything that <b>can</b> be parallelized, <b>is</b> parallelized.</i>  And I mean <i>everything</i>, even the stuff that you don&#8217;t normally think about, because anything that doesn&#8217;t get parallelized disproportionately saps our performance.  Anything that isn&#8217;t parallelized is a bottleneck.<br />
<span id="more-648"></span>
</p>
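<p>
To make the Amdahl&#8217;s law argument concrete, here is the arithmetic behind the 1% figure.  The symbols are labels introduced here for illustration: <i>s</i> is the fraction of the baseline build time that remains serialized, <i>N</i> is the number of parallel workers, and <i>S</i>(<i>N</i>) is the resulting speedup:
</p>

```latex
S(N) = \frac{1}{s + \dfrac{1 - s}{N}},
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{s},
\qquad
S \ge 100 \;\Longrightarrow\; s \le \frac{1}{100} = 1\%
```

<p>
Put the other way around: if just 5% of the build stays serialized, the speedup can never exceed 1 / 0.05 = 20x, no matter how many machines are added.
</p>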
<p>
Here&#8217;s an example:  command expansion.  You know, the bit of code that turns something like this:
</p>
<p><pre>
<div style="background:#dee7f7;border:solid thin;width:60ex;margin-left:auto;margin-right:auto"><font size="-1" face="Courier New">$(CC) $(addprefix -I,$(dir $(SRCS))) -o $@ $&lt;
</font></div>
</pre>
<p>into something like this:
</p>
<p><pre>
<div style="background:#dee7f7;border:solid thin;width:60ex;margin-left:auto;margin-right:auto"><font size="-1" face="Courier New">gcc -I. -Isubdir -o foo.o foo.c
</font></div>
</pre>
<p>
There&#8217;s no way around this translation.  It has to be performed, for every command invoked during the build.  Even if the command doesn&#8217;t have anything that needs to be expanded.
</p>
<p>
What happens if the expansion itself is time-consuming?  For the sake of demonstration, we can make command expansion slow by sticking <font face="Courier New">$(shell sleep 5)</font> into the command &#8212; because the sleep appears inside a $(shell) function call, gmake is obliged to execute the <font face="Courier New">sleep</font> <a href="http://www.gnu.org/software/make/manual/make.html#Variables-in-Commands">as part of expanding the command</a>.  Here&#8217;s the makefile we&#8217;ll use:
</p>
<p><pre>
<div style="background:#dee7f7;border:solid thin;width:60ex;margin-left:auto;margin-right:auto"><font size="-1" face="Courier New">all: a b c d e
a b c d e:
	@$(shell sleep 5) echo $@
</font></div>
</pre>
<p>
Now, if you run this with an ordinary, serialized gmake, you&#8217;ll see that it takes about 25 seconds to execute.  No surprise there, with 5 jobs that each have a 5 second sleep:
</p>
<p><pre>
<div style="background:#dee7f7;border:solid thin;width:60ex;margin-left:auto;margin-right:auto"><font size="-1" face="Courier New">ericm@chester$ time gmake
a
b
c
d
e

real    0m25.092s
user    0m0.012s
sys     0m0.012s
ericm@chester$
</font></div>
</pre>
<p>
Now, see what happens when we run this same makefile with a parallel gmake invocation:  <b>it still takes 25 seconds</b>, even though we have specified more than enough parallel jobs that all five jobs should run simultaneously!
</p>
<p><pre>
<div style="background:#dee7f7;border:solid thin;width:60ex;margin-left:auto;margin-right:auto"><font size="-1" face="Courier New">ericm@chester$ time gmake -j 8
a
b
c
d
e

real    0m25.016s
user    0m0.012s
sys     0m0.012s
ericm@chester$
</font></div>
</pre>
<p>
What&#8217;s going on here?  Let&#8217;s take a quick look at the core algorithm in gmake:
</p>
<ol>
<li>Find the next target that&#8217;s runnable (all prereqs up-to-date).</li>
<li>Expand the commands for that target.</li>
<li>Run the command for that target.</li>
<li>If the number of currently running commands is equal to the job limit (1 for serial gmake, <i>N</i> for parallel gmake), wait for a command to finish.</li>
<li>Repeat until finished.</li>
</ol>
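<p>
The steps above can be modeled in a few lines of Python.  This is only a sketch of the scheduling behavior as described in the list, not gmake&#8217;s actual code:  the function name and the (expand, run) job representation are our own inventions, and time is simulated rather than measured.
</p>
<pre>
```python
import heapq


def simulate_build(jobs, job_limit):
    """Return the elapsed time for a single-threaded, gmake-style scheduler.

    jobs is a list of (expand_seconds, run_seconds) pairs, one per target.
    Expansion happens on the one scheduler thread; only the run phase
    overlaps with other work.
    """
    clock = 0.0
    running = []  # min-heap of finish times for commands already started
    for expand_seconds, run_seconds in jobs:
        clock += expand_seconds                       # step 2: expand (serialized)
        heapq.heappush(running, clock + run_seconds)  # step 3: start the command
        if len(running) == job_limit:                 # step 4: at the job limit,
            clock = max(clock, heapq.heappop(running))  # so wait for one to finish
    # step 5: nothing left to start, so wait for whatever is still running
    return max([clock] + running)


if __name__ == "__main__":
    # Five targets whose cost is all in expansion, like the $(shell sleep 5) makefile.
    slow_expansions = [(5.0, 0.0)] * 5
    print(simulate_build(slow_expansions, job_limit=1))
    print(simulate_build(slow_expansions, job_limit=8))
```
</pre>
<p>
With five targets that each spend 5 seconds in expansion, the model reports 25 seconds whether the job limit is 1 or 8, which matches what we measured.  Move the 5 seconds from the expand column to the run column and the job limit starts to matter again.
</p>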
<p>
Maybe you see the problem already:  gmake is a single-threaded program.  Even when you specify <tt>-j 8</tt>, there&#8217;s only one thread executing that core algorithm.  Parallelism doesn&#8217;t really enter the picture until the end of the algorithm, where gmake decides whether it can go ahead and start working on another target without waiting for the previous one to finish.
</p>
<p>
Being single-threaded means that command expansion is implicitly serialized:  gmake can only expand commands for a single target at a time.  Too bad for you if that expansion takes any significant amount of time.
</p>
<p>
So, could we fix gmake, so that command expansions could be performed in parallel?  Well, it&#8217;s really hard to take something that wasn&#8217;t designed to be high-performance from the start and transform it into something with world-class performance.  In this case, gmake&#8217;s heritage as a single-threaded application permeates every aspect of its implementation.  For example, command expansion is performed using a single global buffer (see <a href="http://cvs.savannah.gnu.org/viewvc/make/expand.c?revision=1.55&amp;root=make&amp;view=markup">expand.c in the gmake sources</a>).  While it&#8217;s certainly <i>possible</i> to refactor this code, it would be non-trivial to do so.
</p>
<p><h3>Command expansion with ElectricAccelerator</h3>
</p>
<p>
In contrast, Accelerator was designed to be multi-threaded from the start, and we have made a deliberate effort to parallelize absolutely everything we can, including even the behind-the-scenes stuff that you probably never thought about before reading this blog.  Like command expansion.  And sure enough, if you try this makefile with Accelerator, the total run time is about 5 seconds:
</p>
<p><pre>
<div style="background:#dee7f7;border:solid thin;width:70ex;margin-left:auto;margin-right:auto"><font size="-1" face="Courier New">ericm@chester$ time emake
Starting build: 12705
a
b
c
d
e
Finished build: 12705 Duration: 0:05 (m:s) Cluster availability: 100%

real    0m5.523s
user    0m0.008s
sys     0m0.016s
ericm@chester$
</font></div>
</pre>
<p><h3>Designing for high performance</h3>
</p>
<p>
When truly world-class performance is your goal, you can&#8217;t wait until implementation is complete to start thinking about performance.  That goal has to inform your decisions at all stages of development, from design to implementation.  With Accelerator, our obsessive focus on performance has resulted in an architecture that allows us to parallelize parts of the build process that other tools simply cannot.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.electric-cloud.com/blog/2010/07/12/designing-for-high-performance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using Markov Chains to Generate Test Input</title>
		<link>http://www.electric-cloud.com/blog/2009/09/15/using-markov-chains-to-generate-test-input/</link>
		<comments>http://www.electric-cloud.com/blog/2009/09/15/using-markov-chains-to-generate-test-input/#comments</comments>
		<pubDate>Tue, 15 Sep 2009 14:46:00 +0000</pubDate>
		<dc:creator>Eric Melski</dc:creator>
				<category><![CDATA[Software Development]]></category>
		<category><![CDATA[Electric Cloud Solutions]]></category>
		<category><![CDATA[gmake]]></category>
		<category><![CDATA[gnu make]]></category>
		<category><![CDATA[testing]]></category>

		<guid isPermaLink="false">http://blog.electric-cloud.com/?p=542</guid>
		<description><![CDATA[One challenge that we&#8217;ve faced at Electric Cloud is how to verify that our makefile parser correctly emulates GNU Make. We started by generating test cases based on a close reading of the gmake manual. Then we turned to real-world examples: makefiles from dozens of open source projects and from our customers. After several years [...]]]></description>
			<content:encoded><![CDATA[<p>One challenge that we&#8217;ve faced at Electric Cloud is how to verify that our makefile parser correctly emulates GNU Make.  We started by generating test cases based on a close reading of the gmake manual.  Then we turned to real-world examples:  makefiles from dozens of open source projects and from our customers.  After several years of this we&#8217;ve accumulated nearly two thousand individual tests of our gmake emulation, and yet we still sometimes find incompatibilities.  We&#8217;re always looking for new ways to test our parser.</p>
<p>
One idea is to generate random text and use that as a &#8220;makefile&#8221;.  Unfortunately, truly random text is almost useless in this regard, because it doesn&#8217;t look anything like a real makefile.  Instead, we can use <a href="http://en.wikipedia.org/wiki/Markov_chain"><i>Markov chains</i></a> to generate random text that is very much like a real makefile.  When we first introduced this technique, we uncovered 13 previously unknown incompatibilities &mdash; at the time that represented 10% of the total defects reported against the parser!  Read on to learn more about Markov chains and how we applied them in practice.<br />
<span id="more-542"></span>
</p>
<p><h3>Markov chains</h3>
</p>
<p>
A Markov chain is simply a sequence of random values in which the next value is in some way dependent on the current value, rather than being completely random.  Consider the case of generating random text one letter at a time, from the set of uppercase English letters (A-Z).  If the sequence is completely random, then for each character generated, any letter is equally probable.  Regardless of what characters you&#8217;ve generated up to this point, you are just as likely to get a <i>D</i> as an <i>X</i> next.  Think of it as if you have a bag of tiles, one for each letter.  With truly random text, you pick one tile and write down the letter on that tile.  Then you return the tile to the bag and pick again.  Lather, rinse, repeat until you&#8217;ve generated as much text as you like.
</p>
<p>
But suppose we said that the probability of the next character is dependent on the last character we generated.  For example, in English text if you run across the letter <i>Q</i> you can be pretty sure that the next character is going to be <i>U</i>.  It&#8217;s almost certainly not going to be <i>X</i>, or <i>Z</i>, etc.  We can build a table that tells us the probability that any given letter will be followed by any other letter.  For the letter <i>Q</i>, the table might look like this:
</p>
<table>
<tr>
<td>Letter</td>
<td>Probability of appearing after <i>Q</i></td>
</tr>
<tr>
<td>A</td>
<td>1%</td>
</tr>
<tr>
<td>E</td>
<td>1%</td>
</tr>
<tr>
<td>I</td>
<td>1%</td>
</tr>
<tr>
<td>O</td>
<td>1%</td>
</tr>
<tr>
<td>U</td>
<td>96%</td>
</tr>
<tr>
<td>All others</td>
<td>0%</td>
</tr>
</table>
<p>
We can do a much better job of generating text that looks like English if we use these tables to guide us.  Imagine that instead of one bag of tiles with one tile for each letter, we have one bag for each letter, and we fill each bag with tiles according to the probabilities in our table.  For example, the bag for the letter <i>Q</i> would contain 96 <i>U</i> tiles, and one tile each for <i>A</i>, <i>E</i>, <i>I</i>, and <i>O</i>.  Each time we want to generate a new letter, we look at the last letter we generated, find the bag of tiles corresponding to that letter, and pick out one tile.  After writing down the letter, we return the tile to the bag and repeat the process.  The sequence of letters that we generate in this manner is a simple Markov chain.
</p>
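<p>
Here is the bag-of-tiles idea as a small Python sketch.  The tile counts come straight from the table for <i>Q</i>; the names <i>Q_BAG</i> and <i>draw_tile</i> are ours, and the fixed seed is there only to make runs repeatable.
</p>
<pre>
```python
import random

# Tiles in the bag for the letter Q, taken from the table above.
Q_BAG = {"A": 1, "E": 1, "I": 1, "O": 1, "U": 96}


def draw_tile(bag, rng):
    """Pick one tile at random, then put it back (the bag is not modified)."""
    letters = list(bag)
    counts = [bag[letter] for letter in letters]
    return rng.choices(letters, weights=counts, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    draws = [draw_tile(Q_BAG, rng) for _ in range(1000)]
    print(draws.count("U"), "of 1000 draws after Q were U")
```
</pre>
<p>
A full generator would keep one such bag per letter and use the letter just drawn to choose the next bag.
</p>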
<p>
How do you build the probability tables?  One way is to generate them from some sample input.  If we have a sufficiently large example of text in the target language, we can scan it and count the number of times each letter occurs, and the number of times it is followed by each letter.  Note that it is critical to have a large and varied input.  If the sample is too small, the resulting probability tables won&#8217;t accurately represent the target language.  The generator will only be able to generate the input text itself.
</p>
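<p>
Building the tables is just counting.  The sketch below is a simplified illustration in Python, not the generator we actually used; the helper names are hypothetical, and the sample text is deliberately tiny, which is exactly what you should not do in practice.
</p>
<pre>
```python
from collections import Counter, defaultdict


def build_table(sample):
    """Count, for each character, how often each character follows it."""
    table = defaultdict(Counter)
    for current, following in zip(sample, sample[1:]):
        table[current][following] += 1
    return table


def probabilities(table, current):
    """Convert the raw counts for one character into probabilities."""
    counts = table[current]
    total = sum(counts.values())
    return {letter: count / total for letter, count in counts.items()}


if __name__ == "__main__":
    table = build_table("QUICK QUIET QUOTA IRAQ QATAR")
    print(probabilities(table, "Q"))
```
</pre>
<p>
In this toy sample <i>Q</i> is followed by <i>U</i> three times out of five, by <i>A</i> once and by a space once.  With so little input the table says more about the sample than about English.
</p>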
<p><h3>Markov chains of order <i>m</i></h3>
</p>
<p>
Of course, we don&#8217;t have to limit ourselves to considering just the single previous letter.  For example, if the previous two characters are <i>ST</i>, then the next character is probably a vowel, or maybe an <i>R</i>.  It&#8217;s probably not going to be <i>D</i>, or <i>Q</i>, etc.  Just as before, we can make a table that tells us the probability that any given pair of letters will be followed by any other letter.  And we can keep going, adding more and more of the preceding letters to our formula.  The number of previous characters we use is called the <i>order</i> of the chain, so if we use the previous 4 characters, we would say we have a <i>Markov chain of order 4</i>.
</p>
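<p>
Generalizing to order <i>m</i> only changes the table key:  instead of one previous character, we key on the previous <i>m</i> characters.  The following Python sketch shows the idea under some simplifying assumptions (the whole sample fits in memory, the sample is longer than the order, and generation simply stops at a dead end).  The function names are ours.
</p>
<pre>
```python
import random
from collections import Counter, defaultdict


def build_model(sample, order):
    """Map each run of `order` characters to counts of the character after it."""
    model = defaultdict(Counter)
    for i in range(len(sample) - order):
        context = sample[i:i + order]
        following = sample[i + order]
        model[context][following] += 1
    return model


def generate(model, order, length, rng):
    """Grow a random starting context until it is `length` long or hits a dead end."""
    text = rng.choice(sorted(model))
    for _ in range(length - order):
        counts = model.get(text[len(text) - order:])
        if not counts:
            break  # this context was never followed by anything in the sample
        letters = sorted(counts)
        weights = [counts[letter] for letter in letters]
        text += rng.choices(letters, weights=weights, k=1)[0]
    return text


if __name__ == "__main__":
    sample = "all: a b c d e\na b c d e:\n\t@echo $@\n" * 3
    model = build_model(sample, order=2)
    print(generate(model, order=2, length=40, rng=random.Random(1)))
```
</pre>
<p>
Every run of <i>m</i> + 1 characters in the output is guaranteed to appear somewhere in the sample, which is why higher orders look so much more like the real thing.
</p>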
<p>
The greater the order of the chain, the &#8220;smarter&#8221; our generator becomes, because it is considering more context when choosing letters.  As you can see, the generated text looks more and more like the target language as you increase the order (although after some point, you get little additional benefit from further increases):
</p>
<table>
<tr>
<td>Order</td>
<td>Result</td>
</tr>
<tr>
<td>0 (purely random)</td>
<td>ehnee.Alr noer ealcra edctn eIi</td>
</tr>
<tr>
<td>1</td>
<td>Pige foule.ce d futht wrion e mara</td>
</tr>
<tr>
<td>2</td>
<td>Prookiname arg-tm aread on achivedging</td>
</tr>
<tr>
<td>3</td>
<td>Yes, and no usinession be</td>
</tr>
<tr>
<td>4</td>
<td>Project that last it make you first, moderneath.</td>
</tr>
</table>
<p><h3>Using Markov chains in testing</h3>
</p>
<p>
In order to use this technique effectively in testing, you need a couple of things besides the generator itself:
</p>
<ol>
<li><b>A large sample input to seed the generator</b>.  As noted, the bigger your sample input text, the higher the quality of the probability tables, and therefore the more varied your generated text will be.  Since we are trying to generate makefiles, we used several megabytes of makefiles from a variety of open-source projects as the sample text.</li>
<li><b>An automated evaluation mechanism</b>.  You have to be able to determine quickly and automatically if a given generated file is processed correctly or not.  Of course, <i>correct</i> can mean many different things here.  It might be as simple as &#8220;does not cause a program crash&#8221;.  In our case, it means &#8220;emake parses this makefile the same way that gmake does&#8221;, so we use gmake as a <a href="http://en.wikipedia.org/wiki/Reference_implementation"><i>reference implementation</i></a>.  Note that it doesn&#8217;t matter if the generated text truly is a completely valid makefile.  In fact most of the time it will not be.  What matters in those cases is that emake and gmake both report the <i>same</i> error.</li>
</ol>
<p>
We wrote a simple shell script to drive the testing process.  First, it uses the generator to produce several random makefiles.  Then it runs each makefile through both gmake and emake, and compares the results.  Any differences are reported for further investigation.
</p>
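<p>
Our driver was a shell script, but the same loop is easy to sketch in Python.  Treat everything here as an assumption rather than a description of our actual script:  the function names are made up, <i>-n</i> and <i>-f</i> are standard gmake options, and we are simply assuming the tool under test accepts the same ones.  A real driver would also need to normalize the output first, since error messages typically include the name of the program that printed them.
</p>
<pre>
```python
import subprocess


def run_tool(command, makefile):
    """Run one make implementation on a makefile; capture exit status and output."""
    result = subprocess.run(
        command + [makefile],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.returncode, result.stdout


def find_differences(makefiles, reference_command, candidate_command):
    """Return the makefiles on which the two tools disagree."""
    return [
        makefile
        for makefile in makefiles
        if run_tool(reference_command, makefile) != run_tool(candidate_command, makefile)
    ]


if __name__ == "__main__":
    import sys

    # Assumed invocations: dry-run each generated makefile named on the command line.
    reference = ["gmake", "-n", "-f"]
    candidate = ["emake", "-n", "-f"]
    for makefile in find_differences(sys.argv[1:], reference, candidate):
        print("MISMATCH:", makefile)
```
</pre>
<p>
Comparing against a reference implementation means the driver never has to know what the right answer is, only whether the two tools agree.
</p>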
<p><h3>Conclusion</h3>
</p>
<p>
Verifying the implementation of an emulator for a complex system is hard, especially when the original system has no formal specification.  Using randomly generated input is a useful way to extend the breadth of your testing, and Markov chains make it possible to generate even more useful random input.  Our original implementation of this technique uncovered several previously unknown defects, and it continues to pay dividends both by uncovering new defects and by providing a confidence measure for our emulation.
</p>
<p>
If you want to play with Markov chains yourself, you can download the <a href="http://github.com/emelski/code.melski.net/blob/master/markov/main.cpp">source for the generator used in this article</a>.  NB: the program has only been compiled and used on Linux; on other platforms your mileage may vary.  For more information about Markov chains, I recommend <a href="http://www.cs.bell-labs.com/cm/cs/pearls/sec153.html">Section 15.3 of the excellent book <i>Programming Pearls</i> by Jon Bentley</a>.</p>
<hr />
<p>
If you enjoyed this article about testing techniques, you may enjoy this related article:
</p>
<ul>
<li><a href="http://blog.electric-cloud.com/2009/05/05/delta-the-coolest-tool-youve-never-heard-of/">Delta: the coolest tool you&#8217;ve never heard of</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.electric-cloud.com/blog/2009/09/15/using-markov-chains-to-generate-test-input/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Makefile performance: built-in rules</title>
		<link>http://www.electric-cloud.com/blog/2009/08/19/makefile-performance-built-in-rules/</link>
		<comments>http://www.electric-cloud.com/blog/2009/08/19/makefile-performance-built-in-rules/#comments</comments>
		<pubDate>Wed, 19 Aug 2009 14:50:19 +0000</pubDate>
		<dc:creator>Eric Melski</dc:creator>
				<category><![CDATA[Electric Cloud Solutions]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[gmake]]></category>
		<category><![CDATA[gnu make]]></category>
		<category><![CDATA[makefile]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://blog.electric-cloud.com/?p=511</guid>
		<description><![CDATA[Like any system that has evolved over many years, GNU Make is rife with appendages of questionable utility. One area this is especially noticeable is the collection of built-in rules in gmake. These rules make it possible to do things like compile a C source file to an executable without even having a makefile, or [...]]]></description>
			<content:encoded><![CDATA[<p>Like any system that has evolved over many years, GNU Make is rife with appendages of questionable utility.  One area where this is especially noticeable is the collection of <i>built-in rules</i> in gmake.  These rules make it possible to do things like compile a C source file to an executable without even having a makefile, or compile and link several source files with a makefile that simply names the executable and each of the objects that go into it.</p>
<p>
But this convenience comes at a price.  Although some of the built-in rules are still relevant in modern environments, many are obsolete or uncommonly used at best.  When&#8217;s the last time you compiled Pascal code, or used SCCS or RCS as your version control system?  And yet every time you run a build, gmake must check every source file against each of these rules, on the off chance that one of them might apply.  <b>A simple tweak to your GNU Make command-line is all it takes to get a performance improvement of up to 30% out of your makefiles</b>.  Don&#8217;t believe me?  Read on.<br />
<span id="more-511"></span>
</p>
<p>
Let&#8217;s look at a trivial example:
</p>
<p><div style="background:#deffde;border:dashed thin;">
<pre>
all: input
	@echo done
</pre>
</div>
<p>
Touch the file <i>input</i>, then run gmake with the <i>-d</i> option, so you can watch as gmake tries each of the built-in rules.  GMake will ramble on for hundreds of lines, as you&#8217;ll see.  Here&#8217;s a sample of that output:</p>
<div style="background:#dee7f7;border:dashed thin;">
<pre>
Considering target file `all'.
 File `all' does not exist.
  Considering target file `input'.
   Looking for an implicit rule for `input'.
   <i>... many lines omitted ... </i>
   Trying pattern rule with stem `input'.
   Trying implicit prerequisite `RCS/input,v'.
   Trying pattern rule with stem `input'.
   Trying implicit prerequisite `RCS/input'.
   Trying pattern rule with stem `input'.
   Trying implicit prerequisite `s.input'.
   Trying pattern rule with stem `input'.
   Trying implicit prerequisite `SCCS/s.input'.
   Trying pattern rule with stem `input'.
   <i>... hundreds more lines omitted ...</i>
  No implicit rule found for `input'.
  Finished prerequisites of target file `input'.
</pre>
</div>
<p>
What&#8217;s going on here?  Well, we didn&#8217;t provide a rule describing how to build the file <i>input</i>, so gmake is checking to see if any of the built-in rules could be used to generate it.  Of course none of them do, so this is all wasted effort.  Lucky for us, a single command-line option is all you need to tell gmake not to bother with the default built-in rules:  <i>-r</i>.  Try that trivial makefile again, this time with <i>-d -r</i>:
</p>
<p><div style="background:#dee7f7;border:dashed thin;">
<pre>
Considering target file `all'.
 File `all' does not exist.
  Considering target file `input'.
   Looking for an implicit rule for `input'.
   No implicit rule found for `input'.
   Finished prerequisites of target file `input'.
</pre>
</div>
<p>
All the extra nonsense is gone!  And even on this toy example, there is a measurable performance improvement:  originally, this makefile runs in about 0.015s (average over three runs); with the built-in rules disabled, it&#8217;s just 0.012s.  But I can see you won&#8217;t be convinced by such a trivial example.  So let&#8217;s try something a bit bigger:
</p>
<p><div style="background:#deffde;border:dashed thin;">
<pre>
SOURCES:=$(wildcard sub/*.x)
TARGETS:=$(SOURCES:.x=.o)
all: $(TARGETS)
	@echo done

%.o: %.x
	@echo $@
</pre>
</div>
<p>
The directory <i>sub</i> contains 15,000 files named 00001.x through 15000.x.  With the built-in rules (and redirecting output to /dev/null), this makefile runs in about 60.2s; without the built-in rules, 42.9s &mdash; <b>28% faster</b>.
</p>
<p>
Finally, let&#8217;s try this optimization on a real build.  I built one component of the Accelerator project completely, then ran &#8220;no-op&#8221; builds (ie, no work to be done, just checking that everything is up-to-date).  With built-in rules, this took 6.0s; without, it took 5.2s &mdash; <b>13% faster</b>:
</p>
<table rules="none" cellpadding="4">
<caption align="bottom"><font size="-1">Test results (shorter is better)</font></caption>
<tr>
<td>Large test, with built-ins:</td>
<td>
<div style="background:#85aef7;width:240px;">60.2s</div>
</td>
</tr>
<tr>
<td>Large test, no built-ins:</td>
<td>
<div style="background:#a3ffa3;width:172px;">42.9s</div>
</td>
</tr>
<tr>
<td>No-op build, with built-ins:</td>
<td>
<div style="background:#85aef7;width:24px;">6.0s</div>
</td>
</tr>
<tr>
<td>No-op build, no built-ins:</td>
<td>
<div style="background:#a3ffa3;width:21px;">5.2s</div>
</td>
</tr>
</table>
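<p>
For the record, the &#8220;faster&#8221; percentages are computed as the reduction in run time relative to the run with built-in rules enabled.  A quick Python sketch of that arithmetic, using the timings reported above (the helper name is ours):
</p>
<pre>
```python
def percent_faster(baseline_seconds, improved_seconds):
    """Reduction in run time, as a percentage of the baseline."""
    return 100.0 * (baseline_seconds - improved_seconds) / baseline_seconds


if __name__ == "__main__":
    timings = [
        ("toy makefile", 0.015, 0.012),
        ("15,000-file test", 60.2, 42.9),
        ("no-op build", 6.0, 5.2),
    ]
    for label, before, after in timings:
        print("{}: {:.1f}% less time".format(label, percent_faster(before, after)))
```
</pre>
<p>
The large test works out to about 28.7% and the no-op build to about 13.3%.
</p>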
<p>
Now, if your build actually relies on built-in rules obviously you can&#8217;t simply disable them.  But you could explicitly define just those rules that you require and disable the rest.  For example, if you use the default <i>%.o: %.cpp</i> rule, you could add just that rule to your makefiles:
</p>
<p><div style="background:#deffde;border:dashed thin;">
<pre>
%.o: %.cpp
	$(COMPILE.cpp) $(OUTPUT_OPTION) $&lt;
</pre>
</div>
<p>
Once you&#8217;ve done that, you can add <i>-r</i> to your command-line and enjoy the benefits.  If you go this route, you can see the list of built-in rules by running <i>gmake -p</i>; the built-ins are marked as &#8220;built-in&#8221; in that output.
</p>
<p><h3>Conclusion</h3>
</p>
<p>
Disabling gmake&#8217;s array of built-in rules is an easy way to squeeze extra performance out of your makefiles, particularly on large builds and on no-op builds.  All you have to do is add <i>-r</i> to your command-line.  (NB: If you prefer more descriptive command-lines, you can use <i>--no-builtin-rules</i> instead!)
</p>
<p><hr />
This article is the latest of several looking at different aspects of makefile performance.  If you liked this article, you may enjoy the others in the series:</p>
<ul>
<li><a href="http://blog.electric-cloud.com/2009/03/18/it-goes-to-11/">Makefile performance: recursive make</a></li>
<li><a href="http://blog.electric-cloud.com/2009/03/23/makefile-performance-shell/">Makefile performance: $(shell)</a></li>
<li><a href="http://blog.electric-cloud.com/2009/04/13/makefile-performance-pattern-specific-variables/">Makefile performance: pattern-specific variables</a></li>
<li>Makefile performance: built-in rules (this post)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.electric-cloud.com/blog/2009/08/19/makefile-performance-built-in-rules/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Friday Fun: Generating Fibonacci Numbers with GNU Make</title>
		<link>http://www.electric-cloud.com/blog/2009/08/14/friday-fun-generating-fibonacci-numbers-with-gnu-make/</link>
		<comments>http://www.electric-cloud.com/blog/2009/08/14/friday-fun-generating-fibonacci-numbers-with-gnu-make/#comments</comments>
		<pubDate>Fri, 14 Aug 2009 21:01:19 +0000</pubDate>
		<dc:creator>Eric Melski</dc:creator>
				<category><![CDATA[Software Development]]></category>
		<category><![CDATA[gmake]]></category>
		<category><![CDATA[gnu make]]></category>
		<category><![CDATA[makefile]]></category>

		<guid isPermaLink="false">http://blog.electric-cloud.com/?p=500</guid>
		<description><![CDATA[Nobody would ever claim that GNU Make is a general purpose programming language, but with a little work, we can coerce it into generating Fibonacci numbers for us. Why bother? Because we can. First, let me give a simple demonstration. Here&#8217;s the makefile: Invoke it with a single argument, the length of the sequence to [...]]]></description>
			<content:encoded><![CDATA[<p>Nobody would ever claim that GNU Make is a general purpose programming language, but with a little work, we can coerce it into generating <a href="http://en.wikipedia.org/wiki/Fibonacci_number">Fibonacci numbers</a> for us.  Why bother?  <b>Because we can</b>.<br />
<span id="more-500"></span></p>
<p>
First, let me give a simple demonstration.  Here&#8217;s the makefile:
</p>
<p><pre class="brush: python; title: ; notranslate">
16:=x x x x x x x x x x x x x x x x
input_int:=$(foreach a,$(16),$(foreach b,$(16),$(foreach c,$(16),$(16))))
decode=$(words $1)
encode=$(wordlist 1,$1,$(input_int))
decr=$(wordlist 2,$(words $1),$1)
decr2=$(wordlist 3,$(words $1),$1)
eq=$(filter $(words $1),$(words $2))
g0:=
g1:=x

fib=$(if $(filter-out undefined,$(origin f$1)),\
           $(f$1),\
           $(if $(call eq,$1,$(g0)),\
	          $(eval f$1:=$(g0))$(g0),\
                  $(if $(call eq,$1,$(g1)),\
                         $(eval f$1:=$(g1))$(g1),\
                         $(eval f$1:=$(call fib,$(call decr2,$1)) $(call fib,$(call decr,$1)))$(f$1))))

print=$(if $1,\
$(call print,$(call decr,$1))$(info $(call decode,$1): $(call decode,$(f$1))),\
$(info 0: 0))

%:
	@:$(if x$(call fib,$(call encode,$@)),$(call print,$(call encode,$@)),)
</pre>
</p>
<p>
Invoke it with a single argument, the length of the sequence to generate:
</p>
<p><div style="background:#deffde;border:dashed thin;width:80ex">
<pre>ericm@chester:~/blog/fibonacci$ gmake 10
0: 0
1: 1
2: 1
3: 2
4: 3
5: 5
6: 8
7: 13
8: 21
9: 34
10: 55</pre>
</div>
<p>
Nifty!
</p>
<p>
Now, although this demonstration is not especially practical, it does make use of a few advanced gmake concepts:  arithmetic; caching dynamically generated variables; and recursive functions.
</p>
<p><h3>Arithmetic</h3>
</p>
<p>
None of this would be possible without support for arithmetic operations.  Here&#8217;s a brief explanation of how this works (for more details, see the Mr. Make article <a href="http://www.cmcrossroads.com/content/view/6504/268/">Learning GNU Make Functions with Arithmetic</a>):  we use strings of space-separated <i>x</i> characters to represent values; for example, the number five is represented as <i>x x x x x</i>.  To add numbers together, we concatenate the string representations, and to subtract, we trim the appropriate number of <i>x</i>&#8217;s from the string.  To convert from the string representation to the numeric value, we just count the number of <i>x</i>&#8217;s in the string, using the <i>$(words)</i> builtin.  Finally, to convert from the numeric value to the string representation, we extract the appropriate number of <i>x</i>&#8217;s from a canonical string that just has a series of several thousand <i>x</i>&#8217;s.
</p>
<p>
This is inefficient, in both memory and time.  The representation of a number requires double the value in bytes, so the number 50,000 requires 100,000 bytes of storage.  Converting to a numeric value from the string representation is a linear operation that scales with the magnitude of the value &#8212; the bigger the value, the longer the conversion takes.  These factors together limit the range of numbers that you can practically work with using this scheme, although it works fine for small values (up to around 10,000 or so).  For the Fibonacci makefile, the inefficiencies mean that we can only generate the sequence out to about the 40th value.  At that point, gmake has already sucked up 1.4 GB of memory!
</p>
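<p>
If the encoding is hard to picture, here is the same scheme sketched in Python, with a list of <i>x</i> tokens standing in for gmake&#8217;s space-separated string.  The function names loosely follow the ones in the makefile, but this is an illustration of the idea, not a translation of the makefile.
</p>
<pre>
```python
def encode(value):
    """Number to list of 'x' tokens (the makefile takes them from a long canonical list)."""
    return ["x"] * value


def decode(tokens):
    """List of 'x' tokens back to a number, which is what $(words) does."""
    return len(tokens)


def add(left, right):
    """Addition is concatenation."""
    return left + right


def decrement(tokens):
    """Subtracting one drops the first token, like $(wordlist 2,...)."""
    return tokens[1:]


def storage_bytes(value):
    """Size of the space-separated string for a value: N x's plus N-1 spaces."""
    return len(" ".join(encode(value)))


if __name__ == "__main__":
    print(decode(add(encode(2), encode(3))))
    print(storage_bytes(50000))
```
</pre>
<p>
The second line of output is where the &#8220;double the value in bytes&#8221; figure comes from:  every <i>x</i> but the last drags a space along with it.
</p>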
<p><h3>Caching dynamic values</h3>
</p>
<p>
Because it is time-consuming to compute Fibonacci numbers this way, we&#8217;d like to cache the results, so we don&#8217;t ever duplicate work.  But the values are dynamically generated.  For that matter, the variables themselves must be dynamically generated, because we don&#8217;t know beforehand which Fibonacci numbers we&#8217;re going to have to compute.  So how do you dynamically generate variables and cache their values in gmake?  With <i>$(eval)</i>.  You can read all about it in another Mr. Make article, <a href="http://www.cmcrossroads.com/content/view/7382/268/">$(eval) and macro caching</a>.  In our Fibonacci makefile, you can see that as we determine each Fibonacci number, we invoke $(eval) to save the value.  The <i>fib</i> function is then set up to first check for the existence of a cached value and only bother with the computation if there is no cache entry.
</p>
<p><h3>Recursive functions</h3>
</p>
<p>
The Fibonacci sequence is defined recursively as <i>f(n) = f(n - 1) + f(n - 2)</i>, so naturally we use a recursive function in our implementation.  The <i>fib</i> function takes one argument, the index of the Fibonacci number to compute, encoded using the scheme described above.  After checking the cache, <i>fib</i> checks if the current index is either zero or one.  By convention, these Fibonacci numbers are defined to have value zero and one, respectively, so there is no need to compute them.  This also serves as a terminating condition so we don&#8217;t recurse infinitely.
</p>
<p>
If the current index is neither zero nor one, then <i>fib</i> calls itself recursively twice, to compute the two preceding Fibonacci values.  The recursive results are combined, cached, and returned as the overall result of the <i>fib</i> function itself.
</p>
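<p>
For readers who find the nested <i>$(if)</i> calls hard to follow, here is the same control flow sketched in Python.  It is a simplification:  the index is an ordinary integer here, whereas the makefile encodes the index too, and a dictionary plays the role of the dynamically created variables.  The values themselves are still lists of <i>x</i> tokens that get &#8220;added&#8221; by concatenation.
</p>
<pre>
```python
cache = {}  # index to unary value, standing in for the makefile's cached variables


def fib(index):
    """Return Fibonacci number `index` as a list of 'x' tokens, caching results."""
    if index in cache:
        return cache[index]
    if index == 0:
        value = []
    elif index == 1:
        value = ["x"]
    else:
        # "Addition" is concatenation of the two preceding unary values.
        value = fib(index - 2) + fib(index - 1)
    cache[index] = value
    return value


if __name__ == "__main__":
    for n in range(11):
        print("{}: {}".format(n, len(fib(n))))
```
</pre>
<p>
Run as a script, this prints the same eleven lines as the <i>gmake 10</i> example at the top of the post.
</p>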
<p><h3>Conclusion</h3>
</p>
<p>
I&#8217;m surprised at how quickly gmake computes the sequence, actually.  On my laptop, gmake computes the first 30 values in a fraction of a second.  After that the inefficiencies in the arithmetic operations really come into play:  it takes about 3.5 seconds to compute 35 values, and 25 seconds to compute 39 values.  Still, considering the limitations of the environment, I think it&#8217;s impressive that it works at all.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.electric-cloud.com/blog/2009/08/14/friday-fun-generating-fibonacci-numbers-with-gnu-make/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Rules with Multiple Outputs in GNU Make</title>
		<link>http://www.electric-cloud.com/blog/2009/08/04/rules-with-multiple-outputs-in-gnu-make/</link>
		<comments>http://www.electric-cloud.com/blog/2009/08/04/rules-with-multiple-outputs-in-gnu-make/#comments</comments>
		<pubDate>Wed, 05 Aug 2009 02:49:25 +0000</pubDate>
		<dc:creator>Eric Melski</dc:creator>
				<category><![CDATA[Electric Cloud Solutions]]></category>
		<category><![CDATA[gmake]]></category>
		<category><![CDATA[makefile]]></category>

		<guid isPermaLink="false">http://blog.electric-cloud.com/?p=449</guid>
		<description><![CDATA[I recently wrote an article for CM Crossroads exploring various strategies for handling rules that generate multiple output files in GNU make. If you&#8217;ve ever struggled with this problem, you should check out the article. I don&#8217;t want to spoil the exciting conclusion, but it turns out that the only way to really correctly capture [...]]]></description>
			<content:encoded><![CDATA[<p>I recently wrote an article for CM Crossroads exploring various strategies for handling rules that generate multiple output files in GNU make.  If you&#8217;ve ever struggled with this problem, you should <a href="http://www.cmcrossroads.com/cm-articles/cm-basics/12905-rules-with-multiple-outputs-in-gnu-make">check out the article</a>.  I don&#8217;t want to spoil the exciting conclusion, but it turns out that the only way to really correctly capture this relationship in GNU make syntax is with pattern rules.  That&#8217;s great if your input and output files share a common stem (eg, &#8220;parser&#8221; in parser.i, parser.c and parser.h), but if your files don&#8217;t adhere to that convention, you&#8217;re stuck with one of the alternatives, each of which has some strange caveats and limitations.</p>
<p>
Here&#8217;s a question for you:  if ElectricAccelerator had an extension that allowed you to explicitly mark a non-pattern rule as having multiple outputs, would you use it?  For example:
</p>
<p><div style="background:#deffde;border:dashed thin;">
<pre>
<font color="red">#pragma multi</font>
something otherthing: input
	@echo Generating something and otherthing from input...
</pre>
</div>
<p>
What do you think?  Comments encouraged.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.electric-cloud.com/blog/2009/08/04/rules-with-multiple-outputs-in-gnu-make/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>ElectricAccelerator vs. distcc: samba reloaded</title>
		<link>http://www.electric-cloud.com/blog/2009/07/20/electricaccelerator-vs-distcc-samba-reloaded/</link>
		<comments>http://www.electric-cloud.com/blog/2009/07/20/electricaccelerator-vs-distcc-samba-reloaded/#comments</comments>
		<pubDate>Mon, 20 Jul 2009 17:52:18 +0000</pubDate>
		<dc:creator>Eric Melski</dc:creator>
				<category><![CDATA[Electric Cloud Solutions]]></category>
		<category><![CDATA[distcc]]></category>
		<category><![CDATA[ElectricAccelerator]]></category>
		<category><![CDATA[ElectricAccelerator vs distcc]]></category>
		<category><![CDATA[gmake]]></category>
		<category><![CDATA[parallel builds]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://blog.electric-cloud.com/?p=419</guid>
		<description><![CDATA[ElectricAccelerator vs distcc &#8211; samba reloaded In an earlier post I compared the performance of ElectricAccelerator and distcc by building samba using each tool in turn on the same cluster. In that test I found that Accelerator bested distcc at suitably high levels of parallelism, but that distcc narrowly beat Accelerator at lower levels of [...]]]></description>
			<content:encoded><![CDATA[<h3>ElectricAccelerator vs distcc &#8211; samba reloaded</h3>
<p>In an <a href="http://blog.electric-cloud.com/2009/03/02/electricaccelerator-vs-distcc-round-3-samba/">earlier post</a> I compared the performance of ElectricAccelerator and distcc by building samba using each tool in turn on the same cluster.  In that test I found that Accelerator bested distcc at suitably high levels of parallelism, but that distcc narrowly beat Accelerator at lower levels of parallelism.  At the time I chalked the difference up to slightly higher overhead associated with Accelerator.  But you must have known I couldn&#8217;t just leave it at that.  I had to know where the overhead was coming from, and eliminate it, if possible.  The exciting conclusion is after the break.<br />
<span id="more-419"></span></p>
<h3>Samba Reloaded</h3>
<p>
To recap, I previously found that samba is a very CPU-intensive build, despite being written entirely in C.  This fact was demonstrated empirically by examining the performance on one dual-core host using just <i>gmake -j</i> at varying levels of parallelism.  Past <i>-j 2</i>, the performance degraded sharply.  For the distcc and emake tests, I used a cluster of 12 dual-core hosts.  Eleven served as workers, for a total of 22 CPUs.  The remaining host was used as the build host (and cluster manager, for emake tests).  Here are the original results:
</p>
<p>
<img src="http://www.electric-cloud.com/blog/wp-content/uploads/2009/03/samba_distcc_vs_emake1.png" alt="Distcc vs. emake, building samba" />
</p>
<p>
Until we got to about 11 CPUs, distcc appeared to have a slight edge on emake.  I had lots of theories that could explain the difference:  maybe our Electric File System (EFS) was slower than ext3fs, or maybe the Electric Agent was sluggish in supplying metadata to the EFS, or in processing file usage data from it.  Maybe lock contention in emake itself was causing the problem.
</p>
<p>
Of course, before I could test any of these theories I had to make sure I could reproduce the original behavior.  I set up a five-node cluster using the same dual-core hosts I used previously, plus one additional node to serve as the build host (unfortunately the other half of the cluster was reserved for other tests &#8212; I&#8217;m not the <i>only</i> person doing work here at Electric Cloud, after all!).  This gave me a total of 10 worker CPUs.  After installing the latest version of Accelerator (4.5.0), I fired off a series of three builds each with Accelerator and distcc, using 10 workers.  When those builds completed, I computed the average build time for each tool &#8212; and found that Accelerator beat distcc by a small margin.
</p>
<p><h3>Deeper Into the Rabbit Hole</h3>
</p>
<p>
This result was wholly unexpected, given the results from the previous tests.  The next step was to run a series of builds with varying numbers of workers, from 1 to 10.  Here are the results:
</p>
<p><img src="http://www.electric-cloud.com/blog/wp-content/uploads/2009/07/distcc_versus_emake_default1.png" alt="Distcc vs. emake, building samba" />
</p>
<p>
Now the results are more in line with my expectation:  with low levels of parallelism distcc appears to perform better, but Accelerator catches up and finally surpasses distcc once enough resources are engaged.  The breakeven point has moved though, from about 11 CPUs to about 9 CPUs.  In addition, there was an outlier in both sets of results:  with just one worker, Accelerator was consistently faster than distcc.  That didn&#8217;t fit well with my theories &#8212; if the EFS was slow, for example, then Accelerator would have been slower than distcc with one worker.
</p>
<p><h3>A New Theory</h3>
<p>As I puzzled over this new data, something clicked that caused me to remember a subtle difference between distcc and Accelerator:  they use different strategies to determine how to allocate jobs to workers in the cluster.  Accelerator prefers to fully load one host before running jobs on another host; distcc prefers to spread the load across as many hosts as possible before doubling up on any one host.  The following images illustrate the result obtained when using these different strategies to assign five parallel jobs to a cluster of ten workers on five hosts:
</p>
<p><img src="http://www.electric-cloud.com/blog/wp-content/uploads/2009/07/emake_distribution1.png" alt="Accelerator distribution of jobs to workers in the cluster" />
</p>
<p><img src="http://www.electric-cloud.com/blog/wp-content/uploads/2009/07/distcc_distribution1.png" alt="Distcc distribution of jobs to workers in the cluster" />
</p>
<p>
This realization led to a new theory:  perhaps the performance difference observed with low numbers of workers was simply an artifact of this difference in worker allocation strategies.  We&#8217;ve already seen that this particular build is especially CPU-intensive.  Two jobs on one host have just two CPUs that they must share; two jobs on two hosts have four total CPUs available.  This theory would also explain why emake &#8220;catches up&#8221; with distcc &#8212; as more and more jobs are run in parallel, distcc is forced to assign multiple jobs to a single host.  Eventually, both systems have fully loaded all the available cluster nodes, so the difference in allocation strategies becomes moot.
</p>
<p>
Armed with this theory, I altered my benchmark so that emake would use the same allocation strategy as distcc, by explicitly enabling and disabling agents via the cluster manager.  For example, for a trial with two agents, I enabled one agent each on two cluster nodes.  This technique allowed me to better compare the relative performance of distcc and Accelerator.  Here are the results from this test:
</p>
<p><img src="http://www.electric-cloud.com/blog/wp-content/uploads/2009/07/distcc_versus_emake_controlled1.png" alt="Distcc vs. emake, building samba, controlled for different allocation strategies" />
</p>
<p>
With the allocation strategy out of the equation, Accelerator actually has a small, consistent edge over distcc (about 2%) on small clusters.  And the previous test showed that Accelerator scales better than distcc, so on large clusters the difference is even more pronounced (about 15%).
</p>
<p><h3>Should We Change Accelerator?</h3>
</p>
<p>
An obvious question is whether we would consider changing the allocation strategy in Accelerator.  The answer is probably no.  The strategy we use, although suboptimal for this particular build, actually works very well across a wide variety of builds.  One of the key advantages of this strategy is that it allows Accelerator to minimize network overhead, since agents on a single host can share various kinds of data directly.  There are relatively few builds that skew so heavily towards CPU utilization, so changing the strategy to benefit those special cases at the expense of the more common case seems unwise.
</p>
<p><h3>A Champion Vindicated</h3>
</p>
<p>
Although we previously declared Accelerator the victor versus distcc when building samba, it was not without some reservations.  With the new results shown here, I&#8217;m satisfied that we made the correct decision:  CPU-for-CPU, Accelerator is more efficient and scales better than distcc, at all cluster sizes.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.electric-cloud.com/blog/2009/07/20/electricaccelerator-vs-distcc-samba-reloaded/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Makefile performance: pattern-specific variables</title>
		<link>http://www.electric-cloud.com/blog/2009/04/13/makefile-performance-pattern-specific-variables/</link>
		<comments>http://www.electric-cloud.com/blog/2009/04/13/makefile-performance-pattern-specific-variables/#comments</comments>
		<pubDate>Mon, 13 Apr 2009 22:53:49 +0000</pubDate>
		<dc:creator>Eric Melski</dc:creator>
				<category><![CDATA[Build-Test-Deploy Best Practices]]></category>
		<category><![CDATA[gmake]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://blog.electric-cloud.com/?p=284</guid>
		<description><![CDATA[If you&#8217;ve been using GNU make for some time, you are probably familiar with both pattern rules and target-specific variables. You may even be familiar with the intersection of these features: pattern-specific variables. But you may not be aware of a subtle change in gmake 3.81 which affects the processing of pattern-specific variables with potentially [...]]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;ve been using GNU make for some time, you are probably familiar with both <i><a href="http://www.gnu.org/software/make/manual/make.html#Pattern-Rules">pattern rules</a></i> and <i><a href="http://www.gnu.org/software/make/manual/make.html#Target_002dspecific">target-specific variables</a></i>.  You may even be familiar with the intersection of these features:  <i><a href="http://www.gnu.org/software/make/manual/make.html#Pattern_002dspecific">pattern-specific variables</a></i>.  But you may not be aware of a subtle change in gmake 3.81 which affects the processing of pattern-specific variables with potentially disastrous performance consequences.</p>
<p>
<span id="more-284"></span></p>
<h3><b>Pattern-specific Variables 101</b></h3>
</p>
<p>
Pattern-specific variables are similar to target-specific variables in that they define a variable value that applies only in a particular context, but where target-specific variables apply to a single specific target, pattern-specific variables apply to all targets that match the given pattern.  This gives us a way to get target-specific-variable-like behavior when using pattern rules.  It also allows us a more flexible syntax for target-specific variables, even in the absence of pattern rules.  A few examples should make everything clear.  First, regular target-specific variables:
</p>
<div style="background:#deffde;border:solid thin;width:60ex;">
<pre style="font-family:Consolas, Monaco, &quot;font-size:12px;">
.PHONY: a b
FLAGS=default flags
all: a b

a: FLAGS=special flags for a
a b:
	@echo $@ using FLAGS='"$(FLAGS)"'
</pre>
</div>
<p>
Here we have defined <i>FLAGS</i> with a default value at the global scope, and a custom value for target <i>a</i>.  When you run this makefile with gmake, you&#8217;ll see the value used for <i>FLAGS</i> is different for the two targets, as expected.  Now, let&#8217;s look at typical pattern-specific variable usage:
</p>
<div style="background:#deffde;border:solid thin;width:60ex;">
<pre style="font-family:Consolas, Monaco, &quot;font-size:12px;">
.PHONY: all
FLAGS=default flags
all: a.x b.x c.x d.z e.z

%.x: FLAGS=special flags for .x files
%.x:
	@echo $@ using FLAGS='"$(FLAGS)"'

%.z:
	@echo $@ using FLAGS='"$(FLAGS)"'
</pre>
</div>
<p>
Again we have a custom value for <i>FLAGS</i>, but this time it applies to all targets matching the pattern <i>%.x</i>.  We could achieve the same behavior by using normal target-specific variables for each of the <i>.x</i> files of course, but the pattern-specific variable definition is more succinct and convenient, and it is more robust, since we need not hardcode the list of targets to which the new definition applies.  Finally, let&#8217;s look at pattern-specific variables <i>without</i> pattern rules:
</p>
<div style="background:#deffde;border:solid thin;width:60ex;">
<pre style="font-family:Consolas, Monaco, &quot;font-size:12px;">
.PHONY: foobar foobaz fooboo booboo
FLAGS=default flags
all: foobar foobaz fooboo booboo

foo%: FLAGS=special flags for foo files

foobar:
	@echo Building foobar using FLAGS='"$(FLAGS)"'

foobaz:
	@echo Building foobaz using FLAGS='"$(FLAGS)"'

fooboo:
	@echo Building fooboo using FLAGS='"$(FLAGS)"'

booboo:
	@echo Building booboo using FLAGS='"$(FLAGS)"'
</pre>
</div>
<p>
With this example you can see that pattern-specific variables are used for each of the <i>foo files</i>, even though we have specified explicit rules for each of those files.  That implies that gmake searches through the pattern-specific variables looking for variables that should apply for a given target independent of the search for a rule to build the target.
</p>
<p><h3><b>What happens when multiple patterns match my target?</b></h3>
</p>
<p>
Suppose we modify our previous makefile like this:
</p>
<div style="background:#deffde;border:solid thin;width:60ex;">
<pre style="font-family:Consolas, Monaco, &quot;font-size:12px;">
.PHONY: foobar foobaz fooboo booboo
FLAGS=base
all: foobar foobaz fooboo booboo

foo%: FLAGS+=extra_foo_flags
%boo: FLAGS+=extra_boo_flags

foobar:
	@echo Building foobar using FLAGS='"$(FLAGS)"'

foobaz:
	@echo Building foobaz using FLAGS='"$(FLAGS)"'

fooboo:
	@echo Building fooboo using FLAGS='"$(FLAGS)"'

booboo:
	@echo Building booboo using FLAGS='"$(FLAGS)"'
</pre>
</div>
<p>
What would you expect the value of <i>FLAGS</i> to be when building <i>fooboo</i>?  After all, both <i>foo%</i> and <i>%boo</i> match the target <i>fooboo</i>.  In fact, there are two possibilities, both perfectly reasonable, both consistent with at least some other aspect of gmake behavior.
</p>
<p>
The first possibility is that when building <i>fooboo</i>, <i>FLAGS</i> has the value <i>base extra_foo_flags</i>.  That is, gmake applies the pattern-specific variables from the <i>first</i>, and only the first, pattern that matches the target.  This is consistent with the way that gmake searches patterns to find a rule to build a target:  as soon as one match is found, gmake stops searching.  GNU make 3.80 uses the &#8220;first match&#8221; policy.
</p>
<p>
The second possibility is that <i>FLAGS</i> has the value <i>base extra_foo_flags extra_boo_flags</i> when building <i>fooboo</i>.  That is, gmake applies all the pattern-specific variables from <i>all</i> patterns that match the target, in the order the variables are defined in the makefile.  This is a bit more intuitive, and is more consistent with the way variable definition in general works in gmake.  GNU make 3.81 uses the &#8220;all matches&#8221; policy.
</p>
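<p>
If you want to see which policy your own gmake follows, a stripped-down variant of the makefile above prints the version next to the resulting value.  This is only a sketch:  <i>MAKE_VERSION</i> is a standard gmake variable, but releases other than the two discussed here may combine or order the matches differently, so treat the output as something to observe rather than assume.
</p>
<div style="background:#deffde;border:solid thin;width:60ex;">
<pre>
.PHONY: fooboo
FLAGS=base

foo%: FLAGS+=extra_foo_flags
%boo: FLAGS+=extra_boo_flags

# "First match" appends only extra_foo_flags;
# "all matches" appends both values.
fooboo:
	@echo gmake $(MAKE_VERSION): FLAGS='"$(FLAGS)"'
</pre>
</div>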
<p><h3><b>Performance Comparison</b></h3>
</p>
<p>
In both 3.80 and 3.81, the search for pattern-specific variables that apply to a given target involves a search of the pattern-specific variable definitions.  The difference is that in 3.80, the search scales with <i>the number of patterns that have pattern-specific variable definitions</i>, and the search can stop as soon as a match is found.  In gmake 3.81, the search scales with <i>the number of pattern-specific variable definitions</i>, and the search must always inspect every single definition, even after the first match is found.  To demonstrate the impact of this change, I did a series of tests.  I created several makefiles like the following:
</p>
<div style="background:#deffde;border:solid thin;width:60ex;">
<pre style="font-family:Consolas, Monaco, &quot;font-size:12px;">
%: FOO:=abc
%: FOO:=abc
%: FOO:=abc
...
all: 1 2 3 4 5 ... 10000
1 2 3 4 5 ... 10000: ; @/bin/true
</pre>
</div>
<p>
These makefiles each have a single <i>all</i> target with 10,000 prerequisites.  I varied the number of pattern-specific variable definitions from 1,000 to 80,000 and timed how long it took to run the makefile in dry-run mode with both gmake 3.80 and 3.81.  For kicks, I also included the runtimes for ElectricAccelerator (emake) in 3.81 emulation mode.  Here are the results:
</p>
<p>
<img src="http://www.electric-cloud.com/blog/wp-content/uploads/2009/04/pattern_specific_variables_graph11.png"/>
</p>
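<p>
Test makefiles of this shape are easy to produce mechanically.  The exact generator isn&#8217;t shown here, so the following is just one possible sketch:  a small makefile (call it <i>gen.mk</i>) that writes a test makefile named <i>bench.mk</i>.  The file names and the <i>NVARS</i> and <i>NTARGETS</i> variables are made up for this illustration, and the sketch assumes a <i>seq</i> that supports <i>-s</i>, as the GNU coreutils version does.
</p>
<div style="background:#deffde;border:solid thin;width:60ex;">
<pre>
# gen.mk: write bench.mk with NVARS pattern-specific variable
# definitions and NTARGETS targets (hypothetical names).
NVARS ?= 1000
NTARGETS ?= 10000

bench.mk:
	( for i in $$(seq 1 $(NVARS)); do \
	    echo '%: FOO:=abc'; \
	  done; \
	  echo "all: $$(seq -s ' ' 1 $(NTARGETS))"; \
	  echo "$$(seq -s ' ' 1 $(NTARGETS)): ; @/bin/true" \
	) &gt; $@
</pre>
</div>
<p>
Running <i>gmake -f gen.mk NVARS=80000</i> and then timing <i>gmake -n -f bench.mk</i> should approximate one data point on the graph.  Remove <i>bench.mk</i> between runs, since it is only regenerated when it is missing.
</p>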
<p>
No doubt some of you are thinking that this is a toy example, so not particularly applicable to the real world.  After all, nobody assigns the same variable over and over like that.  That&#8217;s an excellent point.  Unfortunately, when I tried to create 20,000 unique pattern-specific variables, gmake 3.81 crashed after sucking up all the available RAM on my system.  Oops!
</p>
<p>
The bigger question is, who would ever have tens of thousands of pattern-specific variables?  The answer is:  people who have switched from a recursive build to a non-recursive build.  In fact, I have seen a single makefile with over 180,000 pattern-specific variables, attached to just 18 distinct patterns.  The point is, as crazy as it seems, builds of this scale <b>do exist</b> in the &#8220;real world&#8221;.
</p>
<p><h3><b>Are you at risk?</b></h3>
</p>
<p>
To find the pattern-specific variable definitions in your makefiles, you can use the following command:
</p>
<div style="background:#deffde;border:solid thin;width:60ex;">
<pre style="font-family:Consolas, Monaco, &quot;font-size:12px;">
egrep "^[^:]*%[^:]*: *[^=]*=" Makefile
</pre>
</div>
<p>
If you have lots of pattern-specific variables, what can you do to reduce the performance impact?  A few ideas come to mind:
</p>
<ul>
<li>Switch to gmake 3.80 and hope that a future release will address this problem.</li>
<li>Convert your pattern-specific variables to explicit target-specific variables, perhaps using $(eval) to generate the new variable definitions dynamically so you don&#8217;t have to type out every one by hand.</li>
<li>Switch to a recursive build, which would allow you to partition your pattern-specific variables so that gmake only ends up searching the variables that are likely to apply to the targets referenced in a given makefile.</li>
<li>Use ElectricAccelerator, which emulates gmake 3.81 behavior but uses a more efficient algorithm in its implementation.</li>
</ul>
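<p>
As a sketch of the $(eval) idea, the <i>foo%</i> example from earlier could be rewritten along the following lines.  The <i>FOO_TARGETS</i> list and the <i>set_flags</i> helper are names invented for this illustration; a real build would derive the target list from its own variables.
</p>
<div style="background:#deffde;border:solid thin;width:60ex;">
<pre>
FLAGS=default flags
FOO_TARGETS := foobar foobaz fooboo

.PHONY: all $(FOO_TARGETS) booboo
all: $(FOO_TARGETS) booboo

# Replaces "foo%: FLAGS=special flags for foo files" with one
# target-specific definition per listed target.
define set_flags
$(1): FLAGS=special flags for foo files
endef
$(foreach t,$(FOO_TARGETS),$(eval $(call set_flags,$(t))))

$(FOO_TARGETS) booboo:
	@echo Building $@ using FLAGS='"$(FLAGS)"'
</pre>
</div>
<p>
With this makefile the three <i>foo</i> targets report the special value and <i>booboo</i> reports the default, with no pattern-specific variables left for gmake to search.  When every target gets the same value, listing all of the targets on a single target-specific definition line works as well; $(eval) pays off when the value differs per target.
</p>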
<p><b>Update</b>: Restored the results graph to the post, which mysteriously disappeared thanks to WordPress.</p>
<p><hr />
This article is one of several looking at different aspects of makefile performance.  If you liked this article, you may enjoy the others in the series:</p>
<ul>
<li><a href="http://blog.electric-cloud.com/2009/03/18/it-goes-to-11/">Makefile performance: recursive make</a></li>
<li><a href="http://blog.electric-cloud.com/2009/03/23/makefile-performance-shell/">Makefile performance: $(shell)</a></li>
<li>Makefile performance: pattern-specific variables (this post)</li>
<li><a href="http://blog.electric-cloud.com/2009/08/19/makefile-performance-built-in-rules/">Makefile performance: built-in rules</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.electric-cloud.com/blog/2009/04/13/makefile-performance-pattern-specific-variables/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

