<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Electric Cloud Blog &#187; annolib</title>
	<atom:link href="http://www.electric-cloud.com/blog/tag/annolib/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.electric-cloud.com/blog</link>
	<description>This is your source for private development cloud best practices and technical tips and tricks for Electric Cloud solutions</description>
	<lastBuildDate>Thu, 02 Feb 2012 22:32:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1</generator>
		<item>
		<title>Annocat: e pluribus unum</title>
		<link>http://www.electric-cloud.com/blog/2009/05/27/annocat-e-pluribus-unum/</link>
		<comments>http://www.electric-cloud.com/blog/2009/05/27/annocat-e-pluribus-unum/#comments</comments>
		<pubDate>Wed, 27 May 2009 17:37:18 +0000</pubDate>
		<dc:creator>Eric Melski</dc:creator>
				<category><![CDATA[Electric Cloud Solutions]]></category>
		<category><![CDATA[annolib]]></category>
		<category><![CDATA[annotation]]></category>
		<category><![CDATA[ElectricAccelerator]]></category>

		<guid isPermaLink="false">http://blog.electric-cloud.com/?p=384</guid>
		<description><![CDATA[ElectricAccelerator annotation files are a fantastic way to get a grip on your build behavior and performance, but what if your Build (capital B) spans more than one invocation of emake? Annotation gives you a good look inside any single invocation, but there&#8217;s no way to get an overview of the entire process. You can&#8217;t [...]]]></description>
			<content:encoded><![CDATA[<p>ElectricAccelerator annotation files are a fantastic way to get a grip on your build behavior and performance, but what if your Build (capital B) spans more than one invocation of emake?  Annotation gives you a good look inside any single invocation, but there&#8217;s no way to get an overview of the entire process.  You can&#8217;t just catenate the annotation files from subsequent emake runs &#8212; the result won&#8217;t be well-formed XML, and the timing information for jobs in each subsection of the build will reflect time from the start of that subsection, not from the start of the logical build.  Plus, you run the risk of having overlapping job identifiers in different subsections.  What you need is a specialized version of cat that is annotation-aware.  In this article I&#8217;ll introduce <a href="https://github.com/emelski/code.melski.net/blob/master/annocat/annocat.pl"><i>annocat</i></a>, a simple Perl script I wrote for just this purpose, and I&#8217;ll explain how it works.<br />
<span id="more-384"></span></p>
<p><h3><b>What does <i>annocat</i> do?</b></h3>
</p>
<p>
Annocat has a single purpose:  concatenate a series of annotation files from real emake invocations into one annotation file representing a single logical build.  In order to do this correctly we have to do a few transformations on the original data:
</p>
<ul>
<li>Job identifiers from each source file are rewritten so they are scoped by the build identifier of the build containing that job.  For example, if we have a job with identifier J0830fdc0 in build 12345, annocat will replace that identifier with J12345_0830fdc0, not only in the <i>&lt;job&gt;</i> tag but also anywhere else the identifier appears in annotation, such as in the <i>&lt;waitingJobs&gt;</i> tag.  This transformation ensures that we don&#8217;t have collisions between job identifiers in different source builds.</li>
<li>Timing information is adjusted so that it is relative to the start of the logical build, rather than relative to the start of any individual actual build.</li>
<li>Environment and properties blocks are discarded from all but the first real build.  The metrics block is discarded entirely.</li>
<li>Additional pseudo-jobs are created in the result to tie all the real builds together into a single logical build.  The logical build represents the actual builds as submakes spawned from a series of serialized jobs in a synthetic make instance.</li>
</ul>
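<p>
The identifier rewriting in the first bullet can be sketched in a few lines.  This is an illustration in Python, not the code from annocat itself; the function name <i>scope_job_ids</i> and the assumption that identifiers are always "J" followed by hex digits are mine, based only on the example above.
</p>

```python
import re

# Assumption: job identifiers are "J" followed by hex digits, as in the
# J0830fdc0 example; real annotation files may use other forms.
JOB_ID = re.compile(r"\bJ([0-9a-fA-F]+)\b")


def scope_job_ids(text, build_id):
    """Prefix every job identifier found in text with the build identifier."""
    return JOB_ID.sub(lambda m: f"J{build_id}_{m.group(1)}", text)


# Works on a single id attribute and on a space-separated list of ids alike.
print(scope_job_ids("J0830fdc0", "12345"))
print(scope_job_ids("J0830fdc0 J0830fdc8", "12345"))
```

<p>
Applying the same function to every attribute that can carry a job identifier is what keeps references between jobs consistent after the rewrite.
</p>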
<p><h3><b>How do I use <i>annocat</i>?</b></h3>
</p>
<p>
Annocat works like the standard <i>cat</i> utility, but on annotation files.  <a href="https://github.com/emelski/code.melski.net/raw/master/annocat/annocat.pl">Download it here</a>, then invoke it like this:
</p>
<div style="background:#deffde;border:solid thin;width:80ex">
<pre>
perl annocat.pl build_1234.xml build_1235.xml build_1236.xml &gt; combined.xml
</pre>
</div>
<p>
After running annocat, you can load the result in ElectricInsight and run all your favorite reports.
</p>
<p><h3><b>How does <i>annocat</i> work?</b></h3>
</p>
<p>
Annocat is a simple Perl script that uses the standard Perl streaming XML parser <i>XML::Parser</i> to process the annotation file one tag at a time.  I chose the streaming parser because annocat does not need to track a lot of state, so we don&#8217;t need the sophistication of a DOM-style parser.  I chose to implement annocat in Perl rather than Tcl-and-annolib because I wanted to show how you might work with annotation data in Perl; because you don&#8217;t need all the power of annolib for this simple task; and because I wanted to remind myself how much I dislike Perl.
</p>
<p>
The basic premise of annocat is simple:  as each tag is read from a source annotation file, annocat checks the type of the tag and performs any required transformations, then prints the tag to standard out.  Although it&#8217;s straightforward stuff, the final script is a few hundred lines of code, so I won&#8217;t go through it line-by-line here.  I will point out a couple of tricky bits, however.
</p>
<p>
First is the bit where annocat adjusts timing information.  A global variable, <i>gElapsed</i>, tracks the elapsed time as of the start of the annotation file currently being processed.  This variable starts at zero and is updated after each annotation file is completed.  You can see the update in the main loop of the program, around line 287.  When annocat emits the timing data for a job, around line 153, it just adds the elapsed time to the real timing data extracted from the annotation file, thereby shifting the logical start time by the required amount.
</p>
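<p>
The offset bookkeeping described above looks roughly like this.  It is a Python sketch, not a transcription of the Perl: the attribute names <i>invoked</i> and <i>completed</i> are assumed, and so is the choice to treat the latest completion time in a file as that build's duration.
</p>

```python
def shift_timing(attrs, elapsed):
    """Return a copy of one job's timing attributes, shifted by elapsed seconds."""
    shifted = dict(attrs)
    for key in ("invoked", "completed"):  # assumed attribute names
        if key in shifted:
            shifted[key] = f"{float(shifted[key]) + elapsed:.6f}"
    return shifted


def concatenate_timings(builds):
    """builds holds one list of timing dicts per source annotation file."""
    elapsed = 0.0  # plays the role of gElapsed: starts at zero
    result = []
    for timings in builds:
        result.extend(shift_timing(t, elapsed) for t in timings)
        if timings:
            # Assumption: a build lasts until its latest job completes.
            elapsed += max(float(t["completed"]) for t in timings)
    return result


first = [{"invoked": "0.0", "completed": "2.0"}]
second = [{"invoked": "0.5", "completed": "1.5"}]
for timing in concatenate_timings([first, second]):
    print(timing)
```

<p>
The offset is only advanced after a whole file has been processed, so every job in one source build is shifted by the same amount.
</p>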
<p>
Second is the bit where annocat emits pseudo-jobs to provide the top-level structure of the logical build.  The tricky part is determining when to emit these jobs, and making sure that they are emitted in the correct context &#8212; that is, in keeping with the annotation format, the rule job that spawns a submake must immediately precede the opening &lt;make&gt; tag for the submake, and the rule job should list the parse job of the submake as a waitingJob.  Since we&#8217;re using a streaming parser, by the time we get to that parse job, unfortunately, we will already have processed and emitted the opening &lt;make&gt; tag.  So we have to do something a little more clever around the start of each build:  instead of blindly copying the first &lt;make&gt; tag for the build to standard out, annocat buffers that tag temporarily, until it gets to the first &lt;job&gt; tag in that build.  Then we have the information we need to create the fake rule job, so we do so, and only then do we emit the buffered &lt;make&gt; tag.  Switching into buffered mode occurs around line 126, when annocat detects that it has found a new &lt;build&gt; tag; emitting the fake rule job occurs around line 181, when annocat detects the first job in the new build.  Around that line you&#8217;ll also see where annocat outputs the follow job for the previous build.
</p>
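<p>
Here is the buffering trick reduced to its essentials, as a Python sketch.  Events are modeled as simple (tag, attributes) pairs for opening tags only, and the synthetic job identifier and the <i>waitingJobs</i> key are made up for the illustration; the real script works on parser callbacks and emits full annotation tags.
</p>

```python
def splice_rule_job(events, build_id):
    """Yield the events of one source build, with a synthetic rule job
    inserted immediately before the first make tag of that build."""
    buffered = None   # the held-back first make event
    spliced = False   # True once the synthetic rule job has been emitted
    for tag, attrs in events:
        if not spliced and buffered is None and tag == "make":
            buffered = (tag, attrs)  # hold it: the parse job id is not known yet
            continue
        if not spliced and buffered is not None and tag == "job":
            # The first job after the make tag is taken to be the parse job.
            yield ("job", {"id": f"Jfake_{build_id}", "type": "rule",
                           "waitingJobs": attrs["id"]})
            yield buffered
            spliced = True
        yield (tag, attrs)
    if buffered is not None and not spliced:
        yield buffered  # a build with no jobs: release the make tag unchanged


events = [
    ("make", {"level": "0"}),
    ("job", {"id": "J1", "type": "parse"}),
    ("job", {"id": "J2", "type": "rule"}),
]
for event in splice_rule_job(events, "12345"):
    print(event)
```

<p>
Only one tag is ever held back, which is why a streaming parser is still enough for the job.
</p>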
<p><h3><b>Future work</b></h3>
</p>
<p>
Although it is functional as currently implemented, there&#8217;s more that could be done with <i>annocat</i>.  First, it would be nice if it didn&#8217;t just dump the environment data from the second and subsequent real builds.  One way to handle it would be to move it under the first make instance in the build and use the environment-level annotation format to capture the deltas between the new environment and the environment for the first build.  Second, it would be nice if annocat didn&#8217;t drop the data in the <i>&lt;metrics&gt;</i> sections.  One solution would be to aggregate the metrics from all source builds and emit a single unified block of metrics for the logical build.  I leave these enhancements as an exercise for the reader.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.electric-cloud.com/blog/2009/05/27/annocat-e-pluribus-unum/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Untangling Parallel Build Logs</title>
		<link>http://www.electric-cloud.com/blog/2008/12/01/untangling-parallel-build-logs/</link>
		<comments>http://www.electric-cloud.com/blog/2008/12/01/untangling-parallel-build-logs/#comments</comments>
		<pubDate>Mon, 01 Dec 2008 19:44:39 +0000</pubDate>
		<dc:creator>Eric Melski</dc:creator>
				<category><![CDATA[Electric Cloud Solutions]]></category>
		<category><![CDATA[annolib]]></category>
		<category><![CDATA[annotation]]></category>
		<category><![CDATA[parallel builds]]></category>

		<guid isPermaLink="false">http://ecloud.wordpress.com/?p=102</guid>
		<description><![CDATA[I spend most of my time with ElectricAccelerator working on the &#8220;big&#8221; features &#8212; performance, scalability, fault-tolerance. It&#8217;s easy to forget that there are a ton of &#8220;little&#8221; features that can themselves make a big difference in the value of the system. Case in point: the build log. If you have any experience with parallel [...]]]></description>
			<content:encoded><![CDATA[<p>I spend most of my time with ElectricAccelerator working on the &#8220;big&#8221; features &#8212; performance, scalability, fault-tolerance.  It&#8217;s easy to forget that there are a ton of &#8220;little&#8221; features that can themselves make a big difference in the value of the system.  Case in point:  the build log.  If you have any experience with parallel build systems, you know what a mess the build log becomes because you have any number of parallel commands all dumping output to a single logfile simultaneously.  The output from each command gets interleaved with the output from other commands.  Worse, the error messages get jumbled up too, so it becomes difficult to tell which commands are producing the errors.</p>
<p>I was reminded of this issue when it popped up again on the GNU make-help mailing list.  Take a look at <a href="http://www.nabble.com/How-to-output-result-in-order-in-parallel-mode--td20168211.html">this recent post</a>:</p>
<p><span id="more-102"></span></p>
<div style="padding-left:20px;border-left:5px solid #ddd;color:#777;margin:15px 30px 0 10px;">
<p>In parallel mode (with -j option), the outputs from different rules are intermixed. I&#8217;m wondering if there is a way to order the outputs as if make is run in serial mode, but it should still achieve the speed of parallel mode.</p>
</div>
<p>Unfortunately, as this poster discovered, there is no really good solution to this problem, at least not unless there is help from the build tool itself.  Some people have tried tricks like <a href="http://www.nabble.com/colouring-output-td17026168.html">colorizing the output</a> by piping each command through a filter that sets the font color before printing the output, but any such solution is going to be unreliable because it doesn&#8217;t address the fundamental problem of having multiple processes trying to write output simultaneously.  Plus it&#8217;s a real nuisance to modify your makefiles to add the filter.</p>
<h3>How ElectricAccelerator Does Build Logs</h3>
<p>Seeing that message on make-help reminded me that I take for granted the way that Accelerator handles the build log.  Simply put, Accelerator emits the build log in correct serial order, without any interleaving of output from different commands &#8212; even though those commands are running in parallel and frequently out-of-order.  The log you get at the end of the build is identical to the log you would have gotten if you&#8217;d run a single-threaded, serialized build.</p>
<p>Think about that for a second:  the build log is emitted in serial order.  No more scrambled build logs.  No more kludgy partial solutions, because we solved the problem at the level of the build tool.  You just get consistent, predictable, <i>useful</i> logs.  Every time.</p>
<p>Consider this trivial makefile:</p>
<div style="background:#dee7f7;border:dashed thin;width:80ex;">
<pre>
  all: a b c

  a b c:
          @for n in 1 2 3 4 ; do echo $@-$$n &amp;&amp; sleep 1 ; done
</pre>
</div>
<p>Now compare the output when run by regular old serial gmake with the output when run by parallel gmake <code>-j 4</code>, and output from Accelerator (emake) (<font color="red">red</font> text indicates places where the output differs from serial gmake, <b>bold</b> text indicates places where the output differs from the previous run of the same make variant):</p>
<table rules="rows columns" border cellpadding="8">
<tr>
<th>serial</th>
<th>parallel gmake</th>
<th>parallel gmake</th>
<th>parallel gmake</th>
<th>emake</th>
<th>emake</th>
<th>emake</th>
</tr>
<tr>
<td>
a-1<br />
a-2<br />
a-3<br />
a-4<br />
b-1<br />
b-2<br />
b-3<br />
b-4<br />
c-1<br />
c-2<br />
c-3<br />
c-4
</td>
<td>
a-1<br />
<font color="red">b-1</font><br />
<font color="red">c-1</font><br />
<font color="red">b-2</font><br />
<font color="red">a-2</font><br />
<font color="red">c-2</font><br />
<font color="red">a-3</font><br />
<font color="red">b-3</font><br />
<font color="red">c-3</font><br />
<font color="red">b-4</font><br />
<font color="red">a-4</font><br />
c-4
</td>
<td>
a-1<br />
<font color="red">b-1</font><br />
<font color="red">c-1</font><br />
<font color="red"><b>a-2</b></font><br />
<font color="red"><b>b-2</b></font><br />
<font color="red">c-2</font><br />
<font color="red">a-3</font><br />
<font color="red">b-3</font><br />
<font color="red">c-3</font><br />
<font color="red"><b>a-4</b></font><br />
<font color="red"><b>b-4</b></font><br />
c-4
</td>
<td>
a-1<br />
<font color="red"><b>c-1</b></font><br />
<font color="red"><b>b-1</b></font><br />
<font color="red">a-2</font><br />
<font color="red"><b>c-2</b></font><br />
<font color="red"><b>b-2</b></font><br />
<font color="red">a-3</font><br />
<font color="red"><b>c-3</b></font><br />
<font color="red"><b>b-3</b></font><br />
<font color="red">a-4</font><br />
<font color="red"><b>c-4</b></font><br />
<font color="red"><b>b-4</b></font>
</td>
<td>
a-1<br />
a-2<br />
a-3<br />
a-4<br />
b-1<br />
b-2<br />
b-3<br />
b-4<br />
c-1<br />
c-2<br />
c-3<br />
c-4
</td>
<td>
a-1<br />
a-2<br />
a-3<br />
a-4<br />
b-1<br />
b-2<br />
b-3<br />
b-4<br />
c-1<br />
c-2<br />
c-3<br />
c-4
</td>
<td>
a-1<br />
a-2<br />
a-3<br />
a-4<br />
b-1<br />
b-2<br />
b-3<br />
b-4<br />
c-1<br />
c-2<br />
c-3<br />
c-4
</td>
</tr>
</table>
<p>The parallel gmake output is different from serial gmake on every run, and what&#8217;s worse, the parallel runs are not even consistent with each other from one run to the next!  On the other hand, the output from Accelerator is identical to serial gmake, every time.</p>
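<p>The general idea behind a serial-order log can be shown with a toy example.  To be clear, this is not how Accelerator is implemented, which this post does not describe; it is only a Python sketch of the principle that each job writes to its own buffer and the buffers are emitted in serial order, no matter which job finishes first.  The function names are invented for the example.</p>

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def run_job(target):
    """Stand-in for one make job: returns its output lines instead of printing."""
    lines = []
    for n in range(1, 5):
        time.sleep(random.uniform(0, 0.01))  # jobs finish in unpredictable order
        lines.append(f"{target}-{n}")
    return lines


def build_log(targets, workers=4):
    """Run all jobs in parallel, then emit their output in serial order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in submission order, not completion order.
        per_job = list(pool.map(run_job, targets))
    return [line for lines in per_job for line in lines]


print("\n".join(build_log(["a", "b", "c"])))
```

<p>Because no job ever writes directly to the shared log, there is nothing to interleave.</p>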
<h3>Anything we can do, we can do better</h3>
<p>Of course we&#8217;re never satisfied with simply solving a problem that plagues users around the world.  We want to provide a solution that is so far superior to the alternatives that you&#8217;d be crazy not to want it.  Therefore, we created <i>annotated build logs</i>, or simply <i>annotation</i>, in late 2003.  Annotation is a version of your build log marked up with XML to provide bundles of information above and beyond the standard build output log, in a format that is easily searched and manipulated.  This is the solution that everybody <a href="http://www.scons.org/wiki/SummerOfCodeIdeas/KarlPietrzakSoC2006Proposal?highlight=%28xml%29%7C%28buildlog%29">wishes they had</a>, and the solution that the developers of other parallel make tools <a href="http://www.bell-labs.com/project/nmake/rnotes-10/s2.html#s2.1">wish they had thought of first.</a>  Here&#8217;s a bit of the annotation for our example build:</p>
<div style="background:#dee7f7;border:dashed thin;width:80ex;">
<pre>
  &lt;job file="Makefile" name="a" neededby="J01513660" id="J01511fb0"&gt;
  &lt;argv&gt;for n in 1 2 3 4 ; do echo a-$n &amp;&amp; sleep 1 ; done
  &lt;/argv&gt;
  &lt;output src="prog"&gt;a-1
  a-2
  a-3
  a-4
  &lt;/output&gt;
  &lt;/job&gt;
  &lt;job file="Makefile" name="b" neededby="J01513660" id="J01511fb8"&gt;
  &lt;argv&gt;for n in 1 2 3 4 ; do echo b-$n &amp;&amp; sleep 1 ; done
  &lt;/argv&gt;
  &lt;output src="prog"&gt;b-1
  b-2
  b-3
  b-4
  &lt;/output&gt;
  &lt;/job&gt;
</pre>
</div>
<p>There are a few things in particular that I want to point out in this snippet:</p>
<ol>
<li>Every job in the build is reported separately from every other job, with the commands and output from each easily identifiable.</li>
<li>You can generate the traditional build output log from annotation simply by concatenating the text in the <code>&lt;output&gt;</code> tags.</li>
<li>Annotation includes even those commands that are not echoed to the standard build output log because of the use of the &#8220;silent&#8221; prefix (@) in the Makefile!</li>
</ol>
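<p>As a quick illustration of the second point, here is a Python sketch that rebuilds the plain log from annotation.  It uses only the standard library, and it assumes nothing about the annotation format beyond what the snippet above shows: <code>output</code> elements nested somewhere inside <code>job</code> elements.  The sample document is generated in code and wrapped in an assumed <code>build</code> root element so that the example is self-contained.</p>

```python
import xml.etree.ElementTree as ET


def make_sample():
    """Build a tiny stand-in for an annotation file: two jobs, output only."""
    build = ET.Element("build")  # assumed root element name
    for name, job_id in (("a", "J01511fb0"), ("b", "J01511fb8")):
        job = ET.SubElement(build, "job", name=name, id=job_id)
        out = ET.SubElement(job, "output", src="prog")
        out.text = "".join(f"{name}-{n}\n" for n in range(1, 5))
    return ET.tostring(build, encoding="unicode")


def log_from_annotation(xml_text):
    """Concatenate the text of every output element, in document order."""
    root = ET.fromstring(xml_text)
    return "".join(out.text or "" for out in root.iter("output"))


print(log_from_annotation(make_sample()), end="")
```

<p>For a large annotation file, a streaming parser would be a better fit than loading the whole tree, but the logic stays the same.</p>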
<p>If you are comfortable with XML you can probably already see the benefits here:  you could easily put together some scripts with Perl or even just with <a href="http://robur.slu.se/jensl/xmlclitools/">xmlgrep and xmlfmt</a> that search the content of <code>&lt;output&gt;</code> tags for things that look like warnings or errors, then report precisely which job produced that output.  Alternatively, you can use our <code>annolib</code> library to build that script:</p>
<pre class="brush: java; title: ; notranslate">
#!/usr/bin/tclsh

load annolib.so

set anno [anno create]
set xml  [open emake.xml r]
$anno load $xml
close $xml

$anno jobiterbegin
while { [$anno jobitermore] } {
    set job [$anno jobiternext]
    set match false
    foreach commandBlock [$anno job commands $job] {
        foreach {lineNumber argv outputs} $commandBlock {
            foreach {src text} $outputs {
                if { [regexp -nocase {(warning)|(error)} $text] } {
                    set match true
                }
            }
        }
    }
    if { $match } {
        puts &quot;Problem found in job $job ([$anno job name $job]):&quot;
        foreach commandBlock [$anno job commands $job] {
            foreach {lineNumber argv outputs} $commandBlock {
                puts &quot;\t$argv&quot;
                foreach {src text} $outputs {
                    if { $src == &quot;prog&quot; } {
                        puts &quot;\t$text&quot;
                    }
                }
            }
        }
    }
}
</pre>
<p>Just like that, you have a simple script you can use as a starting point for your own post-build reporting on errors and warnings in your build.</p>
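<p>If you do not have annolib handy, roughly the same report can be sketched with a general-purpose XML parser.  The Python below is an approximation rather than a port of the script above: it relies only on the <code>job</code> and <code>output</code> elements and the <code>id</code> and <code>name</code> attributes visible in the annotation snippet, and the helper names and sample data are invented for the example.</p>

```python
import re
import xml.etree.ElementTree as ET

PROBLEM = re.compile(r"warning|error", re.IGNORECASE)


def make_sample():
    """Two made-up jobs; only the second one prints a warning."""
    build = ET.Element("build")  # assumed root element name
    for name, job_id, text in (
        ("a", "J01511fb0", "a-1\n"),
        ("b", "J01511fb8", "b.c:3: warning: unused variable\n"),
    ):
        job = ET.SubElement(build, "job", name=name, id=job_id)
        ET.SubElement(job, "output", src="prog").text = text
    return ET.tostring(build, encoding="unicode")


def jobs_with_problems(xml_text):
    """Return (id, name, output) for each job whose output looks suspicious."""
    hits = []
    for job in ET.fromstring(xml_text).iter("job"):
        text = "".join(out.text or "" for out in job.iter("output"))
        if PROBLEM.search(text):
            hits.append((job.get("id"), job.get("name"), text))
    return hits


for job_id, name, text in jobs_with_problems(make_sample()):
    print(f"Problem found in job {job_id} ({name}):")
    print("\t" + text, end="")
```

<p>The annolib version has the advantage of understanding the annotation format natively, so treat this as a fallback rather than a replacement.</p>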
<p>So with ElectricAccelerator, you get the best of both worlds:  fast parallel builds, and build logs that you can actually use.  Just another way that Accelerator rocks.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.electric-cloud.com/blog/2008/12/01/untangling-parallel-build-logs/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
	</channel>
</rss>

