Accelerating Software Delivery

Makefile performance: $(shell)

One rookie performance mistake I’ve seen in GNU make makefiles is the use of $(shell) without := assignment. Of course I’m not the first person to write about this, but people are still making this mistake, and it’s so easy to fix that it’s really tragic it’s still out there.

GNU Make Variables

So what’s the problem? GNU make supports two fundamental types of variables: simple and recursive. The difference between the two is in how they are expanded. Simple variables are expanded once, when the variable is declared; recursive variables are expanded every time the variable is referenced. This is a subtle but important distinction. Consider this makefile:

FOO=abc
BAR_1:=$(FOO)
BAR_2 =$(FOO)
FOO=def

all:
	@echo $(BAR_1)
	@echo $(BAR_2)

If you’re not familiar with the differences in gmake variables, the output from this build may surprise you. Heck, I’m painfully familiar with the differences, and the output would still surprise me if I hadn’t read the makefile very carefully:

abc
def

What’s going on here? The variable BAR_1 is declared as a simple variable, because I used := to create it. This caused gmake to immediately expand the text on the right-hand side of the assignment and capture the result as the value of BAR_1. At the time BAR_1 was declared, FOO had the value abc, so that’s what we got in BAR_1.

In contrast, BAR_2 is declared as a recursive variable, because I used the usual = (no colon) to declare it. In this case, gmake captures the unexpanded literal text $(FOO) as the value of BAR_2; later, when I referenced BAR_2, gmake recursively expanded that text to obtain the current value of BAR_2. At that time, FOO had the value def, so that’s what we see used as the value of BAR_2.
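If you’re ever unsure which kind of variable you’re looking at, GNU make 3.81 and later can tell you through the $(flavor) function. Here is a minimal sketch that reuses the variables from the example above; the $(info) lines and the do-nothing all target are my additions for illustration:

```makefile
FOO = abc
BAR_1 := $(FOO)
BAR_2 = $(FOO)

# $(flavor) reports how a variable was defined: simple, recursive, or undefined.
$(info BAR_1 is $(flavor BAR_1))
$(info BAR_2 is $(flavor BAR_2))

.PHONY: all
all:
	@:
```

Running gmake on this prints "BAR_1 is simple" followed by "BAR_2 is recursive".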

$(shell)

$(shell) is a special function in gmake that runs an external command and captures the output for use in the makefile. For example, you could get the current working directory like this:

CWD=$(shell pwd)
all:
	@echo This makefile lives in $(CWD).

It is the interaction between recursive variables and $(shell) that causes the performance problem. Suppose you have a simple makefile like this:

BUILDID=$(shell date +%Y%m%d)
OUTTOP=../out/$(BUILDID)

all: $(OUTTOP)/myprog
	@echo Finished building build id $(BUILDID).

$(OUTTOP)/myprog: $(OUTTOP)/foo.o $(OUTTOP)/bar.o $(OUTTOP)/baz.o
	@echo Building $@ from $^

$(OUTTOP)/%.o: ./%.c $(OUTTOP)/dummy
	@echo Building $@ from $^

$(OUTTOP)/dummy:
	mkdir -p $(OUTTOP)
	touch $(OUTTOP)/dummy

This makefile uses $(shell) to generate a build identifier based on the date that the build was invoked. The build id is then used in the name of the output directory, so that each day’s builds are written to their own output location. Unfortunately, because I carelessly declared BUILDID as a recursive variable, gmake will invoke the shell every time the variable is referenced. Even in this tiny makefile that proves to be a surprisingly large number of references. You can see each reference with a clever little trick: add $(warning) to the declaration of BUILDID:

BUILDID=$(warning Invoking the shell)$(shell date +%Y%m%d)

$(warning) won’t affect the value of BUILDID because it expands to the empty string, but it will cause gmake to print a message to stderr every time the variable is expanded. Now when we run the build, we see the following output:

Makefile:4: Invoking the shell
Makefile:7: Invoking the shell
Makefile:7: Invoking the shell
Makefile:7: Invoking the shell
Makefile:7: Invoking the shell
Makefile:10: Invoking the shell
Makefile:10: Invoking the shell
Makefile:13: Invoking the shell
Makefile:14: Invoking the shell
Makefile:14: Invoking the shell
Creating output directory dummy.
Building ../out/20090320/foo.o.
Building ../out/20090320/bar.o.
Building ../out/20090320/baz.o.
Building ../out/20090320/myprog.
Makefile:5: Invoking the shell
Finished building build id 20090320.

BUILDID is expanded eleven times in this tiny little build, and that’s with just a few object files, and one final build product. The great thing is, it’s trivial to fix this problem. Simply by changing the declaration of BUILDID so it is a simple variable, we eliminate all but one call to the shell:

BUILDID:=$(warning Invoking the shell)$(shell date +%Y%m%d)

With that one-character change, the same build prints:

Makefile:1: Invoking the shell
Creating output directory dummy.
Building ../out/20090320/foo.o.
Building ../out/20090320/bar.o.
Building ../out/20090320/baz.o.
Building ../out/20090320/myprog.
Finished building build id 20090320.
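
One trade-off of := is that the shell now runs once on every invocation of gmake, even when the requested targets never touch BUILDID. If the command is expensive, a widely circulated idiom defers the call until first use and then caches the result. This is not part of the build above; it is a sketch that assumes GNU make 3.80 or later (for $(eval)), and the all target is only there to exercise the variable:

```makefile
# On first expansion, $(eval) redefines BUILDID as a simple variable holding
# the output of date; the trailing $(BUILDID) then yields that cached value.
# Every later reference sees the simple variable and never calls the shell.
BUILDID = $(eval BUILDID := $(shell date +%Y%m%d))$(BUILDID)
OUTTOP  = ../out/$(BUILDID)

.PHONY: all
all:
	@echo Build id is $(BUILDID), output goes to $(OUTTOP).
```

For a command as cheap as date, plain := is simpler and easier to read; the deferred form only earns its keep when the shell call is slow or rarely needed.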

Tracking down offenders

If you’re using ElectricAccelerator, you can add --emake-pedantic=1 to the command-line, and emake will issue a warning for every recursive variable that contains a call to $(shell):

Makefile:1: The variable `BUILDID' has value `$(shell date +%Y%m%d)'
which contains a call to $(shell ).  This could slow down the build
if the variable is referenced many times; consider using := assignment
instead.

If you’re not using ElectricAccelerator, you can use grep to find the offenders:

ericm@chester:src$ grep -n -e '[^:]=.*\$(shell' Makefile
1:BUILDID=$(shell date +%Y%m%d)

So what are you waiting for? Quit dragging your build performance down with $(shell) busy work.


This article is one of several looking at different aspects of makefile performance.
