CMake: Building with all your cores

As a distance runner, I know how important it is to run with a fully engaged core; it is the most efficient way to move toward my running goals. Software developers are equally motivated to use as many of their “cores” as possible to build software. OK, I admit this is a bit of a lame analogy, but you would be hard pressed to find a developer who is not interested in building software as fast as possible using all of the horsepower available on their hardware. The CMake build system and its developers have always been aware of how important parallel builds are, and have made sure that CMake can take advantage of them when possible.

Since CMake is a meta build tool that does not directly build software, but rather generates build files for other tools, the approach to parallel building differs from generator to generator and platform to platform. In this post, I will cover the approaches for parallel builds on the major platforms and toolchains supported by CMake.

First some terms:

  • Target Level Parallelism – This is when a build system builds high level targets at the same time. High level targets are things like libraries and executables.
  • Object Level Parallelism – This is when a build system builds individual object files at the same time. Basically, it invokes the compiler command line for independent objects at the same time.
  • CMake generator – A CMake generator selects the native build tool that CMake writes build files for (for example, Makefiles, Ninja, Xcode, or Visual Studio). It is specified either in the cmake-gui or with the -G command line option to cmake.

I will start with Linux, followed by Apple OSX, and finish up with Windows.

Linux:

GNU Make

The traditional gmake tool, which is usually installed as “make” on Linux systems, can run parallel builds. It is used by CMake’s “Unix Makefiles” generator. To build in parallel with gmake, run it with the -jN command line option: -j tells make to build in parallel, and N specifies how many jobs to run at once. For minimum build times, a good rule of thumb is a value of N that is one more than the number of cores on the machine, so on a quad core Linux machine you would run make -j5. Here is an example:

# assume your source code is in a directory called src and you are one directory up from there
mkdir build
cd build
cmake -G "Unix Makefiles" ../src
make -j5
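
If you would rather not hard-code the 5, the job count can be derived from the machine itself. The following is a minimal sketch, assuming a POSIX shell and that `getconf _NPROCESSORS_ONLN` reports the number of online cores (it does on Linux and Mac OSX). The helper names `core_count` and `make_jobs` are hypothetical; they are not part of CMake or make.

```shell
#!/bin/sh
# Sketch: pick N for "make -jN" using the cores + 1 rule of thumb.
# core_count and make_jobs are hypothetical helper names.

# Print the number of online cores, falling back to 1 if getconf fails.
core_count() {
  cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || cores=1
  case $cores in
    ''|*[!0-9]*) cores=1 ;;   # empty or non-numeric output: assume 1 core
  esac
  echo "$cores"
}

# Print the job count for make: one more than the number of cores.
make_jobs() {
  echo $(( $(core_count) + 1 ))
}

# Usage, from inside the build directory:
#   make -j"$(make_jobs)"
```

On a quad core machine this yields make -j5, the same value as the example above.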

ninja

Some developers at Google recently created a new build tool called ninja as a replacement for GNU make. ninja was designed to run faster than make and, of course, to run parallel builds very well. Fortunately, CMake now has a Ninja generator so that your project can take advantage of this new tool. Unfortunately, if you are using CMake to build Fortran 95 or later code that makes use of Fortran modules, you will have to stick with GNU make, because CMake does not yet implement Fortran module dependency information for ninja. (If you are interested in this, please send me an email.) If your project does not include Fortran code, then ninja might be a good tool for you to try. ninja is very quick to figure out that it has nothing to do, which is important for incremental builds of large projects.

To use ninja, you will first need to build it from source. The source for ninja can be found here: git://github.com/martine/ninja.git. You will need Python and a C++ compiler to build ninja. The README at the top of the ninja source tree explains how to build it; basically, you just run python bootstrap.py, which produces a ninja executable. Once it is built, put ninja in your PATH so CMake can find it.

ninja does not require a -j flag like GNU make to perform a parallel build. By default it runs cores + 2 jobs at once (thanks to Matthew Woehlke for pointing out that it is not simply 10, as I had originally stated). It does, however, accept a -j flag with the same syntax as GNU make, -j N, where N is the number of jobs run in parallel. For more information, run ninja --help with the ninja you have built.
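
Since ninja already runs about cores + 2 jobs, the main reason to pass -j is to run a different number, for example to keep a shared machine responsive. The sketch below assumes a POSIX shell and `getconf _NPROCESSORS_ONLN`; `ninja_jobs` is a hypothetical helper that reproduces the cores + 2 default described above (ninja's exact heuristic may differ on machines with only one or two cores) and can optionally hold back some jobs.

```shell
#!/bin/sh
# Sketch: compute a job count to pass to "ninja -j N".
# ninja_jobs is a hypothetical helper: it mirrors the "cores + 2" default
# described in the text and can optionally hold back some jobs.

# Usage: ninja_jobs [jobs_to_hold_back]
ninja_jobs() {
  hold_back=${1:-0}
  cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || cores=1
  case $cores in
    ''|*[!0-9]*) cores=1 ;;   # empty or non-numeric output: assume 1 core
  esac
  njobs=$(( $cores + 2 - $hold_back ))
  if [ "$njobs" -lt 1 ]; then
    njobs=1                   # always allow at least one job
  fi
  echo "$njobs"
}

# Usage:
#   ninja -j "$(ninja_jobs)"     # roughly what plain "ninja" already does
#   ninja -j "$(ninja_jobs 2)"   # two fewer jobs, to keep the machine responsive
```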

Once you have ninja built and installed in your PATH, you are ready to run cmake.  Here is an example:

# assume your source code is in a directory called src and you are one directory up from there
mkdir build
cd build
cmake -G Ninja ../src
ninja
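
The choice between the two Linux generators can also be scripted. This is a minimal sketch, assuming a POSIX shell: it prefers Ninja when a `ninja` executable is on the PATH, and falls back to "Unix Makefiles" otherwise or when the project uses Fortran modules (the limitation noted earlier in this section). `pick_generator` and its yes/no argument are hypothetical, not a CMake feature.

```shell
#!/bin/sh
# Sketch: choose a CMake generator name for the current machine.
# pick_generator is a hypothetical helper; pass "yes" if the project
# uses Fortran 95+ modules, which the Ninja generator cannot handle yet.

pick_generator() {
  uses_fortran_modules=${1:-no}
  if [ "$uses_fortran_modules" != yes ] && command -v ninja >/dev/null 2>&1; then
    echo "Ninja"
  else
    echo "Unix Makefiles"
  fi
}

# Usage, from inside the build directory:
#   cmake -G "$(pick_generator)" ../src
# then build with ninja, or with make -jN for "Unix Makefiles".
```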

Mac OSX

Mac OSX is almost the same as Linux, and both GNU make and ninja can be used by following the instructions in the Linux section. Apple also provides an IDE called Xcode, which performs parallel builds by default. To use it, you will obviously need Xcode installed; then run cmake with the Xcode generator. Here is an example:

# assume your source code is in a directory called src and you are one directory up from there
mkdir build
cd build
cmake -G Xcode ../src
# start the Xcode IDE, load the project CMake creates, and build from the IDE
# or you can build from the command line like this:
cmake --build . --config Debug

Note that cmake --build can be used with any of the CMake generators, but it is particularly useful for driving IDE-based generators from the command line. Options placed after -- on the command line are passed unchanged to the native build tool, so they must be options that tool understands. For example, with a Makefile generator, cmake --build . --config Debug -- -j8 passes -j8 to the make command.
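
Because everything after -- goes straight to the native tool, a -j option only makes sense for generators whose tool accepts it. The following is a minimal sketch, assuming a POSIX shell, a build path without spaces, and that the build tree's CMakeCache.txt records the generator in a `CMAKE_GENERATOR:INTERNAL=` line. `build_command` is a hypothetical helper that prints the cmake --build command line instead of running it, adding -jN only for the make and ninja based generators discussed in this post.

```shell
#!/bin/sh
# Sketch: print a "cmake --build" command line for an existing build tree,
# appending "-- -jN" only when the native tool accepts -j.
# build_command is a hypothetical helper; it reads the generator name from
# the CMAKE_GENERATOR:INTERNAL entry in <build_dir>/CMakeCache.txt.

# Usage: build_command <build_dir> <config> <jobs>
build_command() {
  dir=$1 config=$2 njobs=$3
  gen=$(sed -n 's/^CMAKE_GENERATOR:INTERNAL=//p' "$dir/CMakeCache.txt")
  case $gen in
    "Unix Makefiles"|"MinGW Makefiles"|"MSYS Makefiles"|Ninja)
      echo "cmake --build $dir --config $config -- -j$njobs" ;;
    *)
      # Xcode, Visual Studio, NMake, ...: rely on the tool's own behavior.
      echo "cmake --build $dir --config $config" ;;
  esac
}

# Usage: inspect the printed command, then run it:
#   build_command build Debug 8 | sh
```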

Windows:

The Windows platform actually has the greatest diversity of build options. You can use the Visual Studio IDE, nmake, jom, or ninja, as well as several flavors of GNU make, including MinGW GNU make and Cygwin’s GNU make. Each of the options has some merit; which one best fits your needs depends on how you develop code and which tools you have installed.

Visual Studio IDE

This is a very popular IDE developed by Microsoft. With no extra options, the IDE performs target level parallelism during the build. This works well if you have many targets of about the same size that do not depend on each other. However, most projects are not constructed in that manner; they are more likely to have many dependencies that allow only minimal parallelism. That is no reason to give up on the IDE, though: you can tell it to use object level parallelism by adding an extra flag to the compile line.

The flag is /MP, which has the following help: “/MP[N] use up to 'n' processes for compilation”. N is optional; /MP without a number uses as many cores as it sees on the machine. Unlike the -j flag of make, this flag must be set at CMake configure time rather than at build time. To set it, edit the CMake cache with the cmake-gui and add /MP to both CMAKE_CXX_FLAGS and CMAKE_C_FLAGS. The downside is that the IDE still performs target level parallelism on top of object level parallelism, which can lead to excessive parallelism that grinds your machine and GUI to a halt. It has also been known to create bad object files at random. However, the speedup is significant, so it is usually worth the extra trouble.
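
If you prefer the command line to the cmake-gui, the same cache edit can be made by re-running cmake on the already configured build tree with -D. This is a minimal sketch, assuming a POSIX shell (for example Git Bash) and a build tree configured once with a Visual Studio generator, so that CMakeCache.txt already holds the default flags; `cache_value` and `with_mp` are hypothetical helpers. Appending to the cached value, rather than passing -DCMAKE_CXX_FLAGS=/MP alone, keeps the default flags CMake chose. Shells that rewrite POSIX-looking paths, such as MSYS, may mangle the /-style flags.

```shell
#!/bin/sh
# Sketch: add /MP to the cached C and C++ flags of an existing build tree.
# cache_value and with_mp are hypothetical helpers.

# Print the value of a cache entry such as CMAKE_CXX_FLAGS:STRING=...
# Usage: cache_value <build_dir> <variable>
cache_value() {
  sed -n "s/^$2:[A-Z]*=//p" "$1/CMakeCache.txt"
}

# Append /MP to a flag string unless an /MP option is already present.
with_mp() {
  case " $1 " in
    *" /MP"*) echo "$1" ;;
    *) echo "${1:+$1 }/MP" ;;
  esac
}

# Usage, from the directory containing the build tree "build":
#   cmake -DCMAKE_C_FLAGS="$(with_mp "$(cache_value build CMAKE_C_FLAGS)")" \
#         -DCMAKE_CXX_FLAGS="$(with_mp "$(cache_value build CMAKE_CXX_FLAGS)")" \
#         build
```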

GNU Make on Windows

Using GNU make on Windows is similar to using it on Linux or the Mac. However, there are several flavors of GNU make for Windows, and since I am talking about achieving maximum parallelism, you need to make sure that the make you are using supports the job-server. The makefiles that CMake generates are recursive (see http://www.cmake.org/Wiki/CMake_FAQ#Why_does_CMake_generate_recursive_Makefiles.3F), which means that more than one make process will be running during the build. The job-server code in gmake allows these processes to communicate with each other in order to figure out how many jobs to start in parallel.

The original port of GNU make to Windows did not have a job-server implementation, which meant that Windows GNU make basically ignored the -j option when recursive makefiles were used. The only option was to use the Cygwin version of make. However, at some point Cygwin make stopped supporting C:/ paths, which meant that it could not be used to run the Microsoft compiler. I have a patched version of Cygwin’s make that can be found here: www.cmake.org/files/cygwin/make.exe

Recently, someone implemented the job-server in Windows gmake, as seen in this mailing list post:

http://mingw-users.1079350.n2.nabble.com/Updated-mingw32-make-3-82-90-cvs-20120823-td7578803.html

This means that a sufficiently new version of MinGW gmake will have the job server code and will build in parallel with CMake makefiles. 

To build with gmake on Windows, first make sure the make you are using has job-server support. Once you have done that, the instructions are pretty much the same as on Linux. You will, of course, have to run cmake from a shell that has the correct environment for the Microsoft command line compiler, cl. To get that environment, you can run the Visual Studio command prompt, which sets a number of environment variables that let the compiler find system include files and libraries. Without the correct environment, CMake will fail when it tests the compiler.

There are three CMake generators supporting three different flavors of GNU make on Windows: “MSYS Makefiles”, “Unix Makefiles”, and “MinGW Makefiles”. The MSYS generator is set up to find the MSYS toolchain and not the MS compiler, and the MinGW generator finds the MinGW toolchain. “Unix Makefiles” uses the CC and CXX environment variables to find the compiler, and you can set them to cl for the MS compiler.

If you are using the Visual Studio cl compiler and want to use gmake, your two options are the “Unix Makefiles” or “MinGW Makefiles” generator, with either the patched Cygwin gmake or a MinGW make new enough to have job-server support. The MSYS generator will not work with the MS compiler because of the path translation done by the MSYS shell. Once you have the compiler environment set up and the correct GNU make installed, follow the instructions in the Linux section: run cmake, then make -jN.
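
Checking the compiler environment before configuring can save a confusing failure in CMake's compiler test. This is a minimal sketch, assuming a POSIX shell and that the Visual Studio command prompt puts cl on the PATH and sets the INCLUDE and LIB environment variables; `have_msvc_env` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: verify the Visual Studio compiler environment, then configure
# with the "Unix Makefiles" generator using cl for both C and C++.
# have_msvc_env is a hypothetical helper.

# Succeeds only if cl is on the PATH and INCLUDE and LIB are non-empty.
have_msvc_env() {
  command -v cl >/dev/null 2>&1 && [ -n "${INCLUDE:-}" ] && [ -n "${LIB:-}" ]
}

# Usage, from inside the build directory (make must have job-server support):
#   if have_msvc_env; then
#     CC=cl CXX=cl cmake -G "Unix Makefiles" ../src && make -j5
#   else
#     echo "run this from a Visual Studio command prompt" >&2
#   fi
```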

JOM

The legacy command line make tool that comes with Visual Studio is called nmake. nmake is a makefile processor like GNU make, with a slightly different syntax; however, it does not know how to do parallel builds. If the makefiles are set up to run cl with more than one source file at a time, the /MP flag can be used to get parallel builds with nmake, but CMake does not create nmake makefiles that can benefit from /MP. Fortunately, Joerg Bornemann, a Qt developer, created the jom tool.

jom is a drop-in replacement for nmake and is able to read and process the nmake makefiles created by CMake. jom performs object level parallelism and is a good option for speeding up builds on Windows. jom can be downloaded in binary form from here: http://releases.qt-project.org/jom. CMake has a jom specific generator called “NMake Makefiles JOM”. Here is an example (assuming jom is in the PATH):

# assume your source code is in a directory called src and you are one directory up from there
mkdir build
cd build
cmake -G "NMake Makefiles JOM" ../src
jom

ninja

ninja is used on Windows in much the same way as on Linux or OSX. You still have to build it, which requires installing Python; see the Linux section on ninja for how to obtain and build it. You will also need to make sure that the VS compiler environment is set up correctly. Once you have ninja.exe in your PATH and cl ready to be used from your shell, you can run the CMake Ninja generator. Here is an example:

# assume your source code is in a directory called src and you are one directory up from there
mkdir build
cd build
cmake -G Ninja ../src
ninja

Conclusion

It is possible, although not entirely obvious (especially on Windows), to build with all the cores of your computer. Multiprocessing is obviously here to stay, and the performance gains from parallel builds will grow as the number of available cores increases. My laptop has 4 physical cores, which hyperthreading presents as 8 logical cores. Recently, I have been using ninja with good results, as I mostly use Emacs and the Visual Studio compiler from the command line. Before ninja, I used the Cygwin version of gmake. I would be interested to hear what other people are using, and whether anyone has performance tests of the various forms of build parallelism available.