Developing Galacticus

The Galacticus Build System

Galacticus use a GNU Make-based build system. Dependencies are automatically discovered, and extensive meta-programming and preprocessing of source files is undertaken as part of the build. As a result, the build system is complicated. In this section the build system is described and detailed.

The Makefile

Files generated during the build (with the exception of final executables) are written to the directory specified by the BUILDPATH environment variable. This is normally work/build/ (and is set by the Makefile) unless a special build configuration is requested.

Automatic Discovery

Interdependencies between source files, together with requirements for auto-generated code are automatically discovered during the build process by processing of source files. All such automatic discovery is described below.

Code Directive Parsing

The modular nature of Galacticus is enabled through the use of directives embedded within the source code (XML documents embedded as comment lines starting with a special tag) which instruct the build system how to link together the various functions. Parsing of these directives is handled by the codeDirectivesParse.py script.

For include directives (which generate files that get included into the source), the script generates $(BUILDPATH)/*.xml files which describe the directive, and $(BUILDPATH)/Makefile_Directives which contains rules for building those include files.

The script also outputs a file $(BUILDPATH)/directiveLocations.xml which provides list of files that contain each directive. This is used by other scripts to permit rapid processing of files associated with each directive.

Executable Files

Files which produce executables (including the Galacticus.exe executable) are discovered by the scripts/build/findExecutables.py script, which generates rules describing how these files should be built, along with all of their dependencies, and writes them to $(BUILDPATH)/Makefile_All_Execs. All Fortran files in the source directory are parsed, and executable files are identified as those containing a program statement. All executables so identified are also added to the list of objects that will be built by make all.

Modules Provided

Determination of which files provide which Fortran modules is carried out by the scripts/build/moduleDependencies.py script, which generates rules describing these dependencies (and how to build the module file by compiling the corresponding source file) and writes them to $(BUILDPATH)/Makefile_Module_Dependencies.

Additionally, rules are generated which describe how to build the following classes of file:

*.mod.d

Contains a list of all object files upon which the module depends. Used in constructing the final set of objects which must be linked to build an executable.

*.mod.gv

Contains a list of all source files upon which the module depends. Used in constructing GraphViz visualizations of dependencies.

*.m

Contains a list of all modules provided by the corresponding object file. Used when building object files to test whether corresponding module files have been changed—allows avoidance of recompilation cascades if modules do not need to be updated (i.e. if the modules do not differ).

Modules Used

Determination of which files use which Fortran modules is carried out by the scripts/build/useDependencies.py script, which generates rules describing these dependencies and writes them to $(BUILDPATH)/Makefile_Use_Dependencies.

Additionally, rules are generated which describe how to build the following classes of file:

*.d

Contains a list of all object files upon which each object file depends. Used in constructing the final set of objects which must be linked to build an executable.

*.gv

Contains a list of all source files upon which each file depends by virtue of use statements. Used in constructing GraphViz visualizations of dependencies.

*.fl

Contains a list of libraries upon which each object file depends. Used in constructing the final set of libraries which must be linked with the executable.

Included Files

Dependencies on files due to the use of “include” statements (in both Fortran and C source files) are discovered by the scripts/build/includeDependencies.py script, which generates rules describing these dependencies and writes them to $(BUILDPATH)/Makefile_Include_Dependencies.

All Fortran and C/C++ source files in the source directory are parsed. Files specified in any include statement in a source file are added as a dependency of that source file if unless the source file is C/C++ and the named include file can be found in either source/, a standard system include path, or in any include path specified in the GALACTICUS_CFLAGS or GALACTICUS_CPPFLAGS environment variables.

Parameter Dependencies

All runtime parameters upon which a given executable may depend are discovered by the scripts/build/parameterDependencies.py script, which generates XML files listing these parameters and writes them to $(BUILDPATH)/<executableName>.parameters.xml. All Fortran and C++ files in the source directory are parsed, and embedded parameter definitions (i.e. embedded XML blocks in lines beginning “!@” for Fortran or “//@” for C++ with root element inputParameter) are identified. Any files included into a source file are also searched. In the case of Fortran source files, source/*.F90, the corresponding preprocessed file, $(BUILDPATH)/*.p.F90, will be searched for parameter dependencies.

Code Generation

During build, code is generated automatically. Much of this is to link together functions within Galacticus (thereby permitting the modular nature of Galacticus), but also includes generation of various utility functions. All code generation is described below.

Directives

Generation of code to implement the functionality of include directives is carried out by the buildCode.py script. This script simply reads an XML file generated by the codeDirectivesParse.py (see Section Code Directive Parsing), calls the appropriate function to generate the necessary code, passes this through the preprocessor (see Section Galacticus Preprocessor Directives), and outputs the result to the appropriate include file.

Pre-processing and Post-processing

Before being compiled, all Fortran source files are passed through a tree-based preprocessor. The functionality of this preprocessor is described in Section Galacticus Preprocessor Directives. The preprocessor is invoked on each source file by the preprocess.py script, which takes a source/*.F90 file as input and outputs a preprocessed file $(BUILDPATH)/*.p.F90.

Output from the gfortran compiler is directed through the postprocess.py script. This script currently performs the following functions:

  • Builds a map from line numbers in the preprocessed source file back to line numbers in the unpreprocessed source, and translates line numbers in any error or warnings messages from the compiler into line numbers in the unpreprocessed source;

  • Filters out incorrect warning messages about the lack of scalar finalizers (see GCC PR58175);

  • Checks for unused function attributes in the source and filters out warnings about these unused functions.

Build Files

All files involved in the build process are summarized below.

scripts/build/buildCode.py:

Acts upon include directives embedded in the source code and generates the corresponding include file (see Section Directives);

scripts/build/codeDirectivesParse.py:

Discovers and parses directives embedded in source files. Generates make rules for the resulting dependencies, a file providing a mapping of which files contain each directive, and rule files describing how to build each include file resulting from a include directive;

scripts/build/includeDependencies.py:

Discovers (and generates rules for) dependencies between files arising from the use of “include” statements (see Section Included Files);

scripts/build/findExecutables.py:

Discovers (and generates rules for) files which generate executables (see Section Executable Files);

scripts/build/libraryDependencies.py:

Determines the set of libraries that must be linked with each executable, and ensures they are ordered correctly for static linking;

scripts/build/moduleDependencies.py:

Discovers (and generates rules for) dependencies of module files on their source file statements (see Section Modules Provided);

scripts/build/parameterDependencies.py:

Determines all parameters upon which a given executable depends (see Section Parameter Dependencies);

scripts/build/postprocess.py:

Postprocesses the output of the gfortran compiler to remove spurious warnings and provide line numbers in warning/error reports that point into the unpreprocessed source files (see Section Pre-processing and Post-processing);

scripts/build/preprocess.py:

Preprocesses Fortran source code to provide various extended functionality (see Section Pre-processing and Post-processing);

scripts/build/useDependencies.py:

Discovers (and generates rules for) dependencies between files originating from module use statements (see Section Modules Used);

$(BUILDPATH)/Makefile_All_Execs:

Contains rules describing how to build the executable file for each distinct program—generated by scripts/build/findExecutables.py (see Section Executable Files);

$(BUILDPATH)/Makefile_Component_Includes:

Describes dependencies on the nodeComponent class hierarchy module on include files which implement specific functionality for individual node component implementations;

$(BUILDPATH)/Makefile_Directives:

Describes rules for building include files for include directives;

$(BUILDPATH)/Makefile_Include_Dependencies:

Contains rules describing the dependencies of source file on files included via “include” statements—generated by scripts/build/includeDependencies.py (see Section Included Files);

$(BUILDPATH)/Makefile_Module_Dependencies:

Contains rules which describe how to build module files from their source files (see Section Modules Provided);

$(BUILDPATH)/Makefile_Use_Dependencies:

Contains rules which describe dependencies between source file originating from module use statements (see Section Modules Used);

$(BUILDPATH)/directiveLocations.xml:

Contains a map of which files contain each code directive (see Section Code Directive Parsing);

$(BUILDPATH)/*.xml:

Contain rules for building files for include files (see Section Code Directive Parsing);

$(BUILDPATH)/*.d:

Contain lists of dependencies on object files, and are accumulated to find the final set of files which must be linked to build each executable (see Section Modules Provided & Section Modules Used);

$(BUILDPATH)/*.m:

Contain lists of modules provided by each object file, and are used during testing whether modules have been updated (see Section Modules Provided);

$(BUILDPATH)/*.fl:

Contain lists of libraries upon which each object file depends, and are used to determine the final set of libraries which must be linked with each executable (see Section Modules Used);

$(BUILDPATH)/*.gv:

Contain lists of source files upon which each file depends and are used in the construction of GraphViz representations of file dependencies (see Section Modules Provided & Section Modules Used);

$(BUILDPATH)/*.Inc:

Unpreprocessed include files generated from include dependencies (see Section Code Directive Parsing);

$(BUILDPATH)/*.inc:

Preprocessed include files generated from include dependencies (see Section Code Directive Parsing);

$(BUILDPATH)/*.p.F90:

Preprocessed Fortran source files (see Section Pre-processing and Post-processing);

$(BUILDPATH)/*.parameters.xml:

Contain lists of parameters upon which an executable depends (see Section Parameter Dependencies).

Profiling the Build

Tooling is provided to help profile the build process. This is useful to identify bottlenecks during build. Profiling can be switched on using the compileprof build option. For example:

make -j16 GALACTICUS_BUILD_OPTION=compileprof Galacticus.exe >& build.log

Every command run by Make will be timed, its peak memory usage measured, and the results output. In the above these outputs are collected to the build.log file. Result lines begin ++Task: followed by the start and end times of the task, the peak resident set size of the task (in kilobytes; a value of \(-1\) indicates that memory usage could not be measured, for example because GNU time is unavailable), and then the command run. While you can look through these manually a script is provided that generates a report from these timing data as a web page. For example:

./scripts/build/buildProfiler.py build.log profile.html --durationMinimum 10

will parse timing data from build.log and generate a web page profile.html showing the results. In this case the --durationMinimum 10 option specifies that only commands which took 10 or more seconds to run should be included in the report. (Reducing this limit will lead to a very long report.)

The profile report begins by listing the total time taken for the build, together with an estimate of the peak memory used by the build—found by summing, for each second of the build, the peak memory of all included tasks running at that time. Following that is a list of all commands performed ordered by completion time. Next to each command is a bar which extends from the start to the end time of the command. Each second of the bar is colored to indicate the degree of build parallelism at that time—green shows maximum parallelism, while red shows no parallelism (i.e. a single command was running at that time).

Finally, each command is assigned a “cost”, \(\chi\). This is defined as:

\[\chi = \sum_{i_\mathrm{start}}^{i_\mathrm{end}} N_i^{-1},\]

where \(i_\mathrm{start}\) and \(i_\mathrm{end}\) are the start and end times of the command, and \(N_i\) is the number of commands running in parallel at time \(i\). Cost is therefore higher for commands which run longer and for commands which are executed with less parallelism. A ranked list of commands, from most to least costly is included in the report. The report also shows the peak memory used by each command, together with a ranking of commands from most to least memory-hungry, to help identify the largest memory consumers in the build.