Apptainer development environment

A short tutorial on using Apptainer to create a portable development environment.

Published April 7, 2023

A typical scenario when running computer-science experiments is the following: you work on some code on your laptop, where you run a few small pilot experiments, and then you move to a bigger, more powerful machine to schedule a larger batch of experiments. Usually you have to recompile your code on that machine, but it might not have the same toolchain as your local one. Most likely you even lack the permissions to install the software that you need.

A related scenario involves the passing of time. You work on a project and achieve some goals. Then you work on another project, after updating your operating system. One year later you have to revisit the first project, maybe for a journal revision.

BOOM.

Your software no longer compiles, let alone runs.

Containers to the rescue

If only there were a way to “freeze” the environment in which your software is built and runs…

It turns out that there is a way to do exactly that, and there has been for quite some time now: containers.

A container can be thought of as a way to pack together executable code and its dependencies. There are several tools that accomplish this, most notably Docker, Podman, and Apptainer.

In essence, a container technology allows you to specify the characteristics of your environment in a text file (which can therefore be checked in version control). This text file will then be “compiled” into a container image, an entity which can then be executed to give you access to the functionality provided by the container.

More than that, the container is relocatable: you can transfer it to another machine and use it to run your software, without having to recompile it first. It works even if the host lacks the dependencies you need: you are carrying them along inside the container!

This is great for deploying applications, but it is just as useful for developing them. A couple of years ago, Daniel Lemire showed how to build a Docker programming station.

In this post we are going to achieve something similar using Apptainer, which is more likely to be found on HPC systems managed by SLURM.

For the sake of example we are going to work with the following C++ program, which simply builds a JSON object with a couple of fields.

main.cpp
#include <ctime>
#include <iostream>
#include <string>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

int main(int argc, char **argv) {
    if (argc != 2) {
        std::cerr << "USAGE: myapp WHO" << std::endl;
        return 1;
    }
    std::string who(argv[1]);
    std::time_t when = std::time(nullptr);
    json jobj = {
        {"who", who},
        {"where", "container"},
        {"when", std::asctime(std::localtime(&when))}
    };
    std::cout << jobj << std::endl;
    return 0;
}

Importantly, the code makes use of the JSON for Modern C++ library, which we are going to integrate in our container.

To compile the code we use the following simple Makefile:

Makefile
myapp: main.cpp
	g++ -O2 -o $@ $^

Setting up a development container

To define an environment with Apptainer you write a definition file, which in our case looks like the following.

devenv.def
# This line means "pick the base container image from 
# the docker hub".
Bootstrap: docker
# Whereas here we specify the particular image we are 
# interested in using as the base image, in this case 
# a basic `ubuntu` system at version `jammy-20230308`.
# The base image is the operating system configuration
# that you want to customize.
From: ubuntu:jammy-20230308

# A definition file has several sections, see the documentation.
# In the `post` section you can run commands to customize 
# your environment
%post
    # This is the place where you can 
    # install additional dependencies.
    apt-get update
    apt-get install -y cmake build-essential g++ curl

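    # Install JSON for Modern C++ from GitHub
    # (-p: do not fail if the directory exists; -f: fail on HTTP errors
    # instead of saving an error page as json.hpp)
    mkdir -p /usr/local/include/nlohmann
    cd /usr/local/include/nlohmann
    curl -fsSLO https://raw.githubusercontent.com/nlohmann/json/develop/single_include/nlohmann/json.hpp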

This container is based on the Ubuntu image. There are several other images available on Docker Hub.

The %post section of a definition file is basically a shell script that configures the container.
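Since %post is ordinary shell, the usual shell habits apply. One thing worth considering: the definition file above downloads json.hpp from the develop branch, which keeps moving, so rebuilding the image later may pull in a different header. A minimal sketch of pinning the download to a release tag follows. The helper name json_header_url and the choice of tag v3.11.2 are assumptions made for illustration, not something the original setup prescribes.

```shell
# Hypothetical helper: build the raw GitHub URL of the single-header
# json.hpp for a given git tag (or branch) of nlohmann/json.
json_header_url() {
    tag="$1"
    printf 'https://raw.githubusercontent.com/nlohmann/json/%s/single_include/nlohmann/json.hpp\n' "$tag"
}

# Inside %post this could replace the plain curl call, for example:
#   curl -fsSL -o /usr/local/include/nlohmann/json.hpp "$(json_header_url v3.11.2)"
json_header_url v3.11.2
```

With a fixed tag, rebuilding the image a year from now should fetch the same header, which is in the spirit of freezing the environment.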

From the above definition file we can build the container image using the following command.

In Apptainer, container definition files have the .def extension and images have the .sif extension.
❯ apptainer build devenv.sif devenv.def

Running the above command takes a while (about a minute and a half), but you only have to run it once to create your development environment. The result is a file in the directory in which you issued the command. This is in contrast with Docker, which manages the image files for you in a less transparent way.
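Because the build is slow and the image is just a file, it can be handy to rebuild only when the definition file has changed. The following is a rough sketch of that idea. The function name build_if_stale and the APPTAINER override variable are assumptions introduced here; the only Apptainer feature it relies on is the --force flag of apptainer build, which overwrites an existing image.

```shell
# Hypothetical helper: rebuild the image only if it is missing or
# older than its definition file.
# APPTAINER is an assumed override (handy for dry runs); it defaults
# to the real `apptainer` binary.
build_if_stale() {
    def="$1"
    sif="$2"
    if [ ! -e "$sif" ] || [ "$def" -nt "$sif" ]; then
        "${APPTAINER:-apptainer}" build --force "$sif" "$def"
    else
        echo "$sif is up to date"
    fi
}

# Example:
#   build_if_stale devenv.def devenv.sif
```

Setting APPTAINER=echo prints the command that would run instead of executing it, which is a cheap way to check the logic.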

Then, you can open a shell inside the container with

❯ apptainer shell devenv.sif

or simply

❯ ./devenv.sif

Either command drops you into a shell running inside the container, where you have all the software you asked for in your definition file.

The cool part is that Apptainer makes the current working directory available in the container, and it is writable as well!

Apptainer> ls
Makefile  main.cpp

Therefore, if you run make inside the container you will compile the software using the toolchain installed in the container, and the resulting binary will be available on the host as well! You can thus use this development environment to happily hack on your local machine.
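If you do not need an interactive session, the same build can be run as a one-off command with apptainer exec, which takes the image followed by the command to run. A small sketch follows. The wrapper name devenv and the variables DEVENV_IMAGE and APPTAINER are hypothetical conveniences, not part of Apptainer.

```shell
# Hypothetical wrapper: run any command inside the development image.
# DEVENV_IMAGE (default: devenv.sif) and APPTAINER (default: apptainer)
# are assumed override variables introduced for this sketch.
devenv() {
    "${APPTAINER:-apptainer}" exec "${DEVENV_IMAGE:-devenv.sif}" "$@"
}

# Examples, run from the project directory on the host:
#   devenv make              # compile with the container's toolchain
#   devenv ./myapp Matteo    # run the freshly built binary
```

This fits nicely in an editor's build command or a small script, since the host never needs its own compiler.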

Do you want to start another project? You can create another development environment in the same way and they will not interfere with each other.

Containerize the software for deployment elsewhere

Once you want to deploy your code to the bigger machine to run your batch of experiments, you want to pack your application’s binary in a container as well.

To this end, we can use the very same approach as before, but now we can start from our own customized devenv.sif image rather than from scratch!

Here is the image definition file. It uses Apptainer's multi-stage builds to shrink the final image. This is optional but recommended, since it allows us to drop the compiler toolchain from the final image.

deploy.def
# Here we specify that we are going to use as a base image
# the `devenv.sif` file that we created previously
Bootstrap: localimage
From: devenv.sif
# And here we are explicitly naming the stage
Stage: build

# This section states which files from the host we want to copy
# in the container image, and where.
%files
    ./main.cpp   /usr/local/src
    ./Makefile   /usr/local/src

%post
    # Go to the directory containing the source files
    cd /usr/local/src
    # Build the software
    make

# Now we start again from a base image
Bootstrap: docker
From: ubuntu:jammy-20230308
# We explicitly name this stage as well
Stage: deploy

# We can copy files from one stage to the other
%files from build
    # We use this functionality to bring the compiled binary 
    # in this newly created container image.
    /usr/local/src/myapp  /usr/bin

As before, we can build the container image with the following command:

❯ apptainer build deploy.sif deploy.def

Now you can copy the deploy.sif image to the remote system, and run your software as follows:

❯ apptainer exec deploy.sif myapp Matteo
{"when":"Thu Apr  6 13:13:29 2023\n","where":"container","who":"Matteo"}

or more concisely

❯ ./deploy.sif myapp Matteo
{"when":"Thu Apr  6 13:13:29 2023\n","where":"container","who":"Matteo"}

Notice that you pass arguments to your program as usual: just put them on the command line after the name of the executable.
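On a SLURM-managed system you would typically wrap that command in a batch script rather than run it on the login node. The snippet below is only a sketch of what such a script could look like. The file name run_myapp.sbatch, the job name, the time limit, and the default argument are all assumptions, and your cluster may require further options such as a partition or an account.

```shell
# Write a minimal SLURM batch script that runs the containerized program.
# All names and limits below are placeholders chosen for this sketch.
cat > run_myapp.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=myapp
#SBATCH --output=myapp-%j.out
#SBATCH --time=00:05:00

# deploy.sif is assumed to sit in the directory the job is submitted from.
# The first argument given to sbatch after the script name is forwarded.
apptainer exec deploy.sif myapp "${1:-Matteo}"
EOF

# On the cluster, submit it with:
#   sbatch run_myapp.sbatch Matteo
```

The %j in the output file name is replaced by SLURM with the job ID, so each run writes to its own file.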

And that’s it!

Bonus: other languages

So far we have seen how to set up a development environment for C++. Now we are going to see how to do it for a couple of other languages (eventually).

Python

Managing Python dependencies is notoriously a nightmare, especially when multiple installations are involved. We can use containers (instead of virtual environments) to try and address this issue.

The following is an Apptainer definition file that sets up a Python environment with some packages installed using [micromamba](https://mamba.readthedocs.io/en/latest/user_guide/micromamba.html), a tiny version of the mamba Python package manager, which is itself a faster implementation of conda.

As you can see, it is slightly more complicated than our earlier example, but the basic concept is the same.

First, we create a file env.yml containing a conda specification of the dependencies:

env.yml
name: base
channels:
  - conda-forge
dependencies:
  - numpy==1.25.0
  - pandas==2.0.2
  - seaborn==0.12.2
  - scikit-learn==1.2.2
  - h5py==3.9.0

Then we include this file in the container, and use it to install the Python libraries we need.

python.def
Bootstrap: docker
From: ubuntu:jammy-20230308

%files
  env.yml /env.yml

%post
  # Install minimal dependencies
  apt-get update
  apt-get install -y curl bzip2

  # Create environment for micromamba
  mkdir -p /opt/env/micromamba

  # Download micromamba
  curl -Ls https://micro.mamba.pm/api/micromamba/linux-64/1.3.0 | tar -xvj bin/micromamba

  # Setup the base environment
  export MAMBA_ROOT_PREFIX=/opt/env/micromamba
  eval "$(micromamba shell hook -s posix)"
  micromamba activate

  # Install the packages we want, from the env.yml file
  micromamba install -y -q -f /env.yml

  micromamba clean --all --yes

%environment
  # Environment available at runtime, enriched by the 
  # scripts that set up micromamba
  export MAMBA_ROOT_PREFIX=/opt/env/micromamba
  eval "$(micromamba shell hook -s posix)"
  micromamba activate

# The `runscript` section lets you specify the command to be 
# executed by default when the container is run
%runscript
  python

Executing the container will drop you in a Python shell, with all the packages you asked for:

❯ ./python.sif
Python 3.11.4 | packaged by conda-forge | (main, Jun 10 2023, 18:08:17) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.__version__
'1.25.0'
>>>
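For batch work you will usually want to run a script rather than the interactive interpreter. As written, the runscript above calls python without forwarding command-line arguments, so one option is to go through apptainer exec instead. Here is a minimal sketch of that. The script name check_numpy.py is made up for the example, and the guard simply skips the run when the image or Apptainer is not available.

```shell
# Create a tiny script to run inside the container
# (the file name is a placeholder for this sketch).
cat > check_numpy.py <<'EOF'
import numpy
print(numpy.__version__)
EOF

# Run it with the Python interpreter provided by the image, if present.
if command -v apptainer >/dev/null 2>&1 && [ -f python.sif ]; then
    apptainer exec python.sif python check_numpy.py
else
    echo "python.sif or apptainer not available; skipping the run" >&2
fi
```

If everything is set up as in python.def, this should report the numpy version pinned in env.yml.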

Comments

If you have comments, corrections, or suggestions, please drop me a line at matteo.ceccarello@unipd.it.

Acknowledgements

Many thanks to Ilie Sarpe for proofreading this post.
