The official Dockerfile best practices have lots of great content on how to improve your Dockerfiles.
You should primarily optimize for performance (especially for test runners). This will ensure your tooling runs as fast as possible and does not time out.
Frequently measuring execution time is a great way to get a feel for your tooling's performance. Make it a habit to measure execution time both before and after a change. Even when you feel "certain" that a change will improve performance, you should still measure execution time.
When possible, create scripts to automatically measure performance (also known as benchmarking). A very helpful command-line tool is hyperfine, but feel free to use whatever makes the most sense for your tooling.
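As a sketch, you could use hyperfine to time repeated end-to-end runs of your tooling's Docker image (the image name and run arguments below are illustrative):

```bash
# Time repeated full runs; a warmup run keeps first-run effects
# (e.g. cold caches) from skewing the results.
hyperfine --warmup 1 \
  'docker run --rm -v "$PWD/solution:/solution" -v "$PWD/output:/output" my-test-runner two-fer /solution /output'
```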
Newer track tooling repos will have access to the following two scripts:
- `./bin/benchmark.sh`: benchmark the track tooling code (source code)
- `./bin/benchmark-in-docker.sh`: benchmark the track tooling Docker image (source code)

If you're working on a track tooling repo without these files, feel free to copy them into your repo using the above source links.
Benchmarking scripts can help estimate the tooling's performance. Bear in mind though that the performance on Exercism's production servers is often lower.
Try experimenting with different base images (e.g. Alpine instead of Ubuntu) to see if one (significantly) outperforms the other. If performance is relatively equal, go for the smallest image.
Check if using the `internal` network instead of `none` improves performance. See the network docs for more information.
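For example, you could create an internal network (one without external connectivity) and compare timings against `none` (the image name and run arguments are illustrative):

```bash
# Create a network that has no external connectivity.
docker network create --internal exercism-internal

# Compare run times under both network modes.
hyperfine \
  'docker run --rm --network none my-test-runner two-fer /solution /output' \
  'docker run --rm --network exercism-internal my-test-runner two-fer /solution /output'
```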
The track tooling runs in a one-off, short-lived Docker container which executes the following steps:

1. A Docker container is created.
2. The Docker container is run with the correct arguments.
3. The Docker container is destroyed.
Therefore, code that runs in step 2 runs for every single tooling run. For this reason, reducing the amount of code that runs in step 2 is a great way to improve performance. One way of doing this is to move code from run-time to build-time. Whilst run-time code runs on every single tooling run, build-time code only runs once (when the Docker image is built).
Build-time code runs once as part of a GitHub Actions workflow. Therefore, it's fine if the code that runs at build-time is (relatively) slow.
Before the Haskell test runner can run any tests, some base libraries must be compiled. As each test run happens in a fresh container, this compilation would otherwise happen on every single test run! To circumvent this, the Haskell test runner's Dockerfile contains the following two commands:
```dockerfile
COPY pre-compiled/ .
RUN stack build --resolver lts-20.18 --no-terminal --test --no-run-tests
```
First, the `pre-compiled` directory is copied into the image. This directory is set up as a test exercise and depends on the same base libraries that the actual exercises depend on. Then we run the tests on that directory, which works just like running the tests for an actual exercise. Running the tests causes the base libraries to be compiled, but the difference is that this now happens at build time. The resulting Docker image thus has its base libraries already compiled, so no compilation is needed at run time, resulting in (much) faster execution.
Some languages allow code to be compiled ahead of time or just in time. This is a build-time vs. run-time tradeoff, and again, we favor build-time execution for performance reasons.
The C# test runner's Dockerfile uses this approach, where the test runner is compiled to a binary ahead-of-time (at build time) instead of just-in-time compiling the code (at run time). This means that there is less work to do at run-time, which should help increase performance.
You should try to reduce the image's size: smaller images are faster to download and take up less disk space.
Different distribution images will have different sizes. For example, the `alpine:3.20.2` image is over ten times smaller than the `ubuntu:24.10` image:

```
REPOSITORY   TAG      SIZE
alpine       3.20.2   8.83MB
ubuntu       24.10    101MB
```
In general, Alpine-based images are amongst the smallest images, so many tooling images are based on Alpine.
Some images have special "slim" variants, in which some features have been removed, resulting in smaller image sizes.
For example, the `node:20.16.0-slim` image is five times smaller than the `node:20.16.0` image:

```
REPOSITORY   TAG            SIZE
node         20.16.0        1.09GB
node         20.16.0-slim   219MB
```
The reason "slim" variants are smaller is that they have fewer features. Your image might not need those additional features; if not, consider using the "slim" variant.
An obvious, but great, way to reduce the size of your image is to remove anything you don't need. These can include things like:

- packages that were only needed at build time
- package manager caches
- documentation and example files
- temporary files
Most Docker images need to install additional packages, which is usually done via a package manager. These packages must be installed at build time (as no internet connection is available at run time). Therefore, any package manager caching/bookkeeping files should be removed after installing the additional packages.
Distributions that use the `apk` package manager (such as Alpine) should use the `--no-cache` flag when using `apk add` to install packages:

```dockerfile
RUN apk add --no-cache curl
```
Distributions that use the `apt-get`/`apt` package manager (such as Ubuntu) should run the `apt-get autoremove -y` and `rm -rf /var/lib/apt/lists/*` commands after installing the packages, and in the same `RUN` command:

```dockerfile
RUN apt-get update && \
    apt-get install -y curl && \
    apt-get autoremove -y && \
    rm -rf /var/lib/apt/lists/*
```
Docker has a feature called multi-stage builds. These allow you to partition your Dockerfile into separate stages, with only the last stage ending up in the produced Docker image (the rest is only there to support building the last stage). You can think of each stage as its own mini Dockerfile; stages can use different base images.
Multi-stage builds are particularly useful when your Dockerfile requires packages that are only needed at build time. In this situation, the general structure of your Dockerfile looks like this:
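A minimal sketch of that structure (the base image, package names, and paths are illustrative):

```dockerfile
# Build stage: install packages that are only needed at build time
# and use them to produce the artifacts the tooling needs.
FROM alpine:3.20.2 AS build
RUN apk add --no-cache curl
WORKDIR /opt/tooling
# ... use the build-only tools to fetch or build artifacts ...

# Runtime stage: starts from a clean base image, so the build-only
# packages installed above never end up in the produced image.
FROM alpine:3.20.2
WORKDIR /opt/tooling
COPY --from=build /opt/tooling .
ENTRYPOINT ["/opt/tooling/bin/run.sh"]
```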
With this setup, the additional packages are only installed in the "build" stage and not in the "runtime" stage, which means that they won't end up in the Docker image that is produced.
The Fortran test runner requires `curl` to download some files; however, its run-time image does not need `curl`, which makes this a perfect use case for a multi-stage build. First, its Dockerfile defines a stage (named "build") in which the `curl` package is installed, and then uses curl to download files into that stage:
```dockerfile
FROM alpine:3.15 AS build

RUN apk add --no-cache curl

WORKDIR /opt/test-runner
COPY bust_cache .

WORKDIR /opt/test-runner/testlib
RUN curl -R -O https://raw.githubusercontent.com/exercism/fortran/main/testlib/CMakeLists.txt
RUN curl -R -O https://raw.githubusercontent.com/exercism/fortran/main/testlib/TesterMain.f90

WORKDIR /opt/test-runner
RUN curl -R -O https://raw.githubusercontent.com/exercism/fortran/main/config/CMakeLists.txt
```
The second part of the Dockerfile defines a new stage and copies the downloaded files from the "build" stage into its own stage using the `COPY` command:
```dockerfile
FROM alpine:3.15

RUN apk add --no-cache coreutils jq gfortran libc-dev cmake make

WORKDIR /opt/test-runner
COPY --from=build /opt/test-runner/ .

COPY . .

ENTRYPOINT ["/opt/test-runner/bin/run.sh"]
```
The Ruby test runner needs the `git`, `openssh`, `build-base`, `gcc` and `wget` packages to be installed before its required libraries (gems) can be installed. Its Dockerfile starts with a stage (named `build`) that installs those packages (via `apk add`) and then installs the dependencies (via `bundle install`):
```dockerfile
FROM ruby:3.2.2-alpine3.18 AS build

RUN apk update && apk upgrade && \
    apk add --no-cache git openssh build-base gcc wget

COPY Gemfile Gemfile.lock ./

RUN gem install bundler:2.4.18 && \
    bundle config set without 'development test' && \
    bundle install
```
It then defines the stage that will form the resulting Docker image. This stage does not install the dependencies itself; instead, it uses the `COPY` command to copy the installed libraries from the `build` stage into its own stage:
```dockerfile
FROM ruby:3.2.2-alpine3.18

RUN apk add --no-cache bash

WORKDIR /opt/test-runner

COPY --from=build /usr/local/bundle /usr/local/bundle

COPY . .

ENTRYPOINT [ "sh", "/opt/test-runner/bin/run.sh" ]
```
The C# test runner's Dockerfile does something similar, only in this case the build stage can use an existing Docker image that already contains the additional packages required to install libraries.
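A sketch of that pattern, assuming the official .NET SDK and runtime images (the project layout and paths are illustrative, not the actual C# test runner's setup):

```dockerfile
# Build stage: the SDK image already ships the compiler and build tools.
FROM mcr.microsoft.com/dotnet/sdk:8.0 AS build
WORKDIR /src
COPY . .
# Compile ahead of time, at build time, rather than just in time at run time.
RUN dotnet publish -c Release -o /opt/test-runner

# Runtime stage: the much smaller runtime image only runs the prebuilt output.
FROM mcr.microsoft.com/dotnet/runtime:8.0
WORKDIR /opt/test-runner
COPY --from=build /opt/test-runner .
ENTRYPOINT ["dotnet", "/opt/test-runner/TestRunner.dll"]
```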
Unit tests can be very useful, but we recommend focusing on writing integration tests. Their main benefit is that they better test how tooling runs in production, and thus help increase confidence in your tooling's implementation.
To best mimic the production environment, the integration tests should run the tooling the same way the production environment does. This means building the Docker image and then running the built image on a solution to verify its output.
Integration tests should be defined as golden tests, which are tests where the expected output is stored in a file. This is perfect for track tooling integration tests, as the tooling's output is itself a file.
When running the test runner on a solution, its output is a `results.json` file. We can then compare this file against a "known good" (i.e. expected) output file (named `expected_results.json`) to check whether the test runner works as intended.
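A sketch of such a golden test (the image name, invocation arguments, and fixture layout are illustrative):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Build the image the same way production would run it.
docker build -t my-test-runner .

# Each fixture directory contains a solution plus its expected_results.json.
for fixture in tests/*/; do
  slug=$(basename "$fixture")
  output=$(mktemp -d)

  docker run --rm --network none \
    -v "$PWD/$fixture:/solution" \
    -v "$output:/output" \
    my-test-runner "$slug" /solution /output

  # Golden comparison: normalize with jq so key ordering doesn't matter.
  diff <(jq -S . "$output/results.json") <(jq -S . "$fixture/expected_results.json")
done
```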
Safety is one of the main reasons why we use Docker containers to run our tooling.
There are many Docker images on Docker Hub, but try to use official ones. These images are curated and have (far) less chance of being unsafe.
To ensure that builds are stable (i.e. they don't suddenly break), you should always pin your base images to specific tags. That means instead of:

```dockerfile
FROM alpine:latest
```

you should use:

```dockerfile
FROM alpine:3.20.2
```

With the latter, builds will always use the same version.
By default, many images will run with a user that has root privileges. You should consider running as a non-privileged user.
```dockerfile
FROM alpine:3.20.2

# Alpine ships BusyBox's addgroup/adduser (groupadd/useradd are only
# available via the shadow package).
RUN addgroup -S myuser && adduser -S -G myuser myuser

# RUN <COMMANDS THAT REQUIRE ROOT USER, E.G. INSTALLING PACKAGES>

USER myuser
```
It is (almost) always a good idea to install the latest versions of packages, as these contain the latest (security) fixes. Running `apt-get update` and `apt-get install` in the same `RUN` command ensures packages are installed from an up-to-date index:

```dockerfile
RUN apt-get update && \
    apt-get install -y curl
```
We encourage Docker files to be written using a read-only filesystem. The only directories you should assume to be writeable are:
/tmp
dirOur production environment currently does not enforce a read-only filesystem, but we might in the future. For this reason, the base template for a new test runner/analyzer/representer starts out with a read-only filesystem. If you can't get things working on a read-only file, feel free to (for now) assume a writeable file system.