Optimizing Your Docker Builds

Introduction

A few years ago, I wrote a blog post about Dockerfile optimization and how you can organize a Docker image’s layers to leverage the build cache and speed up your Docker builds. The state of the art has advanced since I wrote that blog post. BuildKit is now, as of Docker Engine version 23.0, the default build backend. The principles of how to optimally organize the layers of a Docker image are largely the same, but BuildKit introduces, among other things, fundamental improvements in caching. As I did in my previous blog post, I will use an example to demonstrate some of these improvements.

Setup

As before, we will use the official React Tutorial as our demonstration app (although the Tutorial has been updated since my previous blog post). Download the entire React Tutorial sandbox, including package.json, into a working directory. (Note that the .js and .css files should go into a src subdirectory, and the index.html file should go into a public subdirectory.) We will be using yarn, so run npm install --save-dev yarn to install it as a development dependency, then run yarn install to generate the yarn.lock file.
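The setup steps above can be sketched as a short shell session (this assumes node and npm are already installed and that the sandbox files have been downloaded into the working directory):

```shell
# Arrange the downloaded sandbox files as described above:
# .js and .css files go in src/, index.html goes in public/.
mkdir -p src public

# Install yarn as a development dependency, then generate yarn.lock.
npm install --save-dev yarn
yarn install
```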

We will use the same Dockerfile, though with an updated base image, as last time:

FROM node:20-alpine3.18

COPY package.json /usr/src/tic-tac-toe/package.json
COPY yarn.lock /usr/src/tic-tac-toe/yarn.lock
WORKDIR /usr/src/tic-tac-toe
RUN ["yarn", "install"]

COPY . /usr/src/tic-tac-toe
RUN ["yarn", "build"]

Importantly, we need to make sure that we do not accidentally COPY the node_modules directory (or .git), so create a .dockerignore file:

.git
node_modules

Cache Demonstration

BuildKit is the improved build backend that succeeds Docker's legacy builder, and for the purposes of this demonstration we will assume that BuildKit is the default builder, as it is in Docker Engine 23.0 and later.
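If you are not sure which builder your environment uses, a quick check looks like this (a sketch; exact output varies by installation):

```shell
# Docker Engine 23.0+ uses BuildKit by default.
docker version --format '{{.Server.Version}}'

# On older engines, BuildKit can be enabled per invocation:
DOCKER_BUILDKIT=1 docker build .
```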

BuildKit offers a number of improvements to build caching, and taking advantage of them requires a slightly different build command than before. Run this command:

docker build \
  --tag your/docker/registry/here/optimize-dockerfile:latest \
  --cache-to type=inline \
  --push \
  .

This command will build, tag, and push the image to your registry. BuildKit supports several different cache backends. By default, only the local cache (the build cache on the local filesystem) is enabled. Other cache backends allow us to share the build cache across different CI/CD instances. Here, --cache-to type=inline selects the “inline” cache backend, which stores the build cache directly in the image itself.

Now, let’s make a small change to our app. In src/App.js, change:

      nextSquares[i] = 'X';

To:

      nextSquares[i] = '+';

Before we rebuild the Docker image with our change, let’s simulate the conditions of a CI/CD instance freshly spawned with a cold cache by completely pruning our Docker environment. Run:

docker system prune -a

And enter ‘y’ when prompted.

Now run:

docker build \
  --cache-from your/docker/registry/here/optimize-dockerfile:latest \
  .

The output will look something like:

 => [internal] load .dockerignore                                                                                 0.0s
 => => transferring context: 58B                                                                                  0.0s
 => [internal] load build definition from Dockerfile                                                              0.0s
 => => transferring dockerfile: 266B                                                                              0.0s
 => [internal] load metadata for docker.io/library/node:20-alpine3.18                                             0.8s
 => importing cache manifest from your/docker/registry/here/optimize-dockerfile:latest                                  1.0s
 => [1/7] FROM docker.io/library/node:20-alpine3.18@sha256:32427bc0620132b2d9e79e405a1b27944d992501a20417a7f4074  0.0s
 => [internal] load build context                                                                                 0.0s
 => => transferring context: 487.06kB                                                                             0.0s
 => [auth] [redacted]                                                                                             0.0s
 => CACHED [2/7] COPY package.json /usr/src/tic-tac-toe/package.json                                              0.0s
 => CACHED [3/7] COPY yarn.lock /usr/src/tic-tac-toe/yarn.lock                                                    0.0s
 => CACHED [4/7] WORKDIR /usr/src/tic-tac-toe                                                                     0.0s
 => CACHED [5/7] RUN ["yarn", "install"]                                                                         35.7s
 => => pulling sha256:c926b61bad3b94ae7351bafd0c184c159ebf0643b085f7ef1d47ecdc7316833c                            0.3s
 => => pulling sha256:4b45d679ee3ef9e23a172a5bbd340012ea4515eec334cd86acf96f92ed40717a                            0.3s
 => => pulling sha256:b519384a30f5a620cfb09aa166d6c3ad7bf64b4731435d9e32153731f636a06b                            0.8s
 => => pulling sha256:095f19a54610183ae53834b1c703b05553b5bd920e6a40a38045c205569a51de                            0.2s
 => => pulling sha256:78508e7796da40c6996da8f58a1a4329f5cfcb5c6d33cfb215bdc179b4984cb6                            0.2s
 => => pulling sha256:29c1ccb5ef912ea427f74a6947a3b22bec7418df01459080c5dc13f679550453                            0.3s
 => => pulling sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1                            0.2s
 => => pulling sha256:bc11ece0c87450dfe61c56c616fc0a6413ca59938f31331e5274a7f1630869b7                            1.4s
 => [6/7] COPY . /usr/src/tic-tac-toe                                                                             0.9s
 => [7/7] RUN ["yarn", "build"]                                                                                   8.4s
 => exporting to image                                                                                            0.0s
 => => exporting layers                                                                                           0.0s
 => => writing image sha256:b48644823a4e8fb3627874214949899343d7ec48bbe06e3b0d0f0f22e6bc0147                      0.0s

You can see here that Docker loads the cache manifest from the “cache from” image (just the manifest, not the entire build cache) early in the process, and is therefore able to consult the manifest and determine that the first five steps of the build are already in the build cache. Step 6 is not cached, because the change to src/App.js invalidated its COPY command.

At this point, Docker finally downloads the cached layers from the “cache from” image in order to build the remainder of the image (Steps 6 and 7). Notably, Docker downloads only the layers it needs, not the entire “cache from” image, and we did not need to docker pull it before building.
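In a real CI/CD pipeline, you would typically combine the two invocations shown earlier, so that every run both consumes the previous cache and publishes a fresh one (using the same placeholder registry path as before):

```shell
# Consume the cache from the last pushed image, and embed a fresh
# inline cache in the image this run pushes.
docker build \
  --tag your/docker/registry/here/optimize-dockerfile:latest \
  --cache-from your/docker/registry/here/optimize-dockerfile:latest \
  --cache-to type=inline \
  --push \
  .
```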

Takeaways

Our simple demonstration only scratches the surface of the strategies and considerations for Docker build optimization. Docker offers the “registry” cache backend as a more sophisticated alternative to the “inline” cache backend, as well as other backends that are, as of this writing, experimental.

The “registry” cache backend is suitable for caching multi-stage builds, as you can set the cache mode to “max” and cache all of the image’s layers, including those from intermediate stages, while keeping this cache separate from the image itself. We have not touched on multi-stage builds, but BuildKit introduces improvements in this area as well.
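As a sketch, a build using the registry cache backend with mode=max might look like this (the :buildcache tag is an arbitrary choice of where to store the cache, separate from the image itself):

```shell
# Placeholder refs — substitute your own registry paths.
IMAGE=your/docker/registry/here/optimize-dockerfile:latest
CACHE=your/docker/registry/here/optimize-dockerfile:buildcache

docker build \
  --tag "$IMAGE" \
  --cache-from "type=registry,ref=$CACHE" \
  --cache-to "type=registry,ref=$CACHE,mode=max" \
  --push \
  .
```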

As your Dockerfiles get more complex, there is certainly a lot of room to get creative in order to keep your build processes optimal!

(Feature photo by frank mckenna on Unsplash)
