Introduction
A few years ago, I wrote a blog post about Dockerfile optimization and how you can organize a Docker image’s layers to leverage the build cache and speed up your Docker builds. The state of the art has advanced since I wrote that blog post. BuildKit is now, as of Docker Engine version 23.0, the default build backend. The principles of how to optimally organize the layers of a Docker image are largely the same, but BuildKit introduces, among other things, fundamental improvements in caching. As I did in my previous blog post, I will use an example to demonstrate some of these improvements.
Setup
As before, we will use the official React Tutorial as our demonstration app (although the Tutorial has been updated since my previous blog post). Download the entire React Tutorial sandbox, including package.json, into a working directory. (Notice that the .js and .css files should go into a src subdirectory, and the index.html file should go into a public subdirectory.) We will be using yarn, so run npm install --save-dev yarn to install it as a development dependency. Then run yarn install to create the yarn.lock file.
We will use the same Dockerfile as last time, though with an updated base image:
FROM node:20-alpine3.18
COPY package.json /usr/src/tic-tac-toe/package.json
COPY yarn.lock /usr/src/tic-tac-toe/yarn.lock
WORKDIR /usr/src/tic-tac-toe
RUN ["yarn", "install"]
COPY . /usr/src/tic-tac-toe
RUN ["yarn", "build"]
Importantly, we need to make sure that we do not accidentally COPY the node_modules (or even .git), so create a .dockerignore file:
.git
node_modules
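With the Dockerfile and .dockerignore in place (and after running yarn install), the working directory should look roughly like this; the exact source file names come from the tutorial sandbox, so yours may differ slightly:
.
├── .dockerignore
├── Dockerfile
├── package.json
├── yarn.lock
├── node_modules/    (excluded from the build context by .dockerignore)
├── public/
│   └── index.html
└── src/
    ├── App.js
    ├── index.js
    └── styles.css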
Cache Demonstration
BuildKit is the improved backend which succeeds Docker’s legacy builder, and we will assume for the purposes of this demonstration that BuildKit is the default builder.
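If you are not sure whether your environment defaults to BuildKit, a quick check looks something like this (these are standard Docker CLI commands; the last line shows the older opt-in environment variable for engines before 23.0):
# Docker Engine 23.0 and later use BuildKit as the default builder
docker version --format '{{.Server.Version}}'

# Confirm the BuildKit-based buildx client is available
docker buildx version

# On older engines, BuildKit can be enabled for a single build
DOCKER_BUILDKIT=1 docker build .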
BuildKit offers a number of improvements to build caching, so to take advantage of them, the command we run to build the image is a little different from before. Run this command:
docker build \
  --tag your/docker/registry/here/optimize-dockerfile:latest \
  --cache-to type=inline \
  --push \
  .
This command will build, tag, and push the image to your registry. BuildKit offers a number of different cache backends. By default, only the local cache, which is the build cache on the local filesystem, is enabled. There are other cache backends that allow us to cache our Docker builds across different CI/CD instances. Here, --cache-to type=inline is a directive to use the “inline” cache backend, in which the build cache is embedded directly in the image itself.
Now, let’s make a small change to our app. In src/App.js, change:
nextSquares[i] = 'X';
To:
nextSquares[i] = '+';
Before we rebuild the Docker image with our change, let’s simulate the conditions of a CI/CD instance freshly spawned with a cold cache by completely pruning our Docker environment. Run:
docker system prune -a
And enter ‘y’ when prompted.
Now run:
docker build \
  --cache-from your/docker/registry/here/optimize-dockerfile:latest \
  .
The output will look something like:
=> [internal] load .dockerignore                                                        0.0s
=> => transferring context: 58B                                                         0.0s
=> [internal] load build definition from Dockerfile                                     0.0s
=> => transferring dockerfile: 266B                                                     0.0s
=> [internal] load metadata for docker.io/library/node:20-alpine3.18                    0.8s
=> importing cache manifest from your/docker/registry/here/optimize-dockerfile:latest   1.0s
=> [1/7] FROM docker.io/library/node:20-alpine3.18@sha256:32427bc0620132b2d9e79e405a1b27944d992501a20417a7f4074  0.0s
=> [internal] load build context                                                         0.0s
=> => transferring context: 487.06kB                                                     0.0s
=> [auth] [redacted]                                                                     0.0s
=> CACHED [2/7] COPY package.json /usr/src/tic-tac-toe/package.json                      0.0s
=> CACHED [3/7] COPY yarn.lock /usr/src/tic-tac-toe/yarn.lock                             0.0s
=> CACHED [4/7] WORKDIR /usr/src/tic-tac-toe                                              0.0s
=> CACHED [5/7] RUN ["yarn", "install"]                                                  35.7s
=> => pulling sha256:c926b61bad3b94ae7351bafd0c184c159ebf0643b085f7ef1d47ecdc7316833c     0.3s
=> => pulling sha256:4b45d679ee3ef9e23a172a5bbd340012ea4515eec334cd86acf96f92ed40717a     0.3s
=> => pulling sha256:b519384a30f5a620cfb09aa166d6c3ad7bf64b4731435d9e32153731f636a06b     0.8s
=> => pulling sha256:095f19a54610183ae53834b1c703b05553b5bd920e6a40a38045c205569a51de     0.2s
=> => pulling sha256:78508e7796da40c6996da8f58a1a4329f5cfcb5c6d33cfb215bdc179b4984cb6     0.2s
=> => pulling sha256:29c1ccb5ef912ea427f74a6947a3b22bec7418df01459080c5dc13f679550453     0.3s
=> => pulling sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1     0.2s
=> => pulling sha256:bc11ece0c87450dfe61c56c616fc0a6413ca59938f31331e5274a7f1630869b7     1.4s
=> [6/7] COPY . /usr/src/tic-tac-toe                                                      0.9s
=> [7/7] RUN ["yarn", "build"]                                                            8.4s
=> exporting to image                                                                     0.0s
=> => exporting layers                                                                    0.0s
=> => writing image sha256:b48644823a4e8fb3627874214949899343d7ec48bbe06e3b0d0f0f22e6bc0147  0.0s
You can see here that Docker loads the cache manifest from the “cache from” image (just the manifest, not the entire build cache) early in the process, and is therefore able to consult it and determine that the first five steps of the build are in the build cache. Step 6 is not cached, because the change in src/App.js invalidates the COPY command.
At this point, Docker finally downloads the cached layers from the “cache from” image to use in building the remainder of the image (Steps 6 and 7). Notably, Docker only downloads the layers from the “cache from” image that it needs, not the entire image, and we did not need to docker pull it before building.
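In a CI/CD pipeline, you would typically combine the two commands above so that every run both consumes the previous build’s cache and publishes a fresh one. A minimal sketch, reusing only the flags we have already seen:
# Build using the previously pushed image as a cache source,
# then push the new image with a fresh inline cache embedded in it
docker build \
  --tag your/docker/registry/here/optimize-dockerfile:latest \
  --cache-from your/docker/registry/here/optimize-dockerfile:latest \
  --cache-to type=inline \
  --push \
  .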
Takeaways
Our simple demonstration only scratches the surface of the strategies and considerations for Docker build optimization. Docker offers the “registry” cache backend as a more sophisticated alternative to the “inline” cache backend, as well as other backends that are, as of this writing, experimental.
The “registry” cache backend is suitable for caching multi-stage builds, as you can set the cache mode to “max” and cache all the layers of the image, while keeping this cache separate from the image itself. We have not touched on multi-stage builds, but BuildKit introduces improvements in this area as well.
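For illustration, a build using the “registry” backend with the cache mode set to “max” might look something like the sketch below. The :buildcache tag is just an example name for the separate cache artifact, and depending on your Docker version and setup, exporting a registry cache may require a docker-container builder (created with docker buildx create --use) rather than the default one:
# Create and select a docker-container builder, if your setup requires it
docker buildx create --use

# Push the image and a separate, full ("max") build cache to the registry;
# the :buildcache tag here is only an example name
docker buildx build \
  --tag your/docker/registry/here/optimize-dockerfile:latest \
  --cache-to type=registry,ref=your/docker/registry/here/optimize-dockerfile:buildcache,mode=max \
  --cache-from type=registry,ref=your/docker/registry/here/optimize-dockerfile:buildcache \
  --push \
  .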
As your Dockerfiles get more complex, there is certainly a lot of room to get creative in order to keep your build processes optimal!
(Feature photo by frank mckenna on Unsplash)