In a recent technical interview, I faced a series of challenges to be completed within a strict 24-hour timeframe. One such task was to containerize a Node.js application, and I’d like to share my approach and reasoning with you.
Note: I will use the terms “containerize” and “dockerize” interchangeably, as well as “Dockerfile” and “Containerfile”.
Hello World
The code was initially very simple and you can check it here. I modified it slightly by adding the Express module to make the process more interesting. The code imports the Express module and creates an Express application. The app starts a server that listens on port 8080 for connections and responds with “Hello World!” to requests to the root URL (/).
const express = require('express');
const app = express();
const port = 8080;

app.get('/', (_, res) => {
  res.send('Hello World!');
});

app.listen(port, () => {
  console.log(`Listening on port ${port}`);
});
Here are the steps to run the application locally:
- Create an `app` directory and switch to it.
- Run `npm init`.
- Install Express as a dependency: `npm install express`.
- In the `app` directory, create a file named `index.js` and copy in the code.
- Run the app: `node index.js`.
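Assuming Node.js and npm are already installed, the steps above boil down to the following commands (the `-y` flag simply accepts the `npm init` defaults):

```shell
mkdir app && cd app
npm init -y
npm install express
# create index.js with the code above, then:
node index.js
```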
Directory Structure
The directory structure looks like this:
app
├── index.js
├── node_modules
├── package.json
└── package-lock.json
Initial Dockerfile
I started with a basic Dockerfile (blatantly copied from the Node.js website). The Containerfile:
- uses the Node image, version 20,
- copies all the files from the current directory into the image directory `/usr/src/app`,
- runs `npm ci`, which performs a clean install of the dependencies,
- exposes port 8080 and runs the app.

The `npm ci` command is similar to `npm install`, except it’s meant to be used in automated environments such as test platforms, continuous integration, and deployment.
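Note that `npm ci` requires a `package-lock.json` to be present and fails if it is out of sync with `package.json`, which is exactly what you want in CI. For this app, the `package.json` produced by the steps above looks roughly like this (version numbers are illustrative):

```json
{
  "name": "app",
  "version": "1.0.0",
  "main": "index.js",
  "dependencies": {
    "express": "^4.18.2"
  }
}
```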
FROM node:20
# Create app directory
WORKDIR /usr/src/app
# Install app dependencies and bundle app source
COPY . .
RUN npm ci
EXPOSE 8080
CMD [ "node", "index.js" ]
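To try it out, build and run the image against a local Docker daemon (the image name `hello-node` is arbitrary):

```shell
docker build -t hello-node .
docker run --rm -p 8080:8080 hello-node
# in another terminal:
curl http://localhost:8080   # Hello World!
```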
.dockerignore file
A .dockerignore is a configuration file that describes files and directories that you want to exclude when building a Docker image.
.git
.gitignore
README.md
node_modules
npm-debug.log
This prevents your local modules and debug logs from being copied into your Docker image, where they could overwrite modules installed within the image.
Separating modules and app code
There are two main reasons to separate modules, packages, and source code: caching efficiency and dependency management.
- Caching efficiency: When building a Docker image, each step in the Dockerfile creates a new layer. These layers are cached by Docker, and if a layer remains unchanged between builds, Docker can reuse it from the cache. By separating the installation of app dependencies (usually managed through package.json) from the actual application code, you can take advantage of Docker’s layer caching. This way, only changes in the application code will result in rebuilding the layers that follow, making the build process more efficient.
- Dependency management: Node.js applications often have a lot of dependencies specified in the package.json file. Separating the installation of these dependencies into its own step (npm ci) ensures that dependencies are installed consistently across different builds of the container. This can prevent discrepancies between different versions of the application if the dependencies were installed during the application code copying phase.
FROM node:20
# Create app directory
WORKDIR /usr/src/app
# Install app dependencies
+ COPY package*.json .
RUN npm ci
+ COPY index.js .
EXPOSE 8080
CMD [ "node", "index.js" ]
Run as a non-root user
By default, Docker runs commands inside the container as root, which violates the Principle of Least Privilege (PoLP) when superuser permissions are not strictly required.
The `node` image provides a user named `node`, which can be selected at run time with the flag `-u node`. Alternatively, the `node` user can be specified in the Dockerfile with the `USER` instruction.
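For example, the flag can be used to run any command in the stock image as the unprivileged user, without changing the Dockerfile at all:

```shell
docker run --rm -u node node:20 whoami
# prints: node
```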
FROM node:20
# Create app directory
WORKDIR /usr/src/app
# Install app dependencies
COPY package*.json .
RUN npm ci
+ USER node
+ COPY --chown=node:node index.js .
EXPOSE 8080
CMD [ "node", "index.js" ]
Use smaller images
The default Node.js Docker image is based on a Debian Linux distribution, for example `node:bookworm`.
There are two other variants: `slim` and `alpine`.
The slim images, such as `node:bookworm-slim`, provide a functional Node.js environment and nothing more. This decreases the image size dramatically, from close to a gigabyte down to a few hundred MB. Furthermore, it reduces the software footprint and hence the number of vulnerabilities.
The `alpine` variant is based on the Alpine Linux distribution and is smaller still than the `slim` variant. It also has a very low vulnerability count compared to the slim variant. However, the alpine version is an unofficial image; it is experimental and may not be consistent.
Deterministic tags should be favoured: instead of `node:bookworm-slim` or `node:alpine`, specify the Node.js runtime version, such as `node:20.5.0-bookworm-slim` or `node:20.4.0-alpine3.17`.
+ FROM node:20.4.0-alpine3.17
# Create app directory
WORKDIR /usr/src/app
# Install app dependencies
COPY package*.json .
RUN npm ci
USER node
COPY --chown=node:node index.js .
EXPOSE 8080
CMD [ "node", "index.js" ]
A comparison of image sizes
REPOSITORY TAG IMAGE ID CREATED SIZE
node-20-alpine latest ae76c5b808a2 12 seconds ago 185MB
node-20.5.0-bookworm-slim latest 3e7be562ea13 12 days ago 253MB
node-20 latest 3abe3becfb39 14 minutes ago 1.1GB
Tini - short for Tiny Init
Tini (short for “Tiny Init”) is a small, lightweight init system designed specifically for managing processes within Docker containers or other lightweight environments. It addresses a common problem known as the “PID 1 problem” in containerized environments. In Unix-like operating systems, the init process with PID 1 has special responsibilities, including reaping orphaned child processes, handling signals, and managing the overall lifecycle of the system. However, when running containers, using a traditional init system as PID 1 can lead to various issues due to the isolation and process management characteristics of containers.
Here’s how Tini helps in a Docker container:
- Signal propagation and handling: Containers rely on signals (such as SIGTERM) to gracefully shut down and handle other lifecycle events. Traditional init systems can sometimes interfere with proper signal propagation. Tini acts as a signal proxy, ensuring that signals sent to the container are appropriately delivered to the processes within, preventing unexpected behavior during shutdown or other events.
- Process reaping: When a process within a container exits, it becomes a “zombie” process until its exit status is collected by the parent process (usually the init process). If the init process doesn’t reap these zombie processes, they can accumulate and negatively impact system resources. Tini performs proper process reaping, preventing the accumulation of zombie processes.
- Graceful shutdown: During container shutdown, Tini ensures that all processes within the container are given a chance to clean up properly. It sends the appropriate signals to processes, allowing them to gracefully terminate and release resources.
- Resource efficiency: Tini is designed to be minimal and lightweight, consuming very little memory and overhead. This is crucial in containerized environments where resource utilization is important.
- Compatibility: Tini is highly compatible with various container runtimes and orchestration tools. It’s often used as a drop-in replacement for the default init process in container images.
Tini was specifically designed to mitigate the challenges that arise from using traditional init systems within containers.
FROM node:20.4.0-alpine3.17
+ RUN apk add --no-cache tini
# Create app directory
WORKDIR /usr/src/app
# Install app dependencies
COPY package*.json .
RUN npm ci
USER node
COPY --chown=node:node index.js .
EXPOSE 8080
+ ENTRYPOINT ["/sbin/tini", "--"]
CMD [ "node", "index.js" ]
The `ENTRYPOINT` specifies a command that will always be executed when the container starts. The `CMD` specifies arguments that will be fed to the `ENTRYPOINT`.
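The interplay is easiest to see by looking at the command each invocation produces (image and file names are illustrative):

```dockerfile
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["node", "index.js"]
# docker run my-image              runs: /sbin/tini -- node index.js
# docker run my-image node app.js  runs: /sbin/tini -- node app.js  (CMD overridden)
```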
Multi-stage build
+ FROM node:20.4.0-alpine3.17 AS base
RUN apk add --no-cache tini
# Create app directory
WORKDIR /usr/src/app
# Install app dependencies
COPY package*.json .
RUN npm ci
USER node
COPY --chown=node:node index.js .
+ FROM base AS app
EXPOSE 8080
ENTRYPOINT ["/sbin/tini", "--"]
CMD [ "node", "index.js" ]
With multi-stage builds, you use multiple FROM statements in your Dockerfile. Each FROM instruction can use a different base, and each of them begins a new stage of the build. You can selectively copy artifacts from one stage to another, leaving behind everything you don’t want in the final image.
Here’s a better example to illustrate multi-stage build:
# syntax=docker/dockerfile:1
FROM golang:1.21 as build
WORKDIR /src
COPY <<EOF /src/main.go
package main
import "fmt"
func main() {
fmt.Println("hello, world")
}
EOF
RUN go build -o /bin/hello ./main.go
FROM scratch
COPY --from=build /bin/hello /bin/hello
CMD ["/bin/hello"]
The second `FROM` instruction starts a new build stage with the `scratch` image as its base. The `COPY --from=build` line copies just the built artifact from the previous stage into this new stage.
The end result is a tiny production image with nothing but the binary inside. None of the build tools required to build the application are included in the resulting image.
BuildKit
Lastly, build your image with BuildKit. BuildKit is a builder backend used by many projects; buildx is a Docker CLI plugin for extended build capabilities with BuildKit.
Check the documentation on how to use the BuildKit builder.
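On recent Docker Engine versions BuildKit is the default builder; on older versions it can be enabled explicitly, or invoked through the buildx plugin (image name illustrative):

```shell
DOCKER_BUILDKIT=1 docker build -t hello-node .
# or, via buildx:
docker buildx build -t hello-node --load .
```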
Resources
- https://github.com/nodejs/docker-node/blob/main/docs/BestPractices.md
- https://docs.docker.com/build/building/multi-stage/