Docker Hub Tips and Tricks

Docker Hub is the default container registry used by your local Docker client. It hosts all kinds of container images, ranging from official ones like Debian to unofficial images that anyone can build. In this blog post, I'll explain what to look out for when choosing an image, and show some tricks I learned while developing our own images.

Choosing the right image

When choosing an image I look for a couple of things. First, I check whether I'm dealing with an official image or not. Official images are typically maintained by the same community as the piece of software you're trying to run, and hence I have a little more trust in those images. Second, I look at the number of pulls of the image. For example, the debian image I referenced earlier has more than 10 million pulls, and hence I expect it to be more stable. Third, I look at the last-updated timestamp of the image. If the last update was a couple of months ago, the image is built on top of an older version of its base image and might be missing a lot of security patches. Fourth, I check whether the image has an automated build attached. An example of such an image is Home Assistant. Having an automated build typically results in more frequent updates, and therefore fewer security vulnerabilities. Finally, I have a look at the size of the image. Larger images result in slower deployments; sometimes that is a problem, sometimes it is not.

Creating our own Miniconda image

At clients we often build containers based on Miniconda. Miniconda is developed by Continuum, and bundles the conda package manager and a Python version: a great combo to base any data science container on. Continuum has its own Miniconda image, but there were some things I didn't quite like about it. The images are quite old; the most recent one was updated over 2 months ago, and older releases never seem to be updated after their initial push. Moreover, the tags Continuum provides don't allow you to specify a Python version, only which Miniconda release to use. Finally, looking at the Dockerfile, some things could be optimized a bit to reduce the size of the final image.

So what did we improve?

Dockerfile

First, we modified the Dockerfile:

FROM debian:stretch-slim

ARG BUILD_DATE
ARG MINICONDA_VERSION=3
ARG MINICONDA_RELEASE=latest
ARG PYTHON_VERSION

LABEL org.label-schema.name="Continuum Miniconda $PYTHON_VERSION" \
      org.label-schema.build-date=$BUILD_DATE \
      org.label-schema.version=$MINICONDA_VERSION-$MINICONDA_RELEASE

ENV PATH="/opt/miniconda${MINICONDA_VERSION}/bin:${PATH}"

RUN set -x && \
    apt-get update && \
    apt-get install -y curl bzip2 && \
    curl -s --url "https://repo.continuum.io/miniconda/Miniconda${MINICONDA_VERSION}-${MINICONDA_RELEASE}-Linux-x86_64.sh" --output /tmp/miniconda.sh && \
    bash /tmp/miniconda.sh -b -f -p "/opt/miniconda${MINICONDA_VERSION}" && \
    rm /tmp/miniconda.sh && \
    apt-get purge -y --auto-remove curl bzip2 && \
    apt-get clean && \
    conda config --set auto_update_conda true && \
    if [ "$MINICONDA_VERSION" = "2" ]; then\
        conda install -y futures;\
    fi && \
    if [ "$MINICONDA_RELEASE" = "latest" ]; then\
        conda update conda -y --force;\
    fi && \
    if [ -n "$PYTHON_VERSION" ]; then\
        conda install python=$PYTHON_VERSION -y --force;\
    fi && \
    conda clean -tipsy && \
    find /opt/miniconda${MINICONDA_VERSION} -depth \( \( -type d -a \( -name test -o -name tests \) \) -o \( -type f -a \( -name '*.pyc' -o -name '*.pyo' \) \) \) | xargs rm -rf && \
    echo "PATH=/opt/miniconda${MINICONDA_VERSION}/bin:\${PATH}" > /etc/profile.d/miniconda.sh

ENTRYPOINT ["conda"]
CMD ["--help"]

A few things to note. We pinned the Debian base image to a release, i.e. we didn't use the latest tag. Next, we included a couple of build args to control which Miniconda release is downloaded and installed. We specified labels to expose which version and release the image bundles. Finally, we use a single RUN statement to reduce the overall size of the image: files that are downloaded and removed again within the same statement never end up in an image layer.
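To make the effect of the build args a bit more concrete, the sketch below reproduces, outside of Docker, how the installer URL in the curl step is assembled from MINICONDA_VERSION and MINICONDA_RELEASE. The function name miniconda_url is a hypothetical helper introduced purely for illustration; the URL pattern and the defaults are copied from the Dockerfile above.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original repo): mirrors the URL
# that the Dockerfile's curl step builds from the two build args.
miniconda_url() {
    local version="${1:-3}"       # default mirrors ARG MINICONDA_VERSION=3
    local release="${2:-latest}"  # default mirrors ARG MINICONDA_RELEASE=latest
    echo "https://repo.continuum.io/miniconda/Miniconda${version}-${release}-Linux-x86_64.sh"
}

miniconda_url            # both defaults
miniconda_url 2 latest   # Python 2 flavour, latest release
miniconda_url 3 4.5.11   # Python 3 flavour, pinned release
```

When building locally, the same values would be passed with docker build's --build-arg flag, for example --build-arg MINICONDA_RELEASE=4.5.11 --build-arg PYTHON_VERSION=3.5.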

Automated build

Second, we linked our GitHub repo to our Docker Hub account to make use of automated builds. Docker Hub automatically schedules a rebuild of the images when it detects a change to the master branch. Moreover, because we enabled a rebuild whenever the base image changes, security patches are automatically included in our images.

After detecting a change, Docker Hub will use the build rules you specify to kick off the build process. For our Miniconda image we've configured 4 build rules with the following Docker tags:

  • latest-2.7,2.7,2
  • 4.5.11-3.5,3.5
  • latest-3.6,3.6,3,latest
  • latest-3.7,3.7

Each build rule results in a build, and because we include a modified build hook (shown in the next step) in our repo we can use the tags to influence the build step itself.

Build hook

#!/bin/bash

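# Strip the repository part: keep everything after the first ':'
IMAGE_TAG=${IMAGE_NAME#*:}

# Primary tag has the form <miniconda release>-<python version>
MINICONDA_RELEASE=${IMAGE_TAG%-*}
PYTHON_VERSION=${IMAGE_TAG#*-}
# Miniconda 2 or 3 follows from the Python major version
MINICONDA_VERSION=${PYTHON_VERSION:0:1}

# Trigger build. The -t expansion is left unquoted on purpose, so that
# word splitting turns it into one -t flag per tag.
docker build --build-arg BUILD_DATE="$(date -u +"%Y-%m-%dT%H:%M:%SZ")" \
             --build-arg MINICONDA_VERSION="$MINICONDA_VERSION" \
             --build-arg MINICONDA_RELEASE="$MINICONDA_RELEASE" \
             --build-arg PYTHON_VERSION="$PYTHON_VERSION" \
             -t $DOCKER_REPO:${DOCKER_TAG//,/ -t $DOCKER_REPO:} .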

By placing a hooks/build script in your git repo, you tell Docker Hub to use that script to build your image. In our case, we use it to manipulate $IMAGE_NAME and extract the build arguments from it. $IMAGE_NAME is the primary name of the image (e.g. index.docker.io/godatadriven/miniconda:latest-2.7), $DOCKER_REPO is the repo part (index.docker.io/godatadriven/miniconda) and $DOCKER_TAG contains all tags (latest-2.7,2.7,2). We use bash parameter expansion to split the primary tag of an image into its components, and pass those to the Docker build process. For example, latest-2.7 results in $MINICONDA_RELEASE==latest, $MINICONDA_VERSION==2, and $PYTHON_VERSION==2.7.
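The splitting can be tried out without Docker Hub. The sketch below applies the same parameter expansions as the build hook to the primary tag of each of the four build rules. The function name parse_image_name and its output format are hypothetical, chosen only for this illustration.

```shell
#!/bin/bash
# Hypothetical helper: applies the same parameter expansions as the build
# hook to one image name and prints the derived build args.
parse_image_name() {
    local image_name="$1"
    local image_tag="${image_name#*:}"       # drop everything up to the first ':'
    local release="${image_tag%-*}"          # part before the last '-'
    local python_version="${image_tag#*-}"   # part after the first '-'
    local miniconda_version="${python_version:0:1}"  # first character: Python major version
    echo "release=${release} miniconda=${miniconda_version} python=${python_version}"
}

# Primary tags of the four build rules listed earlier.
for tag in latest-2.7 4.5.11-3.5 latest-3.6 latest-3.7; do
    parse_image_name "index.docker.io/godatadriven/miniconda:${tag}"
done
```

Note that a primary tag without a hyphen, such as a plain latest, would not split into sensible components with these expansions, which is presumably why every build rule lists a tag of the form release-python first.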

The last line of the docker build command makes sure the image is tagged with each individual tag in the Docker build rule. In the example above, it will tag the image with godatadriven/miniconda:latest-2.7, godatadriven/miniconda:2.7, and godatadriven/miniconda:2.
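To see what that expansion produces, the following sketch collects the resulting arguments in an array and prints them instead of calling docker build. The values of DOCKER_REPO and DOCKER_TAG are set by hand here, assuming they match what Docker Hub provides for the first build rule; the array name tag_args is introduced only for this illustration.

```shell
#!/bin/bash
# Example values, set by hand for the first build rule (assumption:
# Docker Hub would export the same values to the build hook).
DOCKER_REPO="index.docker.io/godatadriven/miniconda"
DOCKER_TAG="latest-2.7,2.7,2"

# Replace every ',' with ' -t <repo>:' so that one -t flag is emitted per
# tag. The expansion is left unquoted on purpose: word splitting turns the
# resulting string into separate arguments.
tag_args=( -t $DOCKER_REPO:${DOCKER_TAG//,/ -t $DOCKER_REPO:} )

# Print one argument per line instead of passing them to docker build.
printf '%s\n' "${tag_args[@]}"
```

The printed list alternates between -t and a fully qualified image name, one pair for each comma-separated tag in the build rule.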

Concluding

I feel that the combination of a Dockerfile with build args, automated builds with build rules, and the custom build hook gives us a lot of flexibility in specifying which images to build and how to build them. Moreover, because we rebuild whenever the base image (in this case debian:stretch-slim) changes, we automatically keep in sync with security patches.

Next time, I'll show how to extend this approach and include some testing, to make sure that the images we build actually run.
