Docker

What is Docker? Docker is a tool to make it easier to create, deploy and run applications by using containers. Containers allow developers to package up an application with all of the dependencies such as libraries and tools, and deploy it as one package. The application will run on most operating systems (Mac/Windows/Linux) regardless of any customized settings. This page covers how you can run a development environments using Docker containers and package up your own code into a portable container.

Warning

This tutorial is meant to be run on your personal computer, not the HPCC. Docker does not work on the HPCC since it requires super user (sudo) permissions that users do not have access to. To run containers on the HPCC, you will need to use Singularity.

Nevertheless, many of the skills taught in this tutorial transfer over to using Singularity, and most Docker containers can be used without modification on the HPCC through Singularity. However, if you just want to get started running containers on the HPCC, start with the Singularity Introduction.

Docker installation

Docker can be installed on all major operating systems. However, note that installation on Windows requires the Windows Subsystem For Linux (WSL).

For detailed installation instructions depending on operating system, click here: Mac/Windows/Linux.

Testing Docker installation

When you have installed Docker, test your Docker installation by opening a terminal (if you are running Windows, this should be a WSL terminal) and running the following command:

$ docker --version
Docker version 19.03.8, build afacb8b

When you run the docker command without --version, you will see the options available with docker. Alternatively, you can test your installation by running the following (you have to log into Docker to use this test):

$ docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
0e03bdcc26d7: Pull complete
Digest: sha256:6a65f928fb91fcfbc963f7aa6d57c8eeb426ad9a20c7ee045538ef34847f44f1
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(amd64)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/

For more examples and ideas, visit:
https://docs.docker.com/get-started/

Running Docker containers from prebuilt images

Now, you have setup everything, and it is time to use Docker seriously. You will run a container from the Alpine Linux image on your system and will learn the docker run command. However, you should first know what containers and images are, and the difference between containers and images.

Images: The file system and configuration of applications which are created and distributed by developers. Of course, you can create and distribute images.

Containers: Running instances of Docker images. You can have many containers for the same image.

Now that you know what containers and images are, let's get some practice, by running the docker run alpine ls -l command in your terminal.

$ docker run alpine ls -l
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
cbdbe7a5bc2a: Pull complete
Digest: sha256:9a839e63dad54c3a6d1834e29692c8492d93f90c59c978c1ed79109ea4fb9a54
Status: Downloaded newer image for alpine:latest
total 56
drwxr-xr-x    2 root     root          4096 Apr 23 06:25 bin
drwxr-xr-x    5 root     root           340 May 26 17:11 dev
drwxr-xr-x    1 root     root          4096 May 26 17:11 etc
drwxr-xr-x    2 root     root          4096 Apr 23 06:25 home
drwxr-xr-x    5 root     root          4096 Apr 23 06:25 lib
drwxr-xr-x    5 root     root          4096 Apr 23 06:25 media
drwxr-xr-x    2 root     root          4096 Apr 23 06:25 mnt
drwxr-xr-x    2 root     root          4096 Apr 23 06:25 opt
dr-xr-xr-x  187 root     root             0 May 26 17:11 proc
drwx------    2 root     root          4096 Apr 23 06:25 root
drwxr-xr-x    2 root     root          4096 Apr 23 06:25 run
drwxr-xr-x    2 root     root          4096 Apr 23 06:25 sbin
drwxr-xr-x    2 root     root          4096 Apr 23 06:25 srv
dr-xr-xr-x   12 root     root             0 May 26 17:11 sys
drwxrwxrwt    2 root     root          4096 Apr 23 06:25 tmp
drwxr-xr-x    7 root     root          4096 Apr 23 06:25 usr
drwxr-xr-x   12 root     root          4096 Apr 23 06:25 var

When you run the docker run alpine ls -l command, it searches for the alpine:latest image from your system first. If your system has it (i.e. if you downloaded it previously), Docker uses that image.

If your system does not have that image, then Docker fetches the alpine:latest image from Docker Hub first, saves it in your system, then runs a container from the saved image. Docker Hub is a huge repository of images people have uploaded so that others can download and run their code in containers. Though Docker Hub is the most popular place to find Docker images, there are other sources that work just as well (for example, Quay.io).

docker run alpine starts a container, and then ls -l will be a command which is fed to the container, so Docker starts the given command and results show up.

To see a list of all images on your system, you can use the docker images command.

$ docker images
alpine                     latest              f70734b6a266        4 weeks ago         5.61MB
hello-world                latest              bf756fb1ae65        4 months ago        13.3kB

Next, let's try another command.

$ docker run alpine echo "Hello world"
Hello world

In this case, Docker ran the echo command in your alpine container, and then exited it. Exit means the container is terminated after running the command.

Let's try another command.

docker run alpine sh

It seems nothing happened. In fact, docker ran the sh command in your alpine container, and exited it. If you want to be inside the container shell, you need to use docker run -it alpine sh. The -i flag tells Docker you want to run the container interactively and the -t flag tells it you want to start a terminal in that image to run your command. You can find more help on the run command with docker run --help.

Let's run a few commands inside the docker run -it alpine sh container.

$ docker run -it alpine sh
/ # ls
bin    etc    lib    mnt    proc   run    srv    tmp    var
dev    home   media  opt    root   sbin   sys    usr
/ # uname -a
Linux c1552c9b6cf0 4.19.76-linuxkit #1 SMP Fri Apr 3 15:53:26 UTC 2020 x86_64 Linux
/ # exit

You are inside of the container shell and you can try out a few commands like ls and uname -a and others. To quit the container, type exit on the terminal. If you use the exit command, the container is terminated. If you want to keep the container active, then you can use keys Ctrl-p followed by Ctrl-q (you don't have to press these key combinations simultaneously). If you want to go back into the container, you can type docker attach <container_id>, such as docker attach c1552c9b6cf0. You can find container id with docker ps -all. This command will be explained next.

Now, let's learn about the docker ps command which shows you all containers that are currently running.

$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS

In this case, you don't see any container because no containers are running. To see a list of all containers that you ran, use docker ps --all. You can see that STATUS says that all containers exited.

$ docker ps --all
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                      PORTS               NAMES
c1552c9b6cf0        alpine              "sh"                     6 minutes ago       Exited (0) 2 minutes ago                        wonderful_cori
5de22ab86f2a        alpine              "echo 'Hello world'"     18 minutes ago      Exited (0) 18 minutes ago                       goofy_visvesvaraya
df35ee7df7e3        alpine              "ls -l"                  31 minutes ago      Exited (0) 31 minutes ago                       fervent_gould
6dbe999044b4        hello-world         "/hello"                 3 hours ago         Exited (0) 3 hours ago

When Docker containers are created, the Docker system automatically assign a universally unique identifier (UUID) number to each container to avoid any naming conflicts. CONTAINER ID is a shortform of the UUID. Each container also has a randomly generated name. You can usually use this name in place of the CONTAINER ID to make typing a bit easier.

You can also assign names to your Docker containers when you run them, using the --name flags. In addition, you can rename your Docker container's name with rename command. For example, let's rename "wonderful_cori" to "my_container" with docker rename command.

$ docker rename wonderful_cori my_container
$ docker ps --all
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                      PORTS               NAMES
c1552c9b6cf0        alpine              "sh"                     10 minutes ago      Exited (0) 6 minutes ago                        my_container
5de22ab86f2a        alpine              "echo 'Hello world'"     22 minutes ago      Exited (0) 22 minutes ago                       goofy_visvesvaraya
df35ee7df7e3        alpine              "ls -l"                  35 minutes ago      Exited (0) 35 minutes ago                       fervent_gould

Build Docker images which contain your own code

Now you are ready to use Docker to create your own applications! First, you will learn more about Docker images. Then you will build your own image and use that image to run an application on your local machine.

Docker images

Docker images are basis of containers. In the above example, you pulled the alpine image from Docker Hub and ran a container based on that image. To see the list of images that are available on your local machine, run the docker images command.

$ docker images
REPOSITORY    TAG       IMAGE ID       CREATED         SIZE
alpine        latest    9ed4aefc74f6   4 weeks ago     7.05MB
hello-world   latest    feb5d9fea6a5   19 months ago   13.3kB

The TAG refers to a particular snapshot of the image and the ID is the corresponding UUID of the image. Images can have multiple versions. When you do not assign a specific version number, the client defaults to latest. If you want a specific version of the image, you can use docker pull command as follows:

$ docker pull ubuntu:22.04
7: Pulling from library/ubuntu
2d473b07cdd5: Pull complete 
Digest: sha256:0eb0f877e1c869a300c442c41120e778db7161419244ee5cbc6fa5f134e74736
Status: Downloaded newer image for ubuntu:22.04
docker.io/library/ubuntu:22.04

Notice that here we pulled the image without running it. When we run a container with the ubuntu:22.04 image in the future, it will use this downloaded copy.

You can search for images from a repository's website (for example, searching Docker hub for CentOS) or directly from the command line using docker search.

$ docker search centos
NAME                               DESCRIPTION                                     STARS               OFFICIAL            AUTOMATED
centos                             The official build of CentOS.                   6014                [OK]
ansible/centos7-ansible            Ansible on Centos7                              129                                     [OK]
consol/centos-xfce-vnc             Centos container with "headless" VNC session…   115                                     [OK]
jdeathe/centos-ssh                 OpenSSH / Supervisor / EPEL/IUS/SCL Repos - …   114                                     [OK]
centos/mysql-57-centos7            MySQL 5.7 SQL database server                   76
imagine10255/centos6-lnmp-php56    centos6-lnmp-php56                              58                                      [OK]
tutum/centos                       Simple CentOS docker image with SSH access      46
centos/postgresql-96-centos7       PostgreSQL is an advanced Object-Relational …   44
kinogmt/centos-ssh                 CentOS with SSH                                 29                                      [OK]
pivotaldata/centos-gpdb-dev        CentOS image for GPDB development. Tag names…   12
guyton/centos6                     From official centos6 container with full up…   10                                      [OK]
centos/tools                       Docker image that has systems administration…   6                                       [OK]
drecom/centos-ruby                 centos ruby                                     6                                       [OK]
pivotaldata/centos                 Base centos, freshened up a little with a Do…   4
pivotaldata/centos-mingw           Using the mingw toolchain to cross-compile t…   3
darksheer/centos                   Base Centos Image -- Updated hourly             3                                       [OK]
mamohr/centos-java                 Oracle Java 8 Docker image based on Centos 7    3                                       [OK]
pivotaldata/centos-gcc-toolchain   CentOS with a toolchain, but unaffiliated wi…   3
miko2u/centos6                     CentOS6 日本語環境                                   2                                       [OK]
blacklabelops/centos               CentOS Base Image! Built and Updates Daily!     1                                       [OK]
indigo/centos-maven                Vanilla CentOS 7 with Oracle Java Developmen…   1                                       [OK]
mcnaughton/centos-base             centos base image                               1                                       [OK]
pivotaldata/centos7-dev            CentosOS 7 image for GPDB development           0
smartentry/centos                  centos with smartentry                          0                                       [OK]
pivotaldata/centos6.8-dev          CentosOS 6.8 image for GPDB development         0

Building your first Docker image

In this section, you will build a simple Docker image with writing a Dockerfile, and run it. For this purpose, we will create a Python script and a Dockerfile.

Creating working directory

Let's create a working directory where you will make the following files: hello.py, Dockerfile.

cd ~
mkdir my_first_Docker_image
cd my_first_Docker_image

Python script

Create the hello.py file with the following content.

hello.py
print("Hello world!")
print("This is my 1st Docker image!")

Dockerfile

A Dockerfile is a text file which has a list of commands that Docker calls while creating an image. The Dockerfile is similar to a job batch file, and contains all information that Docker needs to know to to run the application package.

In the my_first_Docker_image directory, create a file, called Dockerfile, which has the content below.

Dockerfile
# our base image. The latest version will be pulled.
FROM alpine

# install python and pip
RUN apk add --update py3-pip

# copy files required to run
COPY hello.py /usr/src/my_app/

# run the application
CMD python3 /usr/src/my_app/hello.py

Now, let's learn the meaning of each line.

The first line means that we will use Alpine Linux as a base image. No version is specified, so the latest version will be pulled. Use the FROM keyword.

FROM alpine

Next, the Python pip package is installed using the Alpine Package Keeper (apk). Use the RUN keyword.

RUN apk add --update py3-pip

Next, copy the file to the image. /usr/src/my_app will be created while the file is copied. Use the COPY keyword.

COPY hello.py /usr/src/my_app/

The last step is run the application with the CMD keyword. CMD tells the container what the container should do by default when it is started.

CMD python3 /usr/src/my_app/hello.py

Build the image

Now you are ready to build your first Docker image. The docker build command will do most of the work.

To build the image, use the following command.

docker build -t my_first_image .

The client will pull all necessary images and create your image. If everything goes well, your image is ready to be used! Run the docker images command to see if your image my_first_image is shown.

Run your image

When you successfully create your Docker image, test it by starting a new container from the image.

1	`docker run my_first_image`

If everything went well, you will see this message.

Hello world!
This is my 1st Docker image!

Connecting Docker and your computer

Containers are great for keeping all the parts of a piece of software isolated together. But this means that there are a few extra steps necessary to share information from that container with the computer you're running it on.

Let's pretend that you have a container that runs a long analysis and outputs the results in some data file. We'll mimic this by just creating an empty file with the touch command. Let's make a new directory for our output and create our "important data" in our alpine image:

mkdir results
ls results

docker run alpine touch data.dat 
ls results

Nothing happened! You can look in your current directory with ls, and you won't see anything either. We can even run an interactive alpine container and look around:

docker run -it alpine sh
/ # ls
bin    dev    etc    home   lib    media  mnt    opt    proc   root   run    sbin   srv    sys    tmp    usr    var
/ # exit

No results directory and no data.dat file...

What happened is that the file was created and locked away in the previous container. When we start a new container, we start fresh from whatever the image specified. Nothing sticks around! So we need a way to get that data out of the container we're working in.

The way Docker does this is through "bind mounts". It's like we are "binding" a directory on our computer to a directory that's "mounted" in the container. Let's try it interactively first:

$ docker run -it -v ./results:outside_world alpine sh
/ # ls
bin            etc            lib            mnt            outside_world  root
sbin           sys            usr            dev            home           media
opt            proc           run            srv            tmp            var
/ # exit

The -v command tells Docker that I want to connect the ./results directory on my computer to a directory called /outside_world inside the container.

Now we can put it all together:

$ docker run -v ./results:/outside_world alpine touch /outside_world/data.dat
$ ls results
data.dat

Notice that we had to write to the container version of our directory with the touch command, but it's now visible on the computer in the results directory.

This is a contrived example, but in real life, you could replace touch ... with any command that can run in your container, including heavy duty data analysis. You just need to make sure that there is a bind mount between wherever that data is being written inside the container and wherever you want it outside the container (and/or vice versa if you want to input data into your container).

Exposing ports (and Jupyter example)

Software that connects to your web browser uses network ports to share information. These ports are just a number that tell your web browser where to access the content shared by the software, and are usually setup by the software (though there are often options a user can set to change the port number).

In the context of research computing, one of the most popular examples of this setup is Jupyter Notebook. When a Jupyter Notebook is running, it usually is available on port 8888, meaning you can access it from your web browser with the URL http://127.0.0.1:8888. Here, 127.0.0.1 will always be the IP address of your own computer which in this case is running the Jupyter Notebook on port 8888.

Since containers are meant to be an isolated computing environment, network ports are not accessible by default from outside the container. We will now go through an example of exposing ports from a Docker container using Jupyter Hub (a more full-featured version of Jupyter Notebooks). This example will also show an alternate way to include files in a Docker container, albeit, in more of a read-only way.

First, let's check the Jupyter images available on Docker Hub. We will use minimal-notebook.

$ docker search jupyter
NAME                                    DESCRIPTION                                     STARS               OFFICIAL            AUTOMATED
jupyter/datascience-notebook            Jupyter Notebook Data Science Stack from htt…   666
jupyter/all-spark-notebook              Jupyter Notebook Python, Scala, R, Spark, Me…   301
jupyterhub/jupyterhub                   JupyterHub: multi-user Jupyter notebook serv…   248                                     [OK]
jupyter/scipy-notebook                  Jupyter Notebook Scientific Python Stack fro…   241
jupyter/tensorflow-notebook             Jupyter Notebook Scientific Python Stack w/ …   218
jupyter/pyspark-notebook                Jupyter Notebook Python, Spark, Mesos Stack …   157
jupyter/base-notebook                   Small base image for Jupyter Notebook stacks…   106
jupyter/minimal-notebook                Minimal Jupyter Notebook Stack from https://…   105
...

Let's start by creating a directory my_notebook. Copy hello.py, which we used for the Python image to the my_notebook directory. Then create a Dockerfile in the my_notebook directory with the following content:

Dockerfile
# base image
FROM jupyter/base-notebook

# copy files
COPY hello.py /home/jovyan/work

# the port number the container should expose
EXPOSE 8888

The COPY line moves our hello.py script into the container directory /home/jovyan/work which is where the Jupyer instance inside the container can access files.

The EXPOSE line specifies the port number which needs to be exposed. The default port for Jupyter is 8888, and therefore, we will expose that port.

Now, build the image using the following command:

$ docker build -t mynotebook .
Sending build context to Docker daemon  3.072kB
Step 1/3 : FROM jupyter/base-notebook
 ---> 6494235c84ec
Step 2/3 : COPY hello.py /home/jovyan/work
 ---> 6e22bc10eee0
Step 3/3 : EXPOSE 8888
 ---> Running in 2d754a40aa2b
Removing intermediate container 2d754a40aa2b
 ---> 464731f2e3a7
Successfully built 464731f2e3a7
Successfully tagged mynotebook:latest

Now, everything is ready. You can run the image using the docker run command. We use the -p 8888:8888 option to tell Docker that we'd like to bind the 8888 port in the container to the 8888 port on your host computer.

$ docker run -p 8888:8888 mynotebook
Entered start.sh with args: jupyter lab
Executing the command: jupyter lab
[I 2023-05-02 17:45:26.694 ServerApp] Package jupyterlab took 0.0000s to import
...
[I 2023-05-02 17:45:26.941 ServerApp] Serving notebooks from local directory: /home/jovyan
[I 2023-05-02 17:45:26.941 ServerApp] Jupyter Server 2.5.0 is running at:
[I 2023-05-02 17:45:26.941 ServerApp] http://ad7f42d45370:8888/lab?token=b76dcd0514e9fe7f60b145e936852ab7836df45e6e4b0879
[I 2023-05-02 17:45:26.941 ServerApp]     http://127.0.0.1:8888/lab?token=b76dcd0514e9fe7f60b145e936852ab7836df45e6e4b0879
[I 2023-05-02 17:45:26.941 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 2023-05-02 17:45:26.943 ServerApp] 

    To access the server, open this file in a browser:
        file:///home/jovyan/.local/share/jupyter/runtime/jpserver-7-open.html
    Or copy and paste one of these URLs:
        http://ad7f42d45370:8888/lab?token=b76dcd0514e9fe7f60b145e936852ab7836df45e6e4b0879
        http://127.0.0.1:8888/lab?token=b76dcd0514e9fe7f60b145e936852ab7836df45e6e4b0879

If you navigate to one of the URLs that Jupyter outputs, you will see your containerized Jupyter Hub ready to go. If you look inside the work directory, you'll even see our hello.py script!