Docker
What is Docker? Docker is a tool that makes it easier to create, deploy, and run applications by using containers. Containers allow developers to package an application together with all of its dependencies, such as libraries and tools, and deploy it as a single unit. The packaged application will run on most operating systems (Mac/Windows/Linux) regardless of any customized settings. This page covers how you can run development environments using Docker containers and package your own code into a portable container.
Warning
This tutorial is meant to be run on your personal computer, not the HPCC.
Docker does not work on the HPCC since it requires superuser (sudo) permissions that users do not have access to. To run containers on the HPCC, you will need to use Singularity.
Nevertheless, many of the skills taught in this tutorial transfer over to using Singularity, and most Docker containers can be used without modification on the HPCC through Singularity. However, if you just want to get started running containers on the HPCC, start with the Singularity Introduction.
Docker installation
Docker can be installed on all major operating systems. However, note that installation on Windows requires the Windows Subsystem for Linux (WSL).
For detailed installation instructions, see the guide for your operating system: Mac/Windows/Linux.
Testing Docker installation
When you have installed Docker, test your Docker installation by opening a terminal (if you are running Windows, this should be a WSL terminal) and running the following command:
$ docker --version
Docker version 19.03.8, build afacb8b
When you run the docker command without --version, you will see the options available with docker. Alternatively, you can test your installation by running the following (you have to log into Docker to use this test):
$ docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
0e03bdcc26d7: Pull complete
Digest: sha256:6a65f928fb91fcfbc963f7aa6d57c8eeb426ad9a20c7ee045538ef34847f44f1
Status: Downloaded newer image for hello-world:latest
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(amd64)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash
Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/
For more examples and ideas, visit:
https://docs.docker.com/get-started/
Running Docker containers from prebuilt images
Now that you have set everything up, it is time to use Docker seriously. You will run a container from the Alpine Linux image on your system and learn the docker run command. First, however, you should know what containers and images are, and how they differ.
Images: The file system and configuration of an application, created and distributed by developers. You can also create and distribute your own images.
Containers: Running instances of Docker images. You can have many containers for the same image.
Now that you know what containers and images are, let's get some practice by running the docker run alpine ls -l command in your terminal.
$ docker run alpine ls -l
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
cbdbe7a5bc2a: Pull complete
Digest: sha256:9a839e63dad54c3a6d1834e29692c8492d93f90c59c978c1ed79109ea4fb9a54
Status: Downloaded newer image for alpine:latest
total 56
drwxr-xr-x 2 root root 4096 Apr 23 06:25 bin
drwxr-xr-x 5 root root 340 May 26 17:11 dev
drwxr-xr-x 1 root root 4096 May 26 17:11 etc
drwxr-xr-x 2 root root 4096 Apr 23 06:25 home
drwxr-xr-x 5 root root 4096 Apr 23 06:25 lib
drwxr-xr-x 5 root root 4096 Apr 23 06:25 media
drwxr-xr-x 2 root root 4096 Apr 23 06:25 mnt
drwxr-xr-x 2 root root 4096 Apr 23 06:25 opt
dr-xr-xr-x 187 root root 0 May 26 17:11 proc
drwx------ 2 root root 4096 Apr 23 06:25 root
drwxr-xr-x 2 root root 4096 Apr 23 06:25 run
drwxr-xr-x 2 root root 4096 Apr 23 06:25 sbin
drwxr-xr-x 2 root root 4096 Apr 23 06:25 srv
dr-xr-xr-x 12 root root 0 May 26 17:11 sys
drwxrwxrwt 2 root root 4096 Apr 23 06:25 tmp
drwxr-xr-x 7 root root 4096 Apr 23 06:25 usr
drwxr-xr-x 12 root root 4096 Apr 23 06:25 var
When you run the docker run alpine ls -l command, Docker first searches for the alpine:latest image on your system. If your system has it (i.e., if you downloaded it previously), Docker uses that image. If your system does not have that image, then Docker first fetches the alpine:latest image from Docker Hub, saves it on your system, and then runs a container from the saved image. Docker Hub is a huge repository of images people have uploaded so that others can download and run their code in containers. Though Docker Hub is the most popular place to find Docker images, there are other sources that work just as well (for example, Quay.io).
docker run alpine starts a container, and ls -l is the command passed to that container. Docker runs the given command inside the container and the results show up in your terminal.
To see a list of all images on your system, you can use the docker images command.
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
alpine latest f70734b6a266 4 weeks ago 5.61MB
hello-world latest bf756fb1ae65 4 months ago 13.3kB
Next, let's try another command.
$ docker run alpine echo "Hello world"
Hello world
In this case, Docker ran the echo command in your alpine container, and then exited it. Exit means the container is terminated after running the command.
Let's try another command.
docker run alpine sh
It seems nothing happened. In fact, Docker ran the sh command in your alpine container and exited it: because no interactive terminal was attached, the shell had no input to read and finished immediately. If you want to be inside the container shell, you need to use docker run -it alpine sh. The -i flag tells Docker you want to run the container interactively (keeping its input open) and the -t flag tells it to allocate a terminal in the container for your command. You can find more help on the run command with docker run --help.
Let's run a few commands inside the docker run -it alpine sh container.
$ docker run -it alpine sh
/ # ls
bin etc lib mnt proc run srv tmp var
dev home media opt root sbin sys usr
/ # uname -a
Linux c1552c9b6cf0 4.19.76-linuxkit #1 SMP Fri Apr 3 15:53:26 UTC 2020 x86_64 Linux
/ # exit
You are now inside the container shell and can try out commands such as ls and uname -a. To quit the container, type exit in the terminal. The exit command terminates the container. If you want to keep the container active instead, press Ctrl-p followed by Ctrl-q (you don't have to press these key combinations simultaneously). To go back into the container, type docker attach <container_id>, such as docker attach c1552c9b6cf0. You can find the container ID with docker ps --all. This command will be explained next.
Now, let's learn about the docker ps command, which shows you all containers that are currently running.
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
In this case, you don't see any containers because none are running. To see a list of all containers that you ran, including those that have exited, use docker ps --all. You can see that STATUS says that all containers exited.
$ docker ps --all
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c1552c9b6cf0 alpine "sh" 6 minutes ago Exited (0) 2 minutes ago wonderful_cori
5de22ab86f2a alpine "echo 'Hello world'" 18 minutes ago Exited (0) 18 minutes ago goofy_visvesvaraya
df35ee7df7e3 alpine "ls -l" 31 minutes ago Exited (0) 31 minutes ago fervent_gould
6dbe999044b4 hello-world "/hello" 3 hours ago Exited (0) 3 hours ago
When Docker containers are created, the Docker system automatically assigns a universally unique identifier (UUID) to each container to avoid any naming conflicts. CONTAINER ID is a short form of the UUID. Each container also has a randomly generated name. You can usually use this name in place of the CONTAINER ID to make typing a bit easier.
You can also assign names to your Docker containers when you run them, using the --name flag. In addition, you can rename an existing container with the docker rename command. For example, let's rename "wonderful_cori" to "my_container".
$ docker rename wonderful_cori my_container
$ docker ps --all
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c1552c9b6cf0 alpine "sh" 10 minutes ago Exited (0) 6 minutes ago my_container
5de22ab86f2a alpine "echo 'Hello world'" 22 minutes ago Exited (0) 22 minutes ago goofy_visvesvaraya
df35ee7df7e3 alpine "ls -l" 35 minutes ago Exited (0) 35 minutes ago fervent_gould
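The container listing above can also be read by a script instead of by eye. The following is a minimal sketch, not part of the original tutorial: it assumes you ask docker ps for tab-separated output through its --format option, and the function names parse_ps_lines and list_containers are made up for this illustration. Only the parsing step runs here, on a sample line taken from the listing above, so the example works even without Docker installed.

```python
import subprocess


def parse_ps_lines(text):
    """Turn tab-separated `docker ps` output into a list of dicts."""
    containers = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        container_id, name, status = line.split("\t")
        containers.append({"id": container_id, "name": name, "status": status})
    return containers


def list_containers():
    """Run `docker ps --all` with a tab-separated format and parse the result.

    Requires Docker to be installed and running; not called in this sketch.
    """
    result = subprocess.run(
        ["docker", "ps", "--all", "--format", "{{.ID}}\t{{.Names}}\t{{.Status}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ps_lines(result.stdout)


# Sample line copied from the listing above (ID, name, status).
sample = "c1552c9b6cf0\tmy_container\tExited (0) 6 minutes ago\n"
for container in parse_ps_lines(sample):
    print(container["id"], container["name"], container["status"])
```

On a machine with Docker running, calling list_containers() would return one dictionary per container, which you could then filter by name or status.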
Build Docker images which contain your own code
Now you are ready to use Docker to create your own applications! First, you will learn more about Docker images. Then you will build your own image and use that image to run an application on your local machine.
Docker images
Docker images are the basis of containers. In the above example, you pulled the alpine image from Docker Hub and ran a container based on that image. To see the list of images that are available on your local machine, run the docker images command.
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
alpine latest 9ed4aefc74f6 4 weeks ago 7.05MB
hello-world latest feb5d9fea6a5 19 months ago 13.3kB
The TAG refers to a particular snapshot of the image and the IMAGE ID is the corresponding unique identifier of the image. Images can have multiple versions. When you do not specify a version, the client defaults to latest. If you want a specific version of the image, you can use the docker pull command as follows:
$ docker pull ubuntu:22.04
22.04: Pulling from library/ubuntu
2d473b07cdd5: Pull complete
Digest: sha256:0eb0f877e1c869a300c442c41120e778db7161419244ee5cbc6fa5f134e74736
Status: Downloaded newer image for ubuntu:22.04
docker.io/library/ubuntu:22.04
Notice that here we pulled the image without running it. When we run a container with the ubuntu:22.04 image in the future, it will use this downloaded copy.
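The rule that a missing tag defaults to latest can be sketched in a few lines of Python. This is a simplified illustration rather than Docker's actual parsing code: the function name split_image_reference is made up here, digest references (name@sha256:...) are not handled, and localhost:5000/ubuntu is a hypothetical private registry address used only to show that a colon in the registry part is not a tag separator.

```python
def split_image_reference(reference):
    """Split 'name[:tag]' into (name, tag), defaulting the tag to 'latest'.

    Simplified sketch: digest references ('name@sha256:...') are not handled.
    """
    # Only a colon after the last slash separates a tag; an earlier colon
    # belongs to a registry host and port such as 'localhost:5000/ubuntu'.
    last_slash = reference.rfind("/")
    colon = reference.rfind(":")
    if colon > last_slash:
        return reference[:colon], reference[colon + 1:]
    return reference, "latest"


for ref in ["alpine", "ubuntu:22.04", "localhost:5000/ubuntu"]:
    print(ref, "->", split_image_reference(ref))
```

This mirrors what you saw earlier: docker run alpine was reported by Docker as alpine:latest, while ubuntu:22.04 names a specific version.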
You can search for images from a repository's website (for example, searching Docker Hub for CentOS) or directly from the command line using docker search.
$ docker search centos
NAME DESCRIPTION STARS OFFICIAL AUTOMATED
centos The official build of CentOS. 6014 [OK]
ansible/centos7-ansible Ansible on Centos7 129 [OK]
consol/centos-xfce-vnc Centos container with "headless" VNC session… 115 [OK]
jdeathe/centos-ssh OpenSSH / Supervisor / EPEL/IUS/SCL Repos - … 114 [OK]
centos/mysql-57-centos7 MySQL 5.7 SQL database server 76
imagine10255/centos6-lnmp-php56 centos6-lnmp-php56 58 [OK]
tutum/centos Simple CentOS docker image with SSH access 46
centos/postgresql-96-centos7 PostgreSQL is an advanced Object-Relational … 44
kinogmt/centos-ssh CentOS with SSH 29 [OK]
pivotaldata/centos-gpdb-dev CentOS image for GPDB development. Tag names… 12
guyton/centos6 From official centos6 container with full up… 10 [OK]
centos/tools Docker image that has systems administration… 6 [OK]
drecom/centos-ruby centos ruby 6 [OK]
pivotaldata/centos Base centos, freshened up a little with a Do… 4
pivotaldata/centos-mingw Using the mingw toolchain to cross-compile t… 3
darksheer/centos Base Centos Image -- Updated hourly 3 [OK]
mamohr/centos-java Oracle Java 8 Docker image based on Centos 7 3 [OK]
pivotaldata/centos-gcc-toolchain CentOS with a toolchain, but unaffiliated wi… 3
miko2u/centos6 CentOS6 日本語環境 2 [OK]
blacklabelops/centos CentOS Base Image! Built and Updates Daily! 1 [OK]
indigo/centos-maven Vanilla CentOS 7 with Oracle Java Developmen… 1 [OK]
mcnaughton/centos-base centos base image 1 [OK]
pivotaldata/centos7-dev CentosOS 7 image for GPDB development 0
smartentry/centos centos with smartentry 0 [OK]
pivotaldata/centos6.8-dev CentosOS 6.8 image for GPDB development 0
Building your first Docker image
In this section, you will build a simple Docker image by writing a Dockerfile, and then run it. For this purpose, we will create a Python script and a Dockerfile.
Creating working directory
Let's create a working directory where you will make the following files: hello.py, Dockerfile.
cd ~
mkdir my_first_Docker_image
cd my_first_Docker_image
Python script
Create the hello.py file with the following content.
print("Hello world!")
print("This is my 1st Docker image!")
Dockerfile
A Dockerfile is a text file containing the list of instructions that Docker executes while creating an image. The Dockerfile is similar to a job batch file, and contains all the information that Docker needs to build and run the application package.
In the my_first_Docker_image directory, create a file called Dockerfile with the content below.
# our base image. The latest version will be pulled.
FROM alpine
# install python and pip
RUN apk add --update py3-pip
# copy files required to run
COPY hello.py /usr/src/my_app/
# run the application
CMD python3 /usr/src/my_app/hello.py
Now, let's learn the meaning of each line.
The first line uses the FROM keyword to select Alpine Linux as the base image. No version is specified, so the latest version will be pulled.
FROM alpine
Next, the RUN keyword installs the Python pip package (together with Python itself) using the Alpine Package Keeper (apk).
RUN apk add --update py3-pip
Next, the COPY keyword copies the file into the image. The /usr/src/my_app directory will be created while the file is copied.
COPY hello.py /usr/src/my_app/
The last step is to run the application with the CMD keyword. CMD tells the container what it should do by default when it is started.
CMD python3 /usr/src/my_app/hello.py
Build the image
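Now you are ready to build your first Docker image. The docker build command will do most of the work.
To build the image, use the following command. The -t flag gives the image a name (tag), and the final . tells Docker to use the current directory, which contains the Dockerfile and hello.py, as the build context.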
docker build -t my_first_image .
The client will pull all necessary images and create your image. If everything goes well, your image is ready to be used! Run the docker images command to see if your image my_first_image is shown.
Run your image
Once you have successfully created your Docker image, test it by starting a new container from the image.
docker run my_first_image
If everything went well, you will see this message.
Hello world!
This is my 1st Docker image!
Connecting Docker and your computer
Containers are great for keeping all the parts of a piece of software isolated together. But this means that there are a few extra steps necessary to share information from that container with the computer you're running it on.
Sharing data
Let's pretend that you have a container that runs a long analysis and outputs the results in some data file. We'll mimic this by just creating an empty file with the touch command. Let's make a new directory for our output and create our "important data" in an alpine container:
mkdir results
ls results
docker run alpine touch data.dat
ls results
Nothing happened! You can look in your current directory with ls, and you won't see anything either. We can even run an interactive alpine container and look around:
docker run -it alpine sh
/ # ls
bin dev etc home lib media mnt opt proc root run sbin srv sys tmp usr var
/ # exit
No results directory and no data.dat file...
What happened is that the file was created and locked away in the previous container. When we start a new container, we start fresh from whatever the image specified. Nothing sticks around! So we need a way to get that data out of the container we're working in.
The way Docker does this is through "bind mounts". It's like we are "binding" a directory on our computer to a directory that's "mounted" in the container. Let's try it interactively first:
$ docker run -it -v ./results:/outside_world alpine sh
/ # ls
bin etc lib mnt outside_world root
sbin sys usr dev home media
opt proc run srv tmp var
/ # exit
The -v flag tells Docker that we want to connect the ./results directory on our computer to a directory called /outside_world inside the container. The path inside the container must be absolute, that is, it must start with /.
Now we can put it all together:
$ docker run -v ./results:/outside_world alpine touch /outside_world/data.dat
$ ls results
data.dat
Notice that we had to write to the container version of our directory with the touch command, but it's now visible on the computer in the results directory.
This is a contrived example, but in real life, you could replace touch ... with any command that can run in your container, including heavy duty data analysis. You just need to make sure that there is a bind mount between wherever that data is being written inside the container and wherever you want it outside the container (and/or vice versa if you want to input data into your container).
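If you launch such runs from a script, the bind mount can be assembled programmatically. The sketch below is only an illustration under a few assumptions: the helper name bind_mount_run_args is made up, it converts the host directory to an absolute path (which is the conservative choice, since some older Docker versions may not accept relative host paths with -v), and it only builds the argument list without running Docker.

```python
from pathlib import Path


def bind_mount_run_args(host_dir, container_dir, image, command):
    """Build the argument list for `docker run -v HOST:CONTAINER IMAGE COMMAND...`."""
    if not container_dir.startswith("/"):
        raise ValueError("the container path must be absolute, e.g. /outside_world")
    host_path = Path(host_dir).resolve()  # absolute path on the host
    return ["docker", "run", "-v", f"{host_path}:{container_dir}", image, *command]


# Same run as above: create data.dat in the mounted directory.
args = bind_mount_run_args(
    "results", "/outside_world", "alpine", ["touch", "/outside_world/data.dat"]
)
print(" ".join(args))
```

On a machine with Docker installed, the list could be passed to subprocess.run(args, check=True) to actually start the container.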
Exposing ports (and Jupyter example)
Software that connects to your web browser uses network ports to share information. A port is just a number that tells your web browser where to access the content shared by the software. It is usually set up by the software itself (though there are often options a user can set to change the port number).
In the context of research computing, one of the most popular examples of this setup is Jupyter Notebook. When a Jupyter Notebook is running, it usually is available on port 8888, meaning you can access it from your web browser with the URL http://127.0.0.1:8888. Here, 127.0.0.1 will always be the IP address of your own computer which in this case is running the Jupyter Notebook on port 8888.
Since containers are meant to be an isolated computing environment, network ports are not accessible by default from outside the container. We will now go through an example of exposing ports from a Docker container using JupyterLab (a more full-featured interface for Jupyter Notebooks). This example will also show an alternate way to include files in a Docker container, albeit in more of a read-only way.
First, let's check the Jupyter images available on Docker Hub. We will use base-notebook.
$ docker search jupyter
NAME DESCRIPTION STARS OFFICIAL AUTOMATED
jupyter/datascience-notebook Jupyter Notebook Data Science Stack from htt… 666
jupyter/all-spark-notebook Jupyter Notebook Python, Scala, R, Spark, Me… 301
jupyterhub/jupyterhub JupyterHub: multi-user Jupyter notebook serv… 248 [OK]
jupyter/scipy-notebook Jupyter Notebook Scientific Python Stack fro… 241
jupyter/tensorflow-notebook Jupyter Notebook Scientific Python Stack w/ … 218
jupyter/pyspark-notebook Jupyter Notebook Python, Spark, Mesos Stack … 157
jupyter/base-notebook Small base image for Jupyter Notebook stacks… 106
jupyter/minimal-notebook Minimal Jupyter Notebook Stack from https://… 105
...
Let's start by creating a directory my_notebook. Copy hello.py, which we used for the Python image, to the my_notebook directory. Then create a Dockerfile in the my_notebook directory with the following content:
# base image
FROM jupyter/base-notebook
# copy files
COPY hello.py /home/jovyan/work
# the port number the container should expose
EXPOSE 8888
The COPY line copies our hello.py script into the container directory /home/jovyan/work, which is where the Jupyter instance inside the container can access files.
The EXPOSE line specifies the port number which needs to be exposed. The default port for Jupyter is 8888, and therefore, we will expose that port. Note that EXPOSE only documents the port; the port is actually made reachable from your computer by the -p option of docker run.
Now, build the image using the following command:
$ docker build -t mynotebook .
Sending build context to Docker daemon 3.072kB
Step 1/3 : FROM jupyter/base-notebook
---> 6494235c84ec
Step 2/3 : COPY hello.py /home/jovyan/work
---> 6e22bc10eee0
Step 3/3 : EXPOSE 8888
---> Running in 2d754a40aa2b
Removing intermediate container 2d754a40aa2b
---> 464731f2e3a7
Successfully built 464731f2e3a7
Successfully tagged mynotebook:latest
Now, everything is ready. You can run the image using the docker run command. We use the -p 8888:8888 option to tell Docker that we'd like to bind the 8888 port in the container to the 8888 port on your host computer. The number before the colon is the port on the host, and the number after it is the port in the container.
$ docker run -p 8888:8888 mynotebook
Entered start.sh with args: jupyter lab
Executing the command: jupyter lab
[I 2023-05-02 17:45:26.694 ServerApp] Package jupyterlab took 0.0000s to import
...
[I 2023-05-02 17:45:26.941 ServerApp] Serving notebooks from local directory: /home/jovyan
[I 2023-05-02 17:45:26.941 ServerApp] Jupyter Server 2.5.0 is running at:
[I 2023-05-02 17:45:26.941 ServerApp] http://ad7f42d45370:8888/lab?token=b76dcd0514e9fe7f60b145e936852ab7836df45e6e4b0879
[I 2023-05-02 17:45:26.941 ServerApp] http://127.0.0.1:8888/lab?token=b76dcd0514e9fe7f60b145e936852ab7836df45e6e4b0879
[I 2023-05-02 17:45:26.941 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 2023-05-02 17:45:26.943 ServerApp]
To access the server, open this file in a browser:
file:///home/jovyan/.local/share/jupyter/runtime/jpserver-7-open.html
Or copy and paste one of these URLs:
http://ad7f42d45370:8888/lab?token=b76dcd0514e9fe7f60b145e936852ab7836df45e6e4b0879
http://127.0.0.1:8888/lab?token=b76dcd0514e9fe7f60b145e936852ab7836df45e6e4b0879
If you navigate to the http://127.0.0.1:8888 URL that Jupyter outputs (including its token), you will see your containerized JupyterLab ready to go. If you look inside the work directory, you'll even see our hello.py script!
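If the page does not load, it can help to check whether the published port is reachable from your computer at all. The following is a minimal sketch using only Python's standard library; the function name port_is_open is made up for this illustration, and the check only tells you that something accepts TCP connections on the port, not that it is Jupyter.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# 8888 is the host port published with -p 8888:8888 above.
print("Port 8888 reachable:", port_is_open("127.0.0.1", 8888))
```

If this reports False while the container is running, a likely cause is that the -p option was left out of the docker run command.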