Docker
What is Docker? Docker is a tool that makes it easier to create, deploy, and run applications by using containers. Containers allow developers to package up an application with all of its dependencies, such as libraries and tools, and deploy it as one package. The application will then run on most operating systems (Mac/Windows/Linux) regardless of any customized settings. This page covers how to run development environments using Docker containers and how to package your own code into a portable container.
Warning
This tutorial is meant to be run on your personal computer, not the HPCC.
Docker does not work on the HPCC since it requires super user (sudo) permissions that users do not have access to. To run containers on the HPCC, you will need to use Singularity.
Nevertheless, many of the skills taught in this tutorial transfer over to using Singularity, and most Docker containers can be used without modification on the HPCC through Singularity. However, if you just want to get started running containers on the HPCC, start with the Singularity Introduction.
Docker installation
Docker can be installed on all major operating systems. However, note that installation on Windows requires the Windows Subsystem For Linux (WSL).
For detailed installation instructions depending on operating system, click here: Mac/Windows/Linux.
Testing Docker installation
When you have installed Docker, test your Docker installation by opening a terminal (if you are running Windows, this should be a WSL terminal) and running the following command:
```console
$ docker --version
Docker version <version>, build <build>
```
When you run the docker command without --version, you will see the options available with docker. Alternatively, you can test your installation by running the following (you have to log into Docker to use this test):
```console
$ docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
...
Hello from Docker!
This message shows that your installation appears to be working correctly.
...
```
Running Docker containers from prebuilt images
Now you have set up everything, and it is time to use Docker seriously. You will run a container from the Alpine Linux image on your system and learn the docker run command. However, you should first know what containers and images are, and the difference between them.
Images: The file system and configuration of applications, which are created and distributed by developers. You can also create and distribute your own images.
Containers: Running instances of Docker images. You can have many containers for the same image.
Now that you know what containers and images are, let's get some practice by running the docker run alpine ls -l command in your terminal.
```console
$ docker run alpine ls -l
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
...
total 56
drwxr-xr-x    2 root     root          4096 ...  bin
drwxr-xr-x    5 root     root           360 ...  dev
drwxr-xr-x   15 root     root          4096 ...  etc
...
```
When you run the docker run alpine ls -l command, Docker searches for the alpine:latest image on your system first. If your system has it (i.e., if you downloaded it previously), Docker uses that image. If your system does not have that image, then Docker fetches the alpine:latest image from Docker Hub, saves it on your system, and then runs a container from the saved image. Docker Hub is a huge repository of images people have uploaded so that others can download and run their code in containers. Though Docker Hub is the most popular place to find Docker images, there are other sources that work just as well (for example, Quay.io).
docker run alpine starts a container, and ls -l is the command fed to the container; Docker runs the given command inside the container and displays its output.
To see a list of all images on your system, you can use the docker images command.
```console
$ docker images
REPOSITORY   TAG      IMAGE ID   CREATED   SIZE
alpine       latest   ...        ...       ...
```
Next, let's try another command.
```console
$ docker run alpine echo "hello from alpine"
hello from alpine
```
In this case, Docker ran the echo command in your alpine container and then exited it; that is, the container terminated after running the command.
Let's try another command.
```console
$ docker run alpine sh
```
It seems nothing happened. In fact, Docker ran the sh command in your alpine container and exited it. If you want to be inside the container shell, you need to use docker run -it alpine sh. The -i flag tells Docker you want to run the container interactively, and the -t flag tells it you want to start a terminal in that image to run your command. You can find more help on the run command with docker run --help.
Let's run a few commands inside the docker run -it alpine sh container.
```console
$ docker run -it alpine sh
/ # ls
bin    dev    etc    home   lib    media  mnt    opt    proc   root   run    sbin   srv    sys    tmp    usr    var
/ # uname -a
Linux <container_id> ...
/ # exit
$
```
You are inside of the container shell, and you can try out a few commands like ls, uname -a, and others. To quit the container, type exit in the terminal. If you use the exit command, the container is terminated. If you want to keep the container active, then you can press Ctrl-p followed by Ctrl-q (you don't have to press these key combinations simultaneously). If you want to go back into the container, you can type docker attach <container_id>, such as docker attach c1552c9b6cf0. You can find the container ID with docker ps --all. This command will be explained next.
Now, let's learn about the docker ps command, which shows you all containers that are currently running.
```console
$ docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
```
In this case, you don't see any containers because none are running. To see a list of all containers that you ran, use docker ps --all. You can see that the STATUS column says that all containers exited.
```console
$ docker ps --all
CONTAINER ID   IMAGE         COMMAND    CREATED   STATUS       PORTS   NAMES
c1552c9b6cf0   alpine        "sh"       ...       Exited ...           wonderful_cori
...
```
When Docker containers are created, the Docker system automatically assigns a universally unique identifier (UUID) to each container to avoid any naming conflicts. CONTAINER ID is a short form of the UUID. Each container also has a randomly generated name. You can usually use this name in place of the CONTAINER ID to make typing a bit easier.
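To picture the relationship between the long ID and the CONTAINER ID column, here is a minimal Python sketch; the 64-character hex string below is generated locally just for illustration (real IDs come from the Docker daemon, not from hashing a name):

```python
import hashlib

# Stand-in for a container's full ID: Docker's long IDs are 64
# hexadecimal characters; here we fake one with a SHA-256 digest.
full_id = hashlib.sha256(b"illustration-only").hexdigest()

# The CONTAINER ID column in `docker ps` shows the first 12 characters.
short_id = full_id[:12]

print(len(full_id))                  # 64
print(full_id.startswith(short_id))  # True
```

Because the short ID is just a prefix, any Docker command that accepts a CONTAINER ID will also accept the full 64-character form.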
You can also assign names to your Docker containers when you run them, using the --name flag. In addition, you can rename a Docker container with the docker rename command. For example, let's rename "wonderful_cori" to "my_container" with the docker rename command.
```console
$ docker rename wonderful_cori my_container
$ docker ps --all
CONTAINER ID   IMAGE    COMMAND   CREATED   STATUS       PORTS   NAMES
c1552c9b6cf0   alpine   "sh"      ...       Exited ...           my_container
...
```
Building Docker images that contain your own code
Now you are ready to use Docker to create your own applications! First, you will learn more about Docker images. Then you will build your own image and use that image to run an application on your local machine.
Docker images
Docker images are the basis of containers. In the above example, you pulled the alpine image from Docker Hub and ran a container based on that image. To see the list of images that are available on your local machine, run the docker images command.
```console
$ docker images
REPOSITORY    TAG      IMAGE ID   CREATED   SIZE
alpine        latest   ...        ...       ...
hello-world   latest   ...        ...       ...
```
The TAG refers to a particular snapshot of the image, and the ID is the corresponding UUID of the image. Images can have multiple versions. When you do not assign a specific version number, the client defaults to latest. If you want a specific version of the image, you can use the docker pull command as follows:
```console
$ docker pull ubuntu:22.04
22.04: Pulling from library/ubuntu
...
Status: Downloaded newer image for ubuntu:22.04
docker.io/library/ubuntu:22.04
```
Notice that here we pulled the image without running it. When we run a container with the ubuntu:22.04 image in the future, it will use this downloaded copy. You can search for images from a repository's website (for example, searching Docker Hub for CentOS) or directly from the command line using docker search.
```console
$ docker search centos
NAME       DESCRIPTION                     STARS   OFFICIAL
centos     The official build of CentOS.   ...     [OK]
...
```
Building your first Docker image
In this section, you will build a simple Docker image by writing a Dockerfile, and run a container from it. For this purpose, we will create a Python script and a Dockerfile.
Creating a working directory
Let's create a working directory where you will make the following files: hello.py and Dockerfile.
```console
$ mkdir my_first_Docker_image
$ cd my_first_Docker_image
```
Python script
Create the hello.py file with the following content.
hello.py:

```python
print("hello world!")
```
Dockerfile
A Dockerfile is a text file which has a list of commands that Docker calls while creating an image. The Dockerfile is similar to a job batch file and contains all the information that Docker needs to run the application package.
In the my_first_Docker_image directory, create a file called Dockerfile which has the content below.
Dockerfile:

```dockerfile
# Use Alpine Linux as the base image
FROM alpine

# Install Python and pip with the Alpine Package Keeper (apk)
RUN apk add --update py3-pip

# Copy the script into the image; /usr/src/my_app is created by the copy
COPY hello.py /usr/src/my_app/

# Run the script when the container starts
CMD ["python3", "/usr/src/my_app/hello.py"]
```
Now, let's learn the meaning of each line.

The first line means that we will use Alpine Linux as a base image. No version is specified, so the latest version will be pulled. Use the FROM keyword.
```dockerfile
FROM alpine
```
Next, the Python pip package is installed using the Alpine Package Keeper (apk). Use the RUN keyword.
```dockerfile
RUN apk add --update py3-pip
```
Next, copy the file to the image. The /usr/src/my_app directory will be created when the file is copied. Use the COPY keyword.
```dockerfile
COPY hello.py /usr/src/my_app/
```
The last step is to run the application with the CMD keyword. CMD tells the container what it should do by default when it is started.
```dockerfile
CMD ["python3", "/usr/src/my_app/hello.py"]
```
Build the image
Now you are ready to build your first Docker image. The docker build command will do most of the work.
To build the image, use the following command.
```console
$ docker build -t my_first_image .
```
The client will pull all necessary images and create your image. If everything goes well, your image is ready to be used! Run the docker images command to see if your image my_first_image is shown.
Run your image
When you successfully create your Docker image, test it by starting a new container from the image.
```console
$ docker run my_first_image
```
If everything went well, you will see this message.
```console
hello world!
```
Connecting Docker and your computer
Containers are great for keeping all the parts of a piece of software isolated together. But this means that there are a few extra steps necessary to share information from that container with the computer you're running it on.
Sharing data
Let's pretend that you have a container that runs a long analysis and outputs the results in some data file. We'll mimic this by just creating an empty file with the touch command. Let's make a new directory for our output and create our "important data" in our alpine image:
```console
$ # create a directory and an "important" data file inside the container
$ docker run alpine sh -c "mkdir /results && touch /results/data.dat"
$ ls
```
Nothing happened! You can look in your current directory with ls, and you won't see anything either. We can even run an interactive alpine container and look around:
```console
$ docker run -it alpine sh
/ # ls /results
ls: /results: No such file or directory
/ # exit
```
No results directory and no data.dat file...
What happened is that the file was created and locked away in the previous container. When we start a new container, we start fresh from whatever the image specified. Nothing sticks around! So we need a way to get that data out of the container we're working in.
The way Docker does this is through "bind mounts". It's like we are "binding" a directory on our computer to a directory that's "mounted" in the container. Let's try it interactively first:
```console
$ docker run -it -v $(pwd)/results:/outside_world alpine sh
/ # ls /outside_world
/ # touch /outside_world/data.dat
/ # exit
$ ls results
data.dat
```
The -v flag tells Docker that I want to connect the ./results directory on my computer to a directory called /outside_world inside the container. (Note that Docker requires an absolute host path for bind mounts, so ./results is typically written as $(pwd)/results on the command line.)
Now we can put it all together:
```console
$ docker run -v $(pwd)/results:/outside_world alpine touch /outside_world/data.dat
$ ls results
data.dat
```
Notice that we had to write to the container version of our directory with the touch command, but it's now visible on the computer in the results directory.
This is a contrived example, but in real life, you could replace touch ... with any command that can run in your container, including heavy-duty data analysis. You just need to make sure that there is a bind mount between wherever that data is being written inside the container and wherever you want it outside the container (and/or vice versa if you want to input data into your container).
Exposing ports (and Jupyter example)
Software that connects to your web browser uses network ports to share information. A port is just a number that tells your web browser where to access the content shared by the software, and it is usually set up by the software (though there are often options a user can set to change the port number).
In the context of research computing, one of the most popular examples of this setup is Jupyter Notebook. When a Jupyter Notebook is running, it is usually available on port 8888, meaning you can access it from your web browser with the URL http://127.0.0.1:8888. Here, 127.0.0.1 will always be the IP address of your own computer, which in this case is running the Jupyter Notebook on port 8888.
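As a quick illustration of what "available on a port" means, here is a minimal sketch using only the Python standard library. It starts a tiny web server on 127.0.0.1 (on whatever free port the OS assigns, rather than Jupyter's 8888) and fetches a page from it, just as your browser would:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class Hello(BaseHTTPRequestHandler):
    def do_GET(self):
        # Reply with a tiny page that names the port being served
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"hello from port %d" % self.server.server_port)

    def log_message(self, *args):
        # Keep the demo quiet: suppress per-request log lines
        pass

# Port 0 asks the OS for any free port; Jupyter instead asks for 8888.
server = HTTPServer(("127.0.0.1", 0), Hello)
port = server.server_port
threading.Thread(target=server.serve_forever, daemon=True).start()

# Fetch the page exactly as a browser visiting http://127.0.0.1:<port> would
with urllib.request.urlopen(f"http://127.0.0.1:{port}") as resp:
    body = resp.read().decode()

print(body)
server.shutdown()
```

Pointing a browser at the printed address would show the same text; a container adds one wrinkle on top of this picture, which is that the port must also be published to the host.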
Since containers are meant to be an isolated computing environment, network ports are not accessible by default from outside the container. We will now go through an example of exposing ports from a Docker container using Jupyter Hub (a more full-featured version of Jupyter Notebooks). This example will also show an alternate way to include files in a Docker container, albeit in more of a read-only way.
First, let's check the Jupyter images available on Docker Hub. We will use minimal-notebook.
```console
$ docker search jupyter
NAME                       DESCRIPTION                              STARS   OFFICIAL
jupyter/minimal-notebook   Minimal Jupyter Notebook Stack ...       ...
jupyter/base-notebook      Base image for Jupyter Notebook ...      ...
...
```
Let's start by creating a directory my_notebook. Copy hello.py, which we used for the Python image, to the my_notebook directory. Then create a Dockerfile in the my_notebook directory with the following content:
Dockerfile:

```dockerfile
# Use the minimal Jupyter notebook image as a base
FROM jupyter/minimal-notebook

# Copy the script into the directory that Jupyter serves
COPY hello.py /home/jovyan/work/

# Make the default Jupyter port available outside the container
EXPOSE 8888
```
|
The COPY line moves our hello.py script into the container directory /home/jovyan/work, which is where the Jupyter instance inside the container can access files.

The EXPOSE line specifies the port number which needs to be exposed. The default port for Jupyter is 8888, and therefore, we will expose that port.
Now, build the image using the following command:
```console
$ docker build -t my_notebook .
[+] Building ...
 => [internal] load build definition from Dockerfile
 => [1/2] FROM docker.io/jupyter/minimal-notebook
 => [2/2] COPY hello.py /home/jovyan/work/
 => exporting to image
...
```
Now, everything is ready. You can run the image using the docker run command. We use the -p 8888:8888 option to tell Docker that we'd like to bind port 8888 in the container to port 8888 on your host computer.
```console
$ docker run -p 8888:8888 my_notebook
...
[I ...] Jupyter Server is running at:
[I ...]     http://127.0.0.1:8888/lab?token=...
...
```
If you navigate to one of the URLs that Jupyter outputs, you will see your containerized Jupyter Hub ready to go. If you look inside the work directory, you'll even see our hello.py script!