Build a Custom Docker Image
Question
How can I build a custom Docker image to run a Metaflow step?
Solution
Metaflow has decorators, like @batch and @kubernetes, for running steps on remote compute environments. In both cases, the environment a job runs in is created from a Docker image. In some circumstances you may need to create your own image for running a step or a flow, and there are a few things to consider when building it.
Specify an Image in a Flow
First, it is important to explain how Metaflow knows which image to use. You can read more about using a custom image here.
If you do not specify the image argument, as in @batch(image="my_image:latest"), Metaflow will look to see if you have configured a default container image for the compute plugin you are using in the METAFLOW_DEFAULT_CONTAINER_IMAGE variable. If this configuration is not specified either, Metaflow falls back to the official Python image matching the version of Python in your local environment.
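As a sketch of this resolution order in a flow (the image name below is a placeholder):
from metaflow import FlowSpec, step, batch

class ImageResolutionFlow(FlowSpec):

    # An explicit image argument takes precedence over any configured default.
    @batch(image="my_image:latest")
    @step
    def start(self):
        self.next(self.end)

    # With no image argument, Metaflow falls back to
    # METAFLOW_DEFAULT_CONTAINER_IMAGE and then to the official
    # Python image matching your local Python version.
    @batch
    @step
    def end(self):
        pass

if __name__ == "__main__":
    ImageResolutionFlow()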
1. Write a Dockerfile
Docker images are built using a Dockerfile. When building one to use with Metaflow there are a few considerations to keep in mind.
Base Image
A minimum requirement is that the image contains Python; we suggest starting from an official Python image. For example, you can add the following at the start of your Dockerfile:
FROM python:3.10
The image should also come with standard CLI tools like tar, so we suggest not starting the Dockerfile with FROM scratch.
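You can quickly sanity-check that a candidate base image ships these essentials:
docker run --rm python:3.10 python --version
docker run --rm python:3.10 tar --version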
User Permissions
Metaflow needs to be able to write in the working directory. In the Dockerfile this concerns the WORKDIR and USER commands: you should make sure that the user running commands can write in the working directory, especially when you explicitly set these in your Dockerfile. Note that many images use the root user by default, but Metaflow does not, so you may have to explicitly specify a non-root USER in your Dockerfile. You can use the following to check the user for your image:
docker run --rm -it <YOUR IMAGE> bash -c id
For example, the user for the official Python image is root by default:
docker run --rm -it python:3.10 bash -c id
You can change the user in the Dockerfile like this:
FROM my_base_image:latest
USER my_user
...
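To double-check, you can try writing a file in the working directory as the image's default user (the image name is a placeholder, and this assumes the image includes bash):
docker run --rm my_base_image:latest bash -c 'touch test_file && echo writable'
If the command fails, the active user cannot write in the WORKDIR.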
Using ENTRYPOINT and CMD
We suggest you do not set either of these in your Dockerfile. Metaflow constructs the command to run the container for you, so also defining an ENTRYPOINT can produce unexpected errors.
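If you are unsure whether a base image sets these, you can inspect it; for example, with the official Python image:
docker inspect --format '{{.Config.Entrypoint}} {{.Config.Cmd}}' python:3.10
An empty Entrypoint here means the command Metaflow constructs will run as-is.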
Example
Here is an example of a standard Dockerfile.
- The WORKDIR is changed and the USER has write permission.
- The COPY command moves a requirements.txt file into the image and installs its contents. You could follow a similar copying process to install custom modules that are not on PyPI.
- There is no CMD or ENTRYPOINT, since Metaflow will override these for you anyway.
FROM python:3.10
RUN mkdir /logs && chown 1000 /logs
RUN mkdir /metaflow && chown 1000 /metaflow
ENV HOME=/metaflow
WORKDIR /metaflow
USER 1000
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir --upgrade pip && \
pip install --no-cache-dir -r requirements.txt
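For reference, the requirements.txt copied into the image is an ordinary pip requirements file; a hypothetical one might list:
metaflow
pandas
scikit-learn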
2. Build your Image
This process is not unique to Metaflow. Once you have written a Dockerfile like the one above, you can build the image from that directory:
docker build .
If you are building or running your image on macOS and plan to later deploy to a Linux machine, you will need to specify --platform=linux/amd64 in your build and run commands. For example, when using the EC2 instances that power AWS Batch environments, you will want to make sure the image is built for the right platform. You can set the platform automatically when building and running images by using an environment variable:
export DOCKER_DEFAULT_PLATFORM=linux/amd64
Another alternative is to specify the platform at the beginning of your Dockerfile:
FROM --platform=linux/amd64 image:tag
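Putting the build options together, a tagged linux/amd64 build might look like this (the image name is a placeholder):
docker build --platform=linux/amd64 -t my-image:latest .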
3. Configure Metaflow to Use your Image
Once you have built your image you need to tell Metaflow to use it. This requires pushing the image to a registry that you have permission to access. For example, in AWS you might want your image to reside in ECR.
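For example, assuming the repository already exists and using placeholder values for the account ID, region, and repository name, pushing an image to ECR might look like:
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
docker tag my-image:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest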
In a flow, the most direct way to tell Metaflow to use this image for a step is to pass it to the plugin decorators, as in @batch(image="<my_image>:<my_tag>") or @kubernetes(image="<my_image>:<my_tag>"). You can also set default environment variables so Metaflow knows to look for a certain image in a specified container registry by default.
Two configuration variables to keep in mind for specifying a default image and container registry are METAFLOW_DEFAULT_CONTAINER_IMAGE and METAFLOW_DEFAULT_CONTAINER_REGISTRY.
- METAFLOW_DEFAULT_CONTAINER_IMAGE dictates the default container image that Metaflow should use.
- METAFLOW_DEFAULT_CONTAINER_REGISTRY controls which registry Metaflow pulls the image from; this defaults to DockerHub.
These will then be used as defaults across compute plugins. Metaflow configuration variables can be set in the active Metaflow profile (selected with the METAFLOW_PROFILE environment variable and stored in ~/.metaflowconfig/), or as environment variables.
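For example, a profile file such as ~/.metaflowconfig/config.json might contain entries like the following (the registry URI and image name are placeholders):
{
    "METAFLOW_DEFAULT_CONTAINER_REGISTRY": "123456789012.dkr.ecr.us-east-1.amazonaws.com",
    "METAFLOW_DEFAULT_CONTAINER_IMAGE": "my-image:latest"
}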
For example, if your container registry is in AWS ECR you can set an environment variable like:
export METAFLOW_DEFAULT_CONTAINER_REGISTRY=<aws_account_id>.dkr.ecr.<region>.amazonaws.com
and then decorate your flow steps like:
@batch(image="image-in-my-registry:latest")
@step
def containerized_step(self):
...
Alternatively, you can specify the registry, image, and tag all in the decorator:
@batch(image="url-to-docker-repo/docker-image:version")
@step
def containerized_step(self):
...
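Putting the pieces together, a minimal runnable flow using a custom image for one step might look like this (the image name is a placeholder):
from metaflow import FlowSpec, step, batch

class ContainerizedFlow(FlowSpec):

    # This step runs on AWS Batch inside the custom image.
    @batch(image="image-in-my-registry:latest")
    @step
    def start(self):
        print("Running inside the custom container")
        self.next(self.end)

    # Steps without a compute decorator run locally.
    @step
    def end(self):
        print("Done")

if __name__ == "__main__":
    ContainerizedFlow()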
Note that if you are manually configuring the underlying resources for remote compute plugins (as opposed to automating deployment through CloudFormation or Terraform), you will need to ensure that the appropriate roles are available for those resources.
Further Reading
- Use a custom image in your flow
- See configuration details in: metaflow_config.py
- See where in the Metaflow code image and container registry variables are used for @batch and @kubernetes
- Building a Dockerfile for a Python environment
- Understand how CMD and ENTRYPOINT interact
- Best practices for containerizing Python applications with Docker