Use a Custom Docker Image
Question
How can I use a custom Docker image to run a Metaflow step?
Solution
Metaflow has decorators, such as @batch and @kubernetes, that run steps on remote compute environments. These environments run jobs created from a Docker image.
1. Select an Image
You can either build an image or choose one. If you choose an existing image, make sure that Python can be invoked from the container.
You can tell Metaflow which image you want to use in several ways:
- Pass the image argument in a decorator, for example @batch(image="my_image:latest").
- Set defaults in the Metaflow config files:
  - METAFLOW_DEFAULT_CONTAINER_REGISTRY controls which registry Metaflow uses to pick the image. It defaults to DockerHub but could also be a URL to a public or private ECR repository on AWS.
  - METAFLOW_DEFAULT_CONTAINER_IMAGE dictates the default container image that Metaflow should use.
- Don't specify an image and let Metaflow default to the official Python image.
  - In this case, the default corresponds to the major.minor version of Python that was used to launch the flow.
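The precedence in the list above can be sketched in plain Python. This is only an illustration of the described behavior, not Metaflow's own code: `resolve_image` and `has_registry` are hypothetical helpers, and the rule used to detect a registry prefix (a first path component containing a dot or colon, or equal to `localhost`) is an assumption borrowed from common Docker naming conventions.

```python
import sys
from typing import Optional


def has_registry(image: str) -> bool:
    """Heuristic (assumption): the first path component is a registry host
    if it contains a dot or a colon, or is exactly 'localhost'."""
    first, sep, _ = image.partition("/")
    return bool(sep) and ("." in first or ":" in first or first == "localhost")


def resolve_image(
    decorator_image: Optional[str] = None,
    default_image: Optional[str] = None,
    default_registry: Optional[str] = None,
) -> str:
    """Hypothetical helper that mirrors the precedence described above."""
    if decorator_image:
        # 1. An image passed to the decorator wins.
        image = decorator_image
    elif default_image:
        # 2. Otherwise use the configured default image.
        image = default_image
    else:
        # 3. Otherwise fall back to the official Python image that matches
        #    the major.minor version of the launching interpreter.
        major, minor = sys.version_info[:2]
        image = f"python:{major}.{minor}"

    # Prefix the configured registry only when the image names none itself.
    if default_registry and not has_registry(image):
        image = f"{default_registry.rstrip('/')}/{image}"
    return image
```

For instance, `resolve_image()` called with no arguments under Python 3.10 would give `python:3.10`, while passing an image that already carries a registry host leaves it unchanged.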
2. Run Flow
For example, this flow uses the official Python image in the run_in_container step. Here only the image name and tag are specified, but you can also pass the full URL of the image to @batch or @kubernetes, for example @batch(image="url-to-docker-repo/docker-image:version").
Note about GPU images
In these decorators you will see resource arguments like cpu=1. Assuming that your Metaflow deployment allows you to access compute instances with GPU resources, you can also set gpu=N and Metaflow will automatically prepare your image in a way that works with GPUs. Here, "access" means that the AWS Batch compute environment needs access to EC2 instances with GPUs.
from metaflow import FlowSpec, step, batch


class UseImageFlow(FlowSpec):

    @step
    def start(self):
        self.next(self.run_in_container)

    # Run this step on AWS Batch inside the official Python 3.10 image.
    @batch(image="python:3.10", cpu=1)
    @step
    def run_in_container(self):
        # Stored as an artifact and available after the run finishes.
        self.artifact_from_container = 7
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == "__main__":
    UseImageFlow()
Run the flow from the command line:

python use_image_flow.py run
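As an alternative to the image argument, the defaults from step 1 can be supplied when the flow is launched. The sketch below assumes that Metaflow also reads METAFLOW_DEFAULT_CONTAINER_IMAGE and METAFLOW_DEFAULT_CONTAINER_REGISTRY from environment variables of the same name, and that the step's decorator does not pass image itself (an explicit image argument is expected to take precedence). `build_env` and `run_flow` are hypothetical helper names.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional


def build_env(
    image: str,
    registry: Optional[str] = None,
    base_env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment with the image defaults set."""
    env = dict(os.environ if base_env is None else base_env)
    env["METAFLOW_DEFAULT_CONTAINER_IMAGE"] = image
    if registry:
        env["METAFLOW_DEFAULT_CONTAINER_REGISTRY"] = registry
    return env


def run_flow(flow_file: str, image: str, registry: Optional[str] = None):
    """Launch `python <flow_file> run` with the image defaults in place."""
    cmd = [sys.executable, flow_file, "run"]
    # check=True raises CalledProcessError if the flow exits with an error.
    return subprocess.run(cmd, env=build_env(image, registry), check=True)
```

Calling `run_flow("use_image_flow.py", "python:3.10")` would then start the flow with that default image for any remote step that does not name one.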
3. Access Artifacts Outside of Flow
The following can be run in a Python script or notebook to access the artifact produced in the container step:
from metaflow import Flow
run = Flow("UseImageFlow").latest_run
assert run.successful
# get data produced in containerized step
artifact = run.data.artifact_from_container
assert artifact == 7
Further Reading
- Build a custom image
- Set environment variables in a container using Metaflow's @environment decorator or .env file
- See where in the Metaflow code image and container registry variables are used for @batch and @kubernetes