Project structure
After setting up an empty project, you can begin adding your own components. Fundamentally, all projects are composed of these top-level components:
Flows
Flows refer to Metaflow flows, often interconnected through events. They form the backbone of your projects, handling data processing and ETL, model training and fine-tuning, autonomous and batch inference, and any other kind of background processing or high-performance computing.
In projects, flows are stored under the `flows` subdirectory, one Metaflow flow (named `flow.py`) per subdirectory, alongside any supporting Python modules and packages. As a best practice, it is useful to add a `README.md` file for each flow describing its role. These files are surfaced in the UI as well.
Authoring `ProjectFlow`s
Importantly, project flows should subclass `ProjectFlow` instead of Metaflow's standard `FlowSpec`. In other words, simply author your flows like this:
```python
from obproject import ProjectFlow

class MyFlow(ProjectFlow):
    ...
```
This leverages Metaflow's `BaseFlow` pattern to enrich flows with functionality related to the project structure. Besides this small detail, you may leverage all Metaflow features in your flows.
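For instance, steps, artifacts, and decorators work exactly as in any Metaflow flow. Here is a minimal sketch; the flow name, step logic, and artifact names are illustrative, not part of the project template:

```python
from metaflow import step
from obproject import ProjectFlow

class TrainingFlow(ProjectFlow):

    @step
    def start(self):
        # Anything assigned to self is persisted as a regular Metaflow artifact.
        self.rows = list(range(10))
        self.next(self.train)

    @step
    def train(self):
        # Placeholder "training" logic for the sketch.
        self.model = sum(self.rows)
        self.next(self.end)

    @step
    def end(self):
        print("trained model:", self.model)

if __name__ == "__main__":
    TrainingFlow()
```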
A typical flow hierarchy in a project repository ends up looking like this:
```
flows/
    etl/
        flow.py
        README.md
        feature_transformations.py
        sql/process_data.sql
    train_model/
        flow.py
        README.md
        model.py
```
Deployments
Deployments are microservices that serve requests through real-time APIs. Use cases include:
- Model hosting and inference, including GenAI models running on fleets of GPUs.
- UIs and dashboards, such as Tensorboard, Streamlit apps, or other internal UIs.
- Real-time agents that respond to incoming requests and take action based on LLM outputs.
The platform’s strength comes from the tight connection between flows and deployments, bridging the offline and online worlds. For instance,
- A flow can update a database for RAG continuously, which is then used in real-time by a deployed agent.
- Or, you can have a custom app for monitoring model performance that you use to trigger a model retraining flow, as sketched below.
- It is also possible to deploy model endpoints programmatically from a flow, for instance, whenever a new model has been trained.
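As a rough sketch of the retraining pattern above, a flow can subscribe to an event that a monitoring deployment publishes, using Metaflow's standard event triggering. The event name below is hypothetical:

```python
from metaflow import step, trigger
from obproject import ProjectFlow

# Hypothetical event name; a monitoring deployment would publish it
# (for example with metaflow.integrations.ArgoEvent) when it detects drift.
@trigger(event="model_drift_detected")
class RetrainFlow(ProjectFlow):

    @step
    def start(self):
        # Kick off retraining whenever the event fires.
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    RetrainFlow()
```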
In your project, place deployments in the `deployments` directory. Each deployment is defined by a configuration file, `config.yml`, as described in the deployments documentation. You can define dependencies for the deployment in a standard `requirements.txt` or `pyproject.toml`. As with flows, it is recommended to add a `README.md` for each deployment.
The project hierarchy will look like this:
```
deployments/
    monitoring_dashboard/
        streamlit_app.py
        config.yml
        pyproject.toml
        README.md
    model_endpoint/
        fastapi_server.py
        config.yml
        pyproject.toml
        README.md
    support_agent/
        agent.py
        config.yml
        pyproject.toml
        README.md
```
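To make this concrete, `model_endpoint/fastapi_server.py` could be an ordinary FastAPI app. The sketch below is purely illustrative: the route, request schema, and scoring logic are placeholders rather than part of any template:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PredictionRequest(BaseModel):
    features: list[float]

@app.post("/predict")
def predict(req: PredictionRequest):
    # Placeholder scoring; a real endpoint would load the model produced
    # by a training flow, e.g. via a model asset.
    return {"score": sum(req.features)}
```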
Code
Effective management of software dependencies is essential for building production-quality projects and enabling rapid iteration and collaboration.
A typical project consists of multiple layers of software dependencies:
- Code defining flows and deployments.
- Project-level shared libraries.
- Organization-level libraries shared across projects.
- Third-party dependencies, such as `pandas` and `torch`.
As an example, consider the following project that trains a fraud detection model and deploys it for real-time inference:
```
fraud_detection_model/
    obproject.toml
    pyproject.toml
    README.md
    src/
        feature_encoders/
            __init__.py
            feature_encoder.py
    flows/
        trainer/
            flow.py
            mymodel.py
            README.md
    deployments/
        inference/
            fastapi_server.py
            config.yml
            README.md
```
Code defining flows and deployments is organized into subdirectories. In addition to the entrypoint file (`flow.py`) or deployment server, each flow or deployment can include supporting modules and packages, such as `mymodel.py`.
Project-level shared libraries should be placed under the `src` directory. Here, we define a package `feature_encoders` which is used both during training and inference to ensure offline-online consistency of features. Importantly, you should include the line `METAFLOW_PACKAGE_POLICY = 'include'` in the `__init__.py` module of each package, to ensure that the package gets included in Metaflow's code package. Packages under `src/` are added to `PYTHONPATH` automatically, so they are readily usable in flows and deployments.
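As a brief sketch of how the pieces connect, the `__init__.py` could look like the following; the `encode` helper is hypothetical, and only the `METAFLOW_PACKAGE_POLICY` line comes from the convention above:

```python
# src/feature_encoders/__init__.py
# Ensure this package is shipped inside Metaflow's code package.
METAFLOW_PACKAGE_POLICY = 'include'

from .feature_encoder import encode  # hypothetical helper defined in feature_encoder.py

# Flows and deployments can then import the package directly, since src/ is
# on PYTHONPATH:
#
#     from feature_encoders import encode
#     features = encode(raw_record)
```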
Organization-level shared libraries can be handled in two ways:
- If you can set `METAFLOW_PACKAGE_POLICY` in the packages, you may simply `pip install` them as usual or add them to your `PYTHONPATH`. Once you `import` them in your flows and deployments, they'll get packaged automatically. This is a convenient option for private packages, even if they are not `pip install`-able from a package repository.
- If the shared libraries are pushed to a package repository - private or public - you can treat them like third-party dependencies, described below.
Third-party dependencies can be handled through Metaflow's `@pypi` or `@conda` decorators, or through standard `pyproject.toml` or `requirements.txt` files. When the project is deployed, Outerbounds uses Fast Bakery to bake the requirements into a container image automatically.
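For instance, a step-level `@pypi` decorator can pin packages for a single step. This is standard Metaflow usage; the flow name and package versions below are just examples:

```python
from metaflow import step, pypi
from obproject import ProjectFlow

class ScoringFlow(ProjectFlow):

    @pypi(packages={"pandas": "2.2.2"})
    @step
    def start(self):
        import pandas as pd  # provisioned for this step by @pypi
        print(pd.__version__)
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    ScoringFlow()
```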
For convenience, you may drop a project-wide `pyproject.toml` at the root of the project next to `obproject.toml`. For instance, to include `pandas` and `fastapi` in the project, you can specify a `pyproject.toml` as follows:
```toml
[project]
dependencies = [
    "pandas==2.2.2",
    "fastapi==0.116.1"
]
```
This file will be used universally in all flows (through `@pypi_base`) and deployments, without you having to specify anything else manually. This is handy if you want to ensure that all components of the project use the exact same set of dependencies.
Assets
What sets ML/AI projects apart from traditional software engineering is that they rely not only on code, but also on data and models.

A key difference between code and data/models is that in real-world systems, data and models are often updated continuously and automatically - new data streams in, and models are retrained constantly, whereas updating code is a much more manual process (even when the code is authored with AI co-pilots).
Metaflow artifacts are a core building block for managing data and models. Outerbounds Projects extends this concept with data assets and model assets, which complement artifacts by adding an extra layer of metadata and tracking.
Think of assets as a superset of artifacts: they let you elevate select data and models to a special status, making them easy to track in the UI. In practice, this gives you a model registry and data lineage tracking, seamlessly integrated with your projects. Read more about assets on a dedicated page.
Next, let's take a look at an example project that shows how all these pieces fit together.