
Project utilities API

Prerequisites

Start with Project Assets basics before diving into the API reference.

ProjectFlow

Base class for all project flows. Inherit from ProjectFlow instead of FlowSpec:

from obproject import ProjectFlow

class MyFlow(ProjectFlow):
    @step
    def start(self):
        self.prj.register_data("dataset", "data")
        self.next(self.end)

Configuration

ProjectFlow reads configuration from two files:

obproject.toml - Project identity and settings:

project = "fraud-detection"

[dev-assets]
branch = "main" # Read assets from main branch during local dev

[dependencies]
include_pyproject_toml = true # Auto-apply pyproject.toml deps (default: true)

  • [dev-assets] branch (no default) - Branch to read assets from during local development
  • [dependencies] include_pyproject_toml (default: true) - Auto-apply @pypi_base from pyproject.toml

pyproject.toml - Python dependencies applied via @pypi_base:

[project]
dependencies = [
    "scikit-learn>=1.3.0",
    "pandas>=2.0.0",
]

prj Property

self.prj returns a ProjectContext with access to all project utilities. Initialized lazily on first access.

Attributes:

  • prj.project - Project name from config
  • prj.branch - Current write branch (from Metaflow @project)
  • prj.read_branch - Branch for reading assets (may differ during local dev)
  • prj.write_branch - Branch for writing assets
  • prj.asset - Low-level Asset client
  • prj.evals - Evaluation logger

Asset Registration

prj.register_data()

prj.register_data(name, artifact, annotations=None, tags=None, description=None)

Register a Metaflow artifact as a data asset.

  • name (str) - Asset name (e.g., "user_transactions")
  • artifact (str) - Artifact name (must exist as self.<artifact>)
  • annotations (dict) - Metadata key-value pairs (values converted to strings)
  • tags (dict) - Tags for categorization
  • description (str) - Human-readable description

self.features = compute_features(data)
self.prj.register_data("fraud_features", "features",
                       annotations={"n_samples": len(self.features)})

prj.register_external_data()

prj.register_external_data(name, blobs, kind, annotations=None, tags=None, description=None)

Register external data (S3, databases, etc.) as a data asset.

  • name (str) - Asset name
  • blobs (list) - URIs/references (e.g., ["s3://bucket/file.csv"])
  • kind (str) - Data type (e.g., "s3", "database")
  • annotations (dict) - Metadata
  • tags (dict) - Tags
  • description (str) - Description

self.prj.register_external_data("raw_logs",
    blobs=["s3://data-lake/logs/2025-01-01/"],
    kind="s3",
    annotations={"size_gb": 450})

prj.register_model()

prj.register_model(name, artifact, annotations=None, tags=None, description=None)

Register a Metaflow artifact as a model asset.

  • name (str) - Asset name (e.g., "fraud_classifier")
  • artifact (str) - Artifact name containing the model
  • annotations (dict) - Model metadata (accuracy, hyperparameters, etc.)
  • tags (dict) - Tags (framework, algorithm, etc.)
  • description (str) - Description

self.model = RandomForestClassifier().fit(X, y)
self.prj.register_model("fraud_classifier", "model",
                        annotations={"accuracy": 0.95, "algorithm": "RandomForest"})

prj.register_external_model()

prj.register_external_model(name, blobs, kind, annotations=None, tags=None, description=None)

Register an external model (HuggingFace, checkpoints, etc.) as a model asset.

  • name (str) - Asset name
  • blobs (list) - URIs/references
  • kind (str) - Model type (e.g., "checkpoint", "huggingface")
  • annotations (dict) - Metadata
  • tags (dict) - Tags
  • description (str) - Description

self.prj.register_external_model("base_llm",
    blobs=["meta-llama/Llama-3.1-8B-Instruct"],
    kind="huggingface",
    annotations={"context_length": 8192})

Asset Consumption

prj.get_data()

prj.get_data(name, instance="latest")

Retrieve artifact data from a data asset registered with register_data().

  • name (str) - Asset name
  • instance (str) - Version: "latest", "latest-N", or "vN"

Returns: The artifact data

features = self.prj.get_data("fraud_features")
previous = self.prj.get_data("fraud_features", instance="latest-1")
note

Only works for artifact-based assets. For external data, use prj.asset.consume_data_asset().

prj.get_model()

prj.get_model(name, instance="latest")

Retrieve artifact data from a model asset registered with register_model().

  • name (str) - Asset name
  • instance (str) - Version: "latest", "latest-N", or "vN"

Returns: The model artifact data

model = self.prj.get_model("fraud_classifier")
previous_model = self.prj.get_model("fraud_classifier", instance="latest-1")
note

Only works for artifact-based models. For external models (checkpoints, HuggingFace, etc.), use prj.asset.consume_model_asset() and load from the returned blobs.

prj.asset.consume_data_asset()

prj.asset.consume_data_asset(name, instance="latest")

Low-level method returning the full asset reference.

Returns: Asset reference dict:

{
  "id": "v123",
  "created_by": {"entity_id": "FlowName/run_id/step/task"},
  "data_properties": {
    "data_kind": "artifact",
    "annotations": {"key": "value"},
    "blobs": []
  }
}
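For example, extracting the blob URIs and an annotation from such a reference (the dict below is a hypothetical reference shaped like the return value shown above; in a flow it would come from self.prj.asset.consume_data_asset("raw_logs")):

```python
# Hypothetical asset reference with the documented shape.
ref = {
    "id": "v7",
    "created_by": {"entity_id": "IngestFlow/123/start/1"},
    "data_properties": {
        "data_kind": "s3",
        "annotations": {"size_gb": "450"},
        "blobs": ["s3://data-lake/logs/2025-01-01/"],
    },
}

blobs = ref["data_properties"]["blobs"]  # URIs you load yourself
# Annotation values are stored as strings, so convert as needed.
size_gb = float(ref["data_properties"]["annotations"]["size_gb"])
```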

prj.asset.consume_model_asset()

prj.asset.consume_model_asset(name, instance="latest")

Low-level method for consuming model assets.

Returns: Asset reference dict with model_properties instead of data_properties.

ref = self.prj.asset.consume_model_asset("fraud_classifier")
accuracy = float(ref["model_properties"]["annotations"]["accuracy"])

prj.asset.list_data_assets()

prj.asset.list_data_assets(tags=None)

List data assets in current project/branch.

  • tags (dict) - Filter by tags (client-side filtering)

Returns: {"data": [...]}

prj.asset.list_model_assets()

prj.asset.list_model_assets(tags=None)

List model assets in current project/branch.

Returns: {"models": [...]}

danger

Tag filtering is client-side only. All assets are fetched, then filtered locally.
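The filtering semantics can be sketched as follows (a minimal sketch, assuming an asset matches when every requested tag key/value pair is present on it; not the library's actual implementation):

```python
# Assumed matching rule: every requested tag pair must be present.
def matches(asset_tags, wanted):
    return all(asset_tags.get(k) == v for k, v in wanted.items())

assets = [
    {"name": "fraud_features", "tags": {"stage": "prod"}},
    {"name": "scratch_features", "tags": {"stage": "dev"}},
]
kept = [a["name"] for a in assets if matches(a["tags"], {"stage": "prod"})]
# kept == ["fraud_features"]
```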


Standalone Asset Usage

Use Asset directly outside flow context (deployments, notebooks, scripts):

from obproject.assets import Asset

asset = Asset(
    project="fraud-detection",
    branch="main",
    read_only=True  # Required outside flow context
)

ref = asset.consume_model_asset("fraud_classifier")

  • project (str) - Project name
  • branch (str) - Branch name
  • read_only (bool) - Set True outside flows (skips entity tracking)

When read_only=True:

  • Registration methods are no-ops
  • Consume methods use GET (no lineage tracking) instead of PUT

Scheduling & Triggering

Metaflow flows can be started by a time-based schedule or by an event published from another flow. The decorators below are project-aware wrappers around Metaflow's native @schedule and @trigger — they automatically scope to the correct project and branch so each deployed branch operates independently.

@project_schedule

Apply different schedules depending on which branch the flow is deployed to. If the branch doesn't match any pattern, no schedule is applied (the decorator is a no-op).

This wraps Metaflow's @schedule with branch-aware routing: production can run on a tight cron while staging runs daily and feature branches get no schedule at all.

from obproject import ProjectFlow, project_schedule

@project_schedule({
    "main": {"cron": "0 8 * * 1-5", "timezone": "America/New_York"},
    "develop": {"daily": True},
    "release/*": {"hourly": True},
})
class MyFlow(ProjectFlow):
    @step
    def start(self):
        self.next(self.end)

  • schedule_map (dict) - Maps branch glob patterns to schedule specs

Each schedule spec is a dict with keys matching Metaflow's @schedule parameters:

  • cron (str) - Cron expression (e.g., "0 8 * * 1-5")
  • daily (bool) - Run daily (the default if an empty spec {} is given)
  • weekly (bool) - Run weekly
  • hourly (bool) - Run hourly
  • timezone (str) - IANA timezone (e.g., "America/New_York")

Behavior:

  • Patterns are matched using fnmatch glob syntax (e.g., release/* matches release/v2)
  • First matching pattern wins (dict insertion order), so place more specific patterns first
  • If no pattern matches the deployed branch, no schedule is created
  • Cannot be combined with an explicit @schedule decorator on the same flow
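The pattern routing can be illustrated with plain fnmatch (a standalone sketch of the matching rules above, not the decorator's actual implementation):

```python
from fnmatch import fnmatch

schedule_map = {
    "release/*": {"hourly": True},    # more specific pattern first
    "main": {"cron": "0 8 * * 1-5"},
}

def resolve_schedule(branch):
    # First matching pattern wins, following dict insertion order.
    for pattern, spec in schedule_map.items():
        if fnmatch(branch, pattern):
            return spec
    return None  # no pattern matched: no schedule is applied

print(resolve_schedule("release/v2"))  # {'hourly': True}
print(resolve_schedule("feature/x"))   # None
```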

@project_trigger

Subscribe a flow to project events published by other flows via prj.publish_event():

from obproject import ProjectFlow, project_trigger

@project_trigger(event="model_trained")
class EvaluationFlow(ProjectFlow):
    @step
    def start(self):
        # Triggered when "model_trained" event is published
        self.next(self.end)

The decorator resolves the full event name (prj.{project}.{branch}.{event}) from project config, so triggers are automatically scoped to the same branch.
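The resolution amounts to simple string composition (an illustrative sketch of the naming rule only; it assumes the project and branch values are inserted verbatim):

```python
# Sketch of the event naming rule prj.{project}.{branch}.{event}.
def full_event_name(project, branch, event):
    return f"prj.{project}.{branch}.{event}"

print(full_event_name("fraud-detection", "main", "model_trained"))
# prj.fraud-detection.main.model_trained
```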


Event Publishing

prj.publish_event()

prj.publish_event(name, payload=None)

Publish an event to trigger flows decorated with @project_trigger.

  • name (str) - Event name (must match the event= in a @project_trigger)
  • payload (dict) - JSON-serializable payload

Events are namespaced as prj.{project}.{branch}.{name}, so events published on one branch only trigger flows deployed on the same branch.

self.prj.publish_event("model_trained", payload={"accuracy": 0.95})

prj.safe_publish_event()

prj.safe_publish_event(name, payload=None)

Same as publish_event() but failures don't raise exceptions.


Asset Promotion

promote_assets()

from obproject.assets import promote_assets

promote_assets(project, source, target, kinds=["data", "models"], asset=None, instance="latest", alias="candidate", with_aliases=False)

Promote assets from one branch to another by copying metadata pointers (the underlying data is not duplicated). Each promoted instance gets an alias on the target branch for stable referencing.

  • project (str) - Project name
  • source (str) - Source branch name
  • target (str) - Target branch name
  • kinds (list, default ["data", "models"]) - Asset types to promote
  • asset (str, default None) - Specific asset name, or all assets if omitted
  • instance (str, default "latest") - Instance to promote ("latest", an ID, or "@alias")
  • alias (str, default "candidate") - Alias to set on the promoted instance; must be in the allowed list; set to None to skip
  • with_aliases (bool, default False) - Copy existing aliases from the source branch

Returns: {"promoted": [...], "errors": [...]}
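A caller can inspect the result like this (the dict below is a hypothetical return value with the shape described above; the "kind", "name", and "alias" keys follow the CI example later in this page):

```python
# Hypothetical return value with the documented shape.
result = {
    "promoted": [{"kind": "models", "name": "classifier", "alias": "candidate"}],
    "errors": [],
}

# Fail loudly if any asset could not be promoted.
if result["errors"]:
    raise RuntimeError(f"promotion failed: {result['errors']}")

summary = [f"{p['kind']}/{p['name']}@{p.get('alias', 'candidate')}"
           for p in result["promoted"]]
# summary == ["models/classifier@candidate"]
```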

Promotion aliases

Promoted instances are tagged with aliases that represent lifecycle stages:

  • @candidate - Promoted from a branch, ready for evaluation (typically set by promote_assets() by default)
  • @validated - Passed quality gates (typically set by an evaluation flow)
  • @production - Actively consumed by downstream flows/apps (typically set by an approval step)

# Feature branch merges — model arrives on main as @candidate
promote_assets('my_project', source='feature-v2', target='main')

# Evaluation flow passes — re-alias to @validated
promote_assets('my_project', source='main', target='main',
               asset='classifier', instance='@candidate',
               alias='validated')

# Manual approval — promote to @production
promote_assets('my_project', source='main', target='main',
               asset='classifier', instance='@validated',
               alias='production')

Downstream consumers can then read a specific stage:

model = self.prj.get_model("classifier", instance="@production")

To customize the allowed aliases, add to obproject.toml:

[promotion]
aliases = ["candidate", "validated", "production"] # default

Promote on merge (CI pattern)

Add a promote job to your GitHub Actions workflow that runs before teardown when a PR is merged:

promote:
  if: >
    github.event_name == 'pull_request' &&
    github.event.action == 'closed' &&
    github.event.pull_request.merged == true
  steps:
    # ... setup steps ...
    - name: Promote assets to main
      run: |
        BRANCH=${{ github.head_ref }}
        PROJECT=$(yq .project obproject.toml)
        python -c "
        from obproject.assets import promote_assets
        result = promote_assets('$PROJECT', source='$BRANCH', target='main')
        for p in result['promoted']:
            print(f\"Promoted {p['kind']}/{p['name']} with @{p.get('alias', 'candidate')}\")
        "

teardown:
  needs: promote
  # ... existing teardown job ...

This ensures assets are promoted to main with @candidate before the feature branch is torn down.

tip

[dev-assets] and promotion pipelines: [dev-assets] branch = "main" redirects all asset reads to main, which is ideal for consumer flows (dashboards, reports). But in a promotion pipeline where a flow trains a model and then evaluates it on the same branch, reads need to come from the branch that just wrote the asset. Either omit [dev-assets] in promotion projects, or use a try/except fallback to read from the write branch when the asset doesn't exist on main yet.
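The try/except fallback can be sketched generically (load_asset below is a hypothetical stand-in for a branch-aware reader, such as a standalone Asset client pointed at a specific branch; the real call signatures may differ):

```python
# Generic sketch of the fallback read: try the configured read branch
# first, then the branch the current run writes to. load_asset is a
# hypothetical stand-in for a branch-aware asset reader.
def read_with_fallback(load_asset, read_branch, write_branch, name):
    try:
        return load_asset(read_branch, name)
    except LookupError:
        # Asset not promoted to the read branch yet: use the write branch.
        return load_asset(write_branch, name)

# Toy store: only the write branch has the asset, as happens before the
# first promotion to main.
store = {("feature-v2", "classifier"): "model-v1"}

def load_asset(branch, name):
    return store[(branch, name)]  # KeyError (a LookupError) when absent

print(read_with_fallback(load_asset, "main", "feature-v2", "classifier"))
# model-v1
```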


Evaluation Logging

prj.evals.log()

prj.evals.log(message)

Log structured evaluation data with project/branch/run metadata.

  • message (dict or str) - Evaluation data

self.prj.evals.log({
    "model": "fraud_classifier",
    "accuracy": 0.95,
    "test_samples": 1000
})

Output includes a magic prefix for monitoring system ingestion.


obproject-deploy

The obproject-deploy CLI deploys flows, apps, and assets from a project directory. It is distributed via pip install ob-project-utils.

CLI flags

obproject-deploy [--project NAME] [--all] [--skip-apps] [--skip-flows] [--skip-assets]
  • --project NAME - Deploy only the specified project from obproject_multi.toml
  • --all - Deploy all projects in obproject_multi.toml (the default when --project is not given)
  • --skip-apps - Skip all app/endpoint deployments
  • --skip-flows - Skip all flow deployments
  • --skip-assets - Skip all asset registration

obproject_deploy.toml

Place an obproject_deploy.toml file in any deployments/<app>/ or flows/<flow>/ directory to control which branches deploy that component:

[deploy]
branches = ["main", "release/*"]

  • branches (list[str], default: deploy on all branches) - Glob patterns for allowed branches

Behavior:

  • When no obproject_deploy.toml exists, the component deploys on all branches (backward compatible)
  • On non-main branches, an info message suggests adding the file
  • When the current branch doesn't match any pattern, the component is skipped:
⏭️  Skipping app 'my-dashboard' (branch 'feature_foo' not in ['main', 'release/*'])
tip

Add obproject_deploy.toml with branches = ["main"] to each app in deployments/ to prevent app proliferation on feature branches. See Project lifecycle for the full guide.


outerbounds flowproject

The outerbounds flowproject subcommands manage deployed project resources — workflow templates, assets, apps, and metadata. These are the same primitives that obproject-deploy creates during CI/CD.

info

These commands require a configured Metaflow profile with access to the Outerbounds API. They read credentials from your ~/.metaflowconfig directory.

Common options

All outerbounds flowproject subcommands accept:

  • -d, --config-dir (default: ~/.metaflowconfig) - Path to Metaflow configuration directory
  • -p, --profile (default: $METAFLOW_PROFILE) - Named Metaflow profile to use

Identifying a project branch

Several commands require --id in the format project/branch:

outerbounds flowproject list-templates --id my_project/main
outerbounds flowproject teardown-branch --id my_project/feature-v2

Branch names are normalized to match how obproject-deploy stores them: - and / characters are replaced with _, and the result is lowercased. So --id my_project/feature-v2 resolves to branch feature_v2.
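The normalization rule can be expressed directly (a sketch matching the description above):

```python
# Branch normalization as described above: "-" and "/" become "_",
# then the result is lowercased.
def normalize_branch(branch):
    return branch.replace("-", "_").replace("/", "_").lower()

print(normalize_branch("feature-v2"))  # feature_v2
print(normalize_branch("Release/V2"))  # release_v2
```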


get-metadata

Fetch the latest flowproject metadata for a project/branch.

outerbounds flowproject get-metadata --id <project/branch>

Returns the JSON metadata document that obproject-deploy registered, including workflow definitions, asset references, and app configurations.

# View metadata for production branch
outerbounds flowproject get-metadata --id fraud_detection/main

# Pretty-print with jq
outerbounds flowproject get-metadata --id fraud_detection/main | jq .

set-metadata

Register or update flowproject metadata for a project/branch.

outerbounds flowproject set-metadata '<json_string>'
  • json_str - JSON string containing the flowproject metadata payload

outerbounds flowproject set-metadata '{"project": "fraud_detection", "branch": "main", "workflows": [...]}'
danger

This is a low-level command used by deployment tooling. Prefer obproject-deploy for standard deployments.


list-templates

List Argo workflow templates deployed for a project/branch.

outerbounds flowproject list-templates --id <project/branch> [-o json]
  • --id - Required. project/branch identifier
  • -o, --output - Output format: json or human-readable (default)

Templates are discovered by querying Argo directly and matching on metaflow/project_name and metaflow/branch_name annotations.

# Human-readable output
outerbounds flowproject list-templates --id fraud_detection/main

# Machine-readable
outerbounds flowproject list-templates --id fraud_detection/main -o json
# → {"templates": ["frauddetection.prod.trainflow", "frauddetection.prod.scoreflow"]}

delete-metadata

Delete all flowproject metadata for a project/branch.

outerbounds flowproject delete-metadata --id <project/branch> [--yes]
  • --id - Required. project/branch identifier
  • --yes - Skip confirmation prompt
  • -o, --output - Output format: json or human-readable (default)
outerbounds flowproject delete-metadata --id fraud_detection/feature-v2 --yes
caution

This removes the metadata record only. It does not delete workflow templates, assets, or apps. Use teardown-branch to remove all resources.


teardown-branch

Delete all deployed resources for a project/branch in a single operation.

outerbounds flowproject teardown-branch --id <project/branch> [--dry-run] [--yes] [-o json]
  • --id - Required. project/branch identifier
  • --dry-run - Discover and list resources without deleting anything
  • --yes - Skip confirmation prompt
  • -o, --output - Output format: json or human-readable (default)

Teardown discovers and deletes these resource types in order:

  1. Workflow templates — Argo templates matching the project/branch annotations. Deleting a template cascades to its associated CronWorkflows and Sensors.
  2. Data assets — As listed in the flowproject metadata.
  3. Model assets — As listed in the flowproject metadata.
  4. Apps — Capsules tagged with the project and branch.
  5. Flowproject metadata — The metadata record itself.
# Preview what would be deleted
outerbounds flowproject teardown-branch --id fraud_detection/feature-v2 --dry-run

# Execute teardown
outerbounds flowproject teardown-branch --id fraud_detection/feature-v2 --yes

# JSON output for scripting
outerbounds flowproject teardown-branch --id fraud_detection/feature-v2 --yes -o json

See Also