Project Assets
Assets are a new feature in Outerbounds. Don't hesitate to reach out on your support Slack channel with feedback and questions!
Metaflow artifacts are a core building block for managing data and models. Outerbounds Projects extends this concept with data assets and model assets, which complement artifacts by adding an extra layer of metadata, tracking, and observability.
What are assets
Think of assets as the core interfaces of your projects: your key inputs, outputs, and pluggable components. Unlike code, which is versioned through systems like Git and rolled out with CI/CD, assets often evolve automatically. For example, data assets can refresh continuously via ETL pipelines, while models can be retrained and fine-tuned on a regular cadence through automated training workflows.
The asset tracking in Outerbounds helps answer three key questions:
What are the core assets consumed and produced by the project?
Which project components (flows and deployments) are responsible for producing and consuming each asset?
When was the asset last refreshed, and what are the key metrics for its latest version?
These questions apply equally to models and data, and they are relevant to both traditional ML and bleeding-edge AI projects.
In the latter case, you may not retrain models continuously (though ongoing fine-tuning is certainly possible), but you are likely to experiment with different LLMs and upgrade them periodically. Crucially, assets are scoped to a project branch, allowing you to evaluate models and datasets in isolation across branches and compare their performance.
Defining an asset
Every asset is defined through a configuration file, asset_config.toml, placed in a subdirectory of model/ or data/ in your project structure.
For instance, you could define a fraud detection model trained on financial transaction data, and a churn model trained on product_events data, as follows:
model/fraud/asset_config.toml
model/churn/asset_config.toml
data/transactions/asset_config.toml
data/product_events/asset_config.toml
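To make the layout convention concrete, here is a minimal sketch, assuming only that each config sits exactly one directory below model/ or data/, that finds every asset_config.toml using Python's standard library. discover_asset_configs is a hypothetical helper written for illustration, not part of the Outerbounds tooling.

```python
from pathlib import Path
import tempfile


def discover_asset_configs(project_root):
    """Hypothetical helper: map (kind, asset_dir) -> path for every
    asset_config.toml one level below model/ and data/."""
    root = Path(project_root)
    found = {}
    for kind in ("model", "data"):
        for cfg in sorted(root.glob(f"{kind}/*/asset_config.toml")):
            found[(kind, cfg.parent.name)] = cfg
    return found


# Usage: recreate the example layout in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("model/fraud", "model/churn", "data/transactions", "data/product_events"):
        asset_dir = Path(tmp, rel)
        asset_dir.mkdir(parents=True)
        (asset_dir / "asset_config.toml").write_text("")
    assets = discover_asset_configs(tmp)
    print(sorted(assets))
```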
A configuration file has a few mandatory fields, as shown by the XKCD project example:
name = "Latest XKCD comic"
id = 'xkcd'
description = "Latest xkcd comic strip image"
[properties]
key = "value"
test = "another"
name is a human-readable name of the asset.
id is an unambiguous ID used to refer to the asset.
description is a description of the asset, shown in the UI.
The resulting asset listing will look like this:
Optionally, you may assign arbitrary key-value pairs to the asset under [properties]. This can be handy, for example, when working with models (LLMs) accessed through external inference providers, each of which uses its own ID for the model:
name = "Small LLama"
id = "small_llama"
description = "A small LLM, currently llama3.1 8B"
[properties]
bedrock = "us.meta.llama3-1-8b-instruct-v1:0"
nebius = "meta-llama/Meta-Llama-3.1-8B-Instruct-fast"
togetherai = "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo"
outerbounds = "meta-llama/Llama-3.1-8B-Instruct"
You can access the properties programmatically through the Assets API. Asset definitions are updated automatically every time you push an update to the project through CI/CD.
Updating an asset instance
Think of asset definitions as containers for asset instances. Every time an asset updates, a new, versioned asset instance is created. It is possible to have an asset with no instances, just metadata, such as the references to external models shown above, but in most cases you want to populate an asset programmatically.
Assets are typically updated in a flow, for instance, in an ETL workflow or a model retraining pipeline. The easiest way to do this is to register an artifact, like img_url below, as an asset, as shown in this snippet from XKCDData:
self.latest_id, self.img_url = fetch_latest()
self.prj.register_data("xkcd", "img_url")
Assets are not used to store the data or model itself. Rather, they store a reference to the actual entity, such as a data artifact or an external model endpoint.
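To illustrate how definitions, versioned instances, and references relate, here is a toy in-memory model. It is not how Outerbounds stores assets: AssetDefinition, AssetInstance, and the task pathspecs below are hypothetical. Each call to register creates a new version that points at a task's artifact instead of copying its value.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AssetInstance:
    version: int
    # A pointer to where the value lives (a Metaflow-style task pathspec
    # plus an artifact name), not the value itself.
    task_pathspec: str
    artifact: str


@dataclass
class AssetDefinition:
    asset_id: str
    instances: list = field(default_factory=list)

    def register(self, task_pathspec, artifact):
        """Append a new versioned instance that references an artifact."""
        inst = AssetInstance(len(self.instances) + 1, task_pathspec, artifact)
        self.instances.append(inst)
        return inst

    def latest(self):
        """Return the newest instance, or None for a metadata-only asset."""
        return self.instances[-1] if self.instances else None


xkcd = AssetDefinition("xkcd")
print(xkcd.latest())  # None: a definition with no instances yet
xkcd.register("XKCDData/101/start/1", "img_url")  # hypothetical pathspecs
xkcd.register("XKCDData/102/start/1", "img_url")
print(xkcd.latest())
```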
In ob-project-starter, the latest comic strip is a core entity being processed, so it makes sense to elevate the corresponding artifact to an asset. This allows you to observe the asset conveniently in the asset view:
The visualization shown in the asset view is a normal Metaflow @card, produced by the task that registers the asset instance with register_data.
Customize the card to show the metrics that matter for the asset instance, such as data or model quality metrics.
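What goes on the card is up to you. As one hedged example, the helper below formats a dictionary of quality metrics as a Markdown table; inside a step decorated with @card, you could then add it to the card with Metaflow's current.card.append(Markdown(...)), where Markdown comes from metaflow.cards. The helper name and the metric values are made up for illustration.

```python
def metrics_markdown(metrics):
    """Render a dict of metric name -> value as a Markdown table."""
    lines = ["| Metric | Value |", "| --- | --- |"]
    for name, value in metrics.items():
        # Show floats with three decimals; everything else as-is.
        shown = f"{value:.3f}" if isinstance(value, float) else str(value)
        lines.append(f"| {name} | {shown} |")
    return "\n".join(lines)


# Hypothetical quality metrics for one asset instance.
table = metrics_markdown({"rows": 1200, "null_fraction": 0.02})
print(table)
```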
Importantly, the asset UI contains a pointer to the exact task that produced each asset instance (by calling register_data), allowing you to track data lineage from producers to consumers.
Consuming assets
Using an asset is straightforward. In a task, simply call get_data, as in this line from XKCDExplainer:
self.img_url = self.prj.get_data("xkcd")
The get_data call fetches the latest instance of an asset and automatically resolves the reference it contains to the corresponding data item, like img_url in this case.
Importantly, get_data registers the task as a consumer of the asset, contributing to data lineage tracking.
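The toy sketch below mirrors the behavior described for get_data without using Outerbounds: it picks the latest instance, follows its reference into a stand-in artifact store, and records the caller as a consumer. All names, pathspecs, and URLs are hypothetical. With real Metaflow, a task artifact reference like this could be resolved through the client API, for example Task(pathspec)[name].data, but that is only an analogy, not a description of the Outerbounds internals.

```python
# Hypothetical stand-ins: an artifact store keyed by (task pathspec,
# artifact name), and each asset's instances, oldest first.
ARTIFACTS = {
    ("XKCDData/101/start/1", "img_url"): "https://example.com/old.png",
    ("XKCDData/102/start/1", "img_url"): "https://example.com/new.png",
}
INSTANCES = {
    "xkcd": [("XKCDData/101/start/1", "img_url"), ("XKCDData/102/start/1", "img_url")],
}
CONSUMERS = {}  # asset id -> list of consumer task pathspecs


def get_data_sketch(asset_id, consumer_task):
    """Resolve the latest instance of an asset to its value and record
    the calling task as a consumer (lineage)."""
    task_pathspec, artifact = INSTANCES[asset_id][-1]  # latest instance
    CONSUMERS.setdefault(asset_id, []).append(consumer_task)
    return ARTIFACTS[(task_pathspec, artifact)]


img_url = get_data_sketch("xkcd", "XKCDExplainer/7/start/1")
print(img_url)            # https://example.com/new.png
print(CONSUMERS["xkcd"])  # ['XKCDExplainer/7/start/1']
```

Together with the producer recorded at registration time, these consumer records are what make it possible to trace an asset from the task that produced it to every task that read it.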