
Project Assets

info

Assets are a new feature in Outerbounds. Don't hesitate to contact your support Slack with feedback and questions!

Metaflow artifacts are a core building block for managing data and models. Outerbounds Projects extends this concept with data assets and model assets, which complement artifacts by adding an extra layer of metadata, tracking, and observability.

What are assets

Think of assets as the core interfaces of your projects - your key inputs, outputs, and pluggable components. Unlike code, which is versioned through systems like Git and rolled out with CI/CD, assets often evolve automatically. For example, data assets can refresh continuously via ETL pipelines, while models can be retrained and fine-tuned on a regular cadence through automated training workflows.

The asset tracking in Outerbounds helps answer three key questions:

  1. What are the core assets consumed and produced by the project?

  2. Which project components - flows and deployments - are responsible for producing and consuming each asset?

  3. When was the asset last refreshed, and what are the key metrics for its latest version?

These questions apply equally to models and data, and they are relevant both for traditional ML and for bleeding-edge AI projects.

In the latter case, you may not retrain models continuously (though ongoing fine-tuning is certainly possible) but you are likely to experiment with different LLMs and upgrade them periodically. Crucially, assets are scoped to a project branch, allowing you to evaluate models and datasets in isolation across branches and compare their performance.

Defining an asset

Every asset is defined through a configuration file, asset_config.toml, placed in its own subdirectory under model or data in your project structure.

For instance, you could define a fraud detection model, trained with financial transaction data, and a churn model trained with product_events as follows:

model/fraud/asset_config.toml
model/churn/asset_config.toml

data/transactions/asset_config.toml
data/product_events/asset_config.toml
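
To make the layout concrete, here is a minimal sketch that scans a project directory for asset definitions following the convention above. It is not part of Outerbounds: the helper name find_asset_configs is hypothetical, and the placeholder file contents exist only so the example runs in a temporary directory.

```python
import tempfile
from pathlib import Path

ASSET_KINDS = ("model", "data")  # top-level directories named in the text above


def find_asset_configs(project_root):
    """Return {kind: [asset directory names]} for each asset_config.toml found.

    Hypothetical helper for illustration only.
    """
    root = Path(project_root)
    found = {kind: [] for kind in ASSET_KINDS}
    for kind in ASSET_KINDS:
        # Each asset lives in its own subdirectory: <kind>/<asset>/asset_config.toml
        for config in sorted((root / kind).glob("*/asset_config.toml")):
            found[kind].append(config.parent.name)
    return found


# Recreate the example layout in a temporary directory and scan it.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("model/fraud", "model/churn", "data/transactions", "data/product_events"):
        asset_dir = Path(tmp) / rel
        asset_dir.mkdir(parents=True)
        (asset_dir / "asset_config.toml").write_text('id = "placeholder"\n')
    layout = find_asset_configs(tmp)

print(layout)
```

A check like this can be handy in CI to confirm that every asset directory actually contains a configuration file before the project is pushed.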

A configuration file has a few mandatory fields, as shown by the XKCD project example:

name = "Latest XKCD comic"
id = 'xkcd'
description = "Latest xkcd comic strip image"

[properties]
key = "value"
test = "another"

  • name is a human-readable name of the asset.

  • id is an unambiguous ID used to refer to the asset.

  • description is shown in the UI.

    The resulting asset listing will look like this:

Optionally, you may assign arbitrary key-value pairs to the asset under [properties]. This can be handy, for example, when working with models (LLMs) accessed through external inference providers, each of which has its own ID for the model:

name = "Small LLama"
id = "small_llama"
description = "A small LLM, currently llama3.1 8B"

[properties]
bedrock = "us.meta.llama3-1-8b-instruct-v1:0"
nebius = "meta-llama/Meta-Llama-3.1-8B-Instruct-fast"
togetherai = "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo"
outerbounds = "meta-llama/Llama-3.1-8B-Instruct"

You can access the properties programmatically through the Assets API. Asset definitions are updated automatically every time you push an update to the project through CI/CD.
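
Independently of the Assets API, the file format itself can be checked locally before pushing. The following is a minimal sketch, assuming Python 3.11+ (or the tomli backport) and treating name, id, and description as the mandatory fields. The functions load_asset_config and provider_model_id are hypothetical helpers, not Outerbounds functions.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same interface

REQUIRED_FIELDS = ("name", "id", "description")

CONFIG_TEXT = """
name = "Small LLama"
id = "small_llama"
description = "A small LLM, currently llama3.1 8B"

[properties]
bedrock = "us.meta.llama3-1-8b-instruct-v1:0"
togetherai = "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo"
"""


def load_asset_config(text):
    """Parse an asset_config.toml document and check the mandatory fields."""
    config = tomllib.loads(text)
    missing = [name for name in REQUIRED_FIELDS if name not in config]
    if missing:
        raise ValueError(f"asset config is missing fields: {missing}")
    config.setdefault("properties", {})  # [properties] is optional
    return config


def provider_model_id(config, provider):
    """Look up the provider-specific model ID stored under [properties]."""
    try:
        return config["properties"][provider]
    except KeyError:
        raise KeyError(f"asset {config['id']!r} has no property {provider!r}") from None


config = load_asset_config(CONFIG_TEXT)
print(provider_model_id(config, "bedrock"))
```

Keeping provider-specific IDs in [properties] means that switching inference providers only changes the lookup key, not the asset ID that the rest of the project refers to.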

Updating an asset instance

Think of asset definitions as containers for asset instances. Every time an asset updates, a new, versioned asset instance is created. It is possible to have an asset with no instances - just metadata - such as the references to external models shown above, but in most cases you want to populate an asset programmatically.

Assets are typically updated in a flow, for instance, in an ETL workflow or a model retraining pipeline. The easiest way is to register an artifact, like img_url below, as an asset - as shown in this snippet from XKCDData:

self.latest_id, self.img_url = fetch_latest()
self.prj.register_data("xkcd", "img_url")

Assets are references

Assets are not used to store the data or model itself. Rather, they store a reference to the actual entity, such as a data artifact or an external model endpoint.
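
The relationship between a definition, its versioned instances, and the references they hold can be pictured with a small toy model. This is only an illustrative sketch, not how Outerbounds stores assets: the class names are hypothetical, the integer version scheme is an assumption, and the reference strings mimic Metaflow artifact pathspecs with made-up run and task IDs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AssetInstance:
    """One version of an asset. It holds a pointer, never the data itself."""
    version: int
    reference: str  # e.g. an artifact pathspec or an external endpoint ID


@dataclass
class AssetDefinition:
    """Container for instances (hypothetical, for illustration only)."""
    id: str
    name: str
    instances: list = field(default_factory=list)

    def add_instance(self, reference):
        # Every update appends a new version; older versions are kept.
        instance = AssetInstance(version=len(self.instances) + 1, reference=reference)
        self.instances.append(instance)
        return instance

    @property
    def latest(self):
        return self.instances[-1] if self.instances else None


# A data asset refreshed by two runs (run and task IDs are made up).
xkcd = AssetDefinition(id="xkcd", name="Latest XKCD comic")
xkcd.add_instance("XKCDData/101/start/1/img_url")
xkcd.add_instance("XKCDData/102/start/1/img_url")

# An asset with metadata only and no instances, like an external model.
small_llama = AssetDefinition(id="small_llama", name="Small LLama")

print(xkcd.latest)
print(small_llama.latest)
```

Because an instance carries only a reference, registering a new version is cheap regardless of how large the underlying dataset or model is.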

In ob-project-starter, the latest comic strip is a core entity being processed, so it makes sense to elevate the corresponding artifact as an asset. This allows you to observe the asset conveniently in the asset view:

The visualization shown in the asset view is a normal Metaflow @card, produced by the task registering an asset instance with register_data. Customize the card to show metrics that matter for the asset instance, for instance, data or model quality metrics.

Importantly, the asset UI contains a pointer to the exact task that produced each asset instance (by calling register_data), allowing you to track data lineage from producers to consumers.

Consuming assets

Using an asset is straightforward. In a task, simply call get_data, as exemplified by this line from XKCDExplainer:

self.img_url = self.prj.get_data("xkcd")

The get_data call will fetch the latest instance of an asset and automatically resolve the reference it contains to the corresponding data item - like img_url in this case.

Importantly, get_data registers the task as a consumer of the asset, contributing to data lineage tracking.
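
The producer and consumer bookkeeping described above can be sketched with an in-memory stand-in. This is a toy model under stated assumptions, not the Outerbounds implementation: ToyAssetRegistry and its register, get, and lineage methods are hypothetical, they take the task and value explicitly (unlike register_data and get_data, which run inside a task), and the task pathspecs and URLs are made up.

```python
from collections import defaultdict


class ToyAssetRegistry:
    """In-memory stand-in for asset tracking (illustration only)."""

    def __init__(self):
        self._artifacts = {}                 # (task, artifact name) -> stored value
        self._instances = defaultdict(list)  # asset id -> [(producer task, artifact name)]
        self._consumers = defaultdict(set)   # asset id -> {consumer tasks}

    def register(self, asset_id, artifact_name, task, value):
        # The artifact holds the value; the asset instance only points at it.
        self._artifacts[(task, artifact_name)] = value
        self._instances[asset_id].append((task, artifact_name))

    def get(self, asset_id, task):
        instances = self._instances.get(asset_id)
        if not instances:
            raise LookupError(f"asset {asset_id!r} has no instances")
        producer, artifact_name = instances[-1]  # latest instance wins
        self._consumers[asset_id].add(task)      # record the caller as a consumer
        return self._artifacts[(producer, artifact_name)]

    def lineage(self, asset_id):
        return {
            "producers": [task for task, _ in self._instances.get(asset_id, [])],
            "consumers": sorted(self._consumers.get(asset_id, ())),
        }


registry = ToyAssetRegistry()
registry.register("xkcd", "img_url", task="XKCDData/1/start/1",
                  value="https://example.com/comic-1.png")
registry.register("xkcd", "img_url", task="XKCDData/2/start/1",
                  value="https://example.com/comic-2.png")

url = registry.get("xkcd", task="XKCDExplainer/7/start/1")
print(url)
print(registry.lineage("xkcd"))
```

The key design point mirrored here is that reading an asset has a side effect: the lookup both resolves the reference and records who asked, which is what makes lineage available without extra instrumentation in the consuming flow.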