
Getting started with Inference

💡 Want to skip ahead and get your hands dirty? Head to the outerbounds/inference-examples GitHub repository to get started!

This doc walks you through the inference functionality on the Outerbounds platform and the basic concepts you need to get started.

What is the Inference functionality?

You can deploy long-running services on the Outerbounds platform for many use cases, including (but not limited to):

  • A FastAPI app that serves inference queries using a trained model stored as a Metaflow artifact (see the sketch after this list).
  • A Streamlit dashboard for analytics or a human-in-the-loop process.
  • An Optuna dashboard to monitor your hyperparameter optimization (HPO) experiments.
  • A vLLM container to power inference on a custom fine-tuned or off-the-shelf LLM.
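
For the first use case, the deployment body can be as small as a single FastAPI module. The sketch below is illustrative only: it assumes a hypothetical flow named TrainFlow whose latest successful run stored a scikit-learn-style model as an artifact called model; your flow name, artifact, and request schema will differ.

```python
# Minimal sketch of a FastAPI app serving a model stored as a Metaflow artifact.
# Assumptions (not from this doc): a flow named "TrainFlow" exists and its
# latest successful run saved a scikit-learn-style model as `self.model`.
from fastapi import FastAPI
from metaflow import Flow
from pydantic import BaseModel

app = FastAPI()

# Load the artifact once at startup from the latest successful run.
model = Flow("TrainFlow").latest_successful_run.data.model

class PredictRequest(BaseModel):
    features: list[float]

@app.post("/predict")
def predict(req: PredictRequest):
    # Assumes the artifact exposes a scikit-learn-style predict() method.
    prediction = model.predict([req.features])
    return {"prediction": prediction.tolist()}
```

You can run this locally with any ASGI server (for example, uvicorn) before deploying it as a long-running service on the platform.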

The Outerbounds platform lets you easily deploy and manage these long-running services, known as Deployments.

Core Concepts

Deployment/App/Endpoint

A Deployment is a long-running service running on the platform with one or more replicas. The terms App, Endpoint, and Deployment are used interchangeably to refer to the same thing.

Workers

Each deployment can have zero or more replicas, and each replica is called a Worker. As a platform user, you control how many workers to provision for your deployment. Deployments also support autoscaling for use cases with variable traffic patterns.

Compute Pools

Just like tasks and workstations, deployments (or, to be precise, each deployment worker) run on compute pools. You need to configure one or more compute pools so that they're allowed to run deployments.

Outerbounds CLI

The outerbounds CLI is the main way to provision and manage deployments on the platform. You can install it with pip install -U outerbounds.

Outerbounds UI

You can monitor the status, logs, and metrics of your Deployments in the Outerbounds UI, under the Deployments tab beneath the Components header.