Deploying to Google Cloud with Kubernetes
This page shows how to deploy a complete Metaflow stack powered by Kubernetes on Google Cloud. For more information about the deployment, see deployment details, advanced options and FAQ.
1. Preparation
Terraform Tooling
Terraform is a popular infrastructure-as-code tool for managing cloud resources. We have published a set of terraform templates here for setting up Metaflow on GCP. Terraform needs to be installed on your system in order to use these templates.
- Install Terraform by following these instructions.
- Download Metaflow on GCP terraform templates:
git clone git@github.com:outerbounds/metaflow-tools.git
GCloud Command Line Interface
This is the official CLI tool ("gcloud") published by Google for working with GCP. Terraform will use it when applying our templates (e.g. for authenticating to GCP). Please install it by following these instructions.
kubectl Command Line Interface
kubectl is a standard CLI tool for working with Kubernetes clusters. It will be used by Terraform when applying our templates (e.g. for deploying some services to your Google Kubernetes Engine (GKE) cluster). Please install it by following these instructions.
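Before provisioning anything, it can help to confirm that all three tools are visible on your PATH. A minimal, illustrative check using only the Python standard library (the exact version requirements depend on the templates):

```python
import shutil

def check_tools(tools=("terraform", "gcloud", "kubectl")):
    """Map each required CLI tool to its location on PATH (None if missing)."""
    return {tool: shutil.which(tool) for tool in tools}

if __name__ == "__main__":
    for tool, path in check_tools().items():
        print(f"{tool}: {path or 'MISSING - install it before proceeding'}")
```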
2. Provision GCP Resources
See here for the exact set of resources to be provisioned. Also, note the permissions that are needed.
Enable Google Cloud APIs
You need to manually enable APIs used by the Metaflow stack on the Google Cloud console. Make sure that the following APIs are enabled:
- Cloud Resource Manager
- Compute Engine API
- Service Networking
- Cloud SQL Admin API
- Kubernetes Engine API
If you have used the account/project for other deployments in the past, these APIs may already be enabled. Also note that enabling these APIs automatically enables several other required APIs.
Login to GCP
You must be logged in to GCP as an account with sufficient permissions to provision the required resources. Use the gcloud CLI:
gcloud auth application-default login
Initialize your Terraform Workspace
From your metaflow-tools/gcp/terraform directory, run:
terraform init
Set Terraform Variables
Create a FILE.tfvars file with the following content (updating relevant values):
org_prefix = "<ORG_PREFIX>"
project = "<GCP_PROJECT_ID>"
For org_prefix, choose a short and memorable alphanumeric string. It will be used for naming the Google Cloud Storage bucket, whose name must be globally unique across GCP.
For GCP_PROJECT_ID, set the GCP project ID you wish to use.
You may rename FILE.tfvars to a friendlier name appropriate for your project, e.g. metaflow.poc.tfvars. The variable assignments defined in this file will be passed to the terraform CLI.
Optional: Enable Argo Events
To enable event triggering for Metaflow, add the following line to FILE.tfvars:
enable_argo=true
For more technical context, see this page about event triggering.
Optional: Enable Airflow
Optionally, you can include Apache Airflow as the production orchestrator for Metaflow in your deployment by adding the following line to FILE.tfvars:
deploy_airflow=true
Setting deploy_airflow=true will deploy Airflow in the GKE cluster with a LocalExecutor.
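Putting the variables together, a FILE.tfvars that enables both optional components might look like this (enable either, both, or neither, depending on your needs):

```hcl
org_prefix     = "<ORG_PREFIX>"      # short, memorable alphanumeric string
project        = "<GCP_PROJECT_ID>"  # your GCP project ID
enable_argo    = true                # optional: event triggering via Argo
deploy_airflow = true                # optional: Airflow (LocalExecutor) on GKE
```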
Apply Terraform Template to Provision GCP Infrastructure
From your local metaflow-tools/gcp/terraform directory, run:
terraform apply -target="module.infra" -var-file=FILE.tfvars
A plan of action will be printed to the terminal. You should review it before accepting. See details for what to expect.
Common Resource Provisioning Hiccups
Cloud SQL instance name conflicts
Cloud SQL instance (the "PostgreSQL DB") names must be unique within your GCP project, including instances that have been deleted within the last 7 days. This means that if you want to reprovision the entire set of GCP resources within that time window, a fresh name must be chosen. In this scenario, please update the DB generation variable here.
3. Deploy Metaflow Services to GKE cluster
Apply Terraform Template to Deploy Services
From your local metaflow-tools/gcp/terraform directory, run:
terraform apply -target="module.services" -var-file=FILE.tfvars
4. End User Setup Instructions
When the command above completes, it will print a set of setup instructions for Metaflow end users (folks who will be writing and running flows). These instructions are meant to get end users started on running flows quickly.
You can access the terraform instruction output at any time by running (from the metaflow-tools/gcp/terraform directory):
terraform output -raw END_USER_SETUP_INSTRUCTIONS
If the output is not available, run terraform apply -var-file=FILE.tfvars and try the output command again.
Sample Output
Setup instructions for END USERS (e.g. someone running Flows against the new stack):
-------------------------------------------------------------------------------
There are four steps:
1. Ensure GCP access
2. Configure Metaflow
3. Run port forwards
4. Install necessary GCP Python SDK libraries
STEP 1: Ensure you have sufficient access to these GCP resources on your local workstation:
- Google Kubernetes Engine ("Kubernetes Engine Developer role")
- Google Cloud Storage ("Storage Object Admin" on bucket ob-metaflow-storage-bucket-ci)
Option 1: Login with gcloud CLI
Login as a sufficiently capable user: $ gcloud auth application-default login.
Option 2: Use service account key
Ask for the pregenerated service account key (./metaflow_gsa_key_ci.json) from the administrator (the person who stood up the Metaflow stack).
Save the key file locally to your home directory. It should be accessible only by you (chmod 700 <FILE>).
Configure your local Kubernetes context to point to the right Kubernetes cluster:
$ gcloud container clusters get-credentials metaflow-kubernetes-ci --region=us-west2
STEP 2: Configure Metaflow:
Option 1: Create JSON config directly (recommended)
Create the file "~/.metaflowconfig/config.json" with this content. If this file already exists, keep a backup of it and move it aside first.
{
"METAFLOW_DATASTORE_SYSROOT_GS": "gs://ob-metaflow-storage-bucket-ci/tf-full-stack-sysroot",
"METAFLOW_DEFAULT_DATASTORE": "gs",
"METAFLOW_DEFAULT_METADATA": "service",
"METAFLOW_KUBERNETES_NAMESPACE": "default",
"METAFLOW_KUBERNETES_SERVICE_ACCOUNT": "metaflow-service-account",
"METAFLOW_SERVICE_INTERNAL_URL": "http://metadata-service.default:8080/",
"METAFLOW_SERVICE_URL": "http://127.0.0.1:8080/"
}
Option 2: Interactive configuration
Run the following, one after another.
$ metaflow configure gs
$ metaflow configure kubernetes
Use these values when prompted:
METAFLOW_DATASTORE_SYSROOT_GS=gs://ob-metaflow-storage-bucket-ci/tf-full-stack-sysroot
METAFLOW_SERVICE_URL=http://127.0.0.1:8080/
METAFLOW_SERVICE_INTERNAL_URL=http://metadata-service.default:8080/
[For Argo only] METAFLOW_KUBERNETES_NAMESPACE=argo
[For Argo only] METAFLOW_KUBERNETES_SERVICE_ACCOUNT=argo
Note: you can skip these:
METAFLOW_SERVICE_AUTH_KEY
METAFLOW_KUBERNETES_CONTAINER_REGISTRY
METAFLOW_KUBERNETES_CONTAINER_IMAGE
STEP 3: Setup port-forwards to services running on Kubernetes:
Option 1: Run kubectl port-forward commands manually:
$ kubectl port-forward deployment/metadata-service 8080:8080
$ kubectl port-forward deployment/metaflow-ui-backend-service 8083:8083
$ kubectl port-forward deployment/metaflow-ui-static-service 3000:3000
$ kubectl port-forward -n argo deployment/argo-server 2746:2746
Option 2: Use this script, which manages the same port-forwards for you (and prevents timeouts):
$ python metaflow-tools/scripts/forward_metaflow_ports.py [--include-argo]
STEP 4: Install GCP Python SDK
$ pip install google-cloud-storage google-auth
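As a convenience, STEP 2's Option 1 can be scripted. The sketch below uses only the Python standard library; the configuration values are copied from the sample output above and will differ for your deployment:

```python
import json
import shutil
from pathlib import Path

# Values copied from the sample output above; substitute your deployment's values.
CONFIG = {
    "METAFLOW_DATASTORE_SYSROOT_GS": "gs://ob-metaflow-storage-bucket-ci/tf-full-stack-sysroot",
    "METAFLOW_DEFAULT_DATASTORE": "gs",
    "METAFLOW_DEFAULT_METADATA": "service",
    "METAFLOW_KUBERNETES_NAMESPACE": "default",
    "METAFLOW_KUBERNETES_SERVICE_ACCOUNT": "metaflow-service-account",
    "METAFLOW_SERVICE_INTERNAL_URL": "http://metadata-service.default:8080/",
    "METAFLOW_SERVICE_URL": "http://127.0.0.1:8080/",
}

def write_metaflow_config(config, config_dir):
    """Write config.json into config_dir, backing up any existing file first."""
    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)
    config_path = config_dir / "config.json"
    if config_path.exists():
        # Keep a backup of any pre-existing config, as the instructions advise.
        shutil.copy2(config_path, config_path.with_suffix(".json.bak"))
    config_path.write_text(json.dumps(config, indent=4, sort_keys=True) + "\n")
    return config_path
```

Calling write_metaflow_config(CONFIG, Path.home() / ".metaflowconfig") installs the configuration for the current user, preserving any previous config.json as config.json.bak.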