# Running steps across clouds
With Outerbounds, you can execute steps of a flow across multiple cloud providers such as AWS, Azure, and GCP. Multi-cloud compute is a powerful feature that helps you

- overcome constraints related to resource availability and services offered,
- access specialized compute offerings, such as Trainium on AWS or TPUs on GCP,
- optimize cost by moving compute easily to the most cost-efficient environment,
- respect data locality by moving compute to the data.
For example, if your primary compute cluster is hosted on AWS and you would like to execute parts of your flow
in a compute pool on Azure, simply add `node_selector="outerbounds.co/provider=azure"` to the `@kubernetes` decorator
of the step that should run on Azure.
To set up new compute pools across clouds, reach out on your support Slack channel.
## Example: Scaling out to Azure
Save this flow in `crosscloudflow.py`:
```python
from metaflow import FlowSpec, step, resources, kubernetes
import urllib.request


class CrossCloudFlow(FlowSpec):

    @kubernetes
    @step
    def start(self):
        req = urllib.request.Request('https://raw.githubusercontent.com/dominictarr/random-name/master/first-names.txt')
        with urllib.request.urlopen(req) as response:
            data = response.read()
        # Decode the response and keep the first ten names
        self.titles = data.decode().splitlines()[:10]
        self.next(self.process, foreach='titles')

    @resources(cpu=1, memory=512)
    @kubernetes(node_selector="outerbounds.co/provider=azure")
    @step
    def process(self):
        self.title = '%s processed' % self.input
        self.next(self.join)

    @step
    def join(self, inputs):
        self.results = [inp.title for inp in inputs]
        self.next(self.end)

    @step
    def end(self):
        print('\n'.join(self.results))


if __name__ == '__main__':
    CrossCloudFlow()
```
Here, `node_selector` is used to target an Azure-based compute pool. The flow illustrates a common pattern in cross-cloud processing:

- First, we retrieve a dataset in the primary cloud (the `start` step).
- Processing of the dataset is scaled out to another cloud (the `process` step).
- Results are retrieved back to the primary cloud (the `join` step).
Run the flow as usual:

```
python crosscloudflow.py run --with kubernetes
```
Open the Status view to observe the load across compute pools in real time.
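If you want to sanity-check the parsing logic in `start` before launching the flow, you can exercise it locally; the byte string below is a hypothetical sample standing in for the downloaded `first-names.txt` response:

```python
# Hypothetical sample of the bytes returned by urlopen(...).read()
data = b"Aaren\nAarika\nAbagael\nAbagail\nAbbe\nAbbey\nAbbi\nAbbie\nAbby\nAbbye\nAbigael\n"

# Same parsing as the flow: decode the bytes, split into lines, keep the first ten
titles = data.decode().splitlines()[:10]
print(titles)
```

Each entry of `titles` then becomes one `process` task when the flow fans out with `foreach='titles'`.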