Scalability FAQ

How many resources can I request?

The maximum available CPU, GPU, memory, and disk depend on the compute pools attached to the platform. Open the pools tab in Status to see the available pools and the resources they provide.

If you request more @resources than are currently available in the cluster, you get the following error message:

Resource requirements exceeds max available on any node

In this case, either lower the requested @resources or contact support on Slack to have more compute capacity added to the platform.
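The wording of the error ("on any node") suggests that a request has to fit on a single node, not just within the cluster's combined capacity. The sketch below illustrates that idea in plain Python. It is not the platform's scheduler: `NodeCapacity`, `fits_on_some_node`, and all the numbers are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class NodeCapacity:
    """Hypothetical record of CPU, memory, and GPU for a node or a request."""
    cpu: int
    memory_mb: int
    gpu: int = 0


def fits_on_some_node(request: NodeCapacity, nodes: Sequence[NodeCapacity]) -> bool:
    """Return True if at least one node satisfies every dimension of the request."""
    return any(
        request.cpu <= node.cpu
        and request.memory_mb <= node.memory_mb
        and request.gpu <= node.gpu
        for node in nodes
    )


# Made-up pool: two CPU-only nodes and one smaller GPU node.
pool = [
    NodeCapacity(cpu=16, memory_mb=64_000),
    NodeCapacity(cpu=16, memory_mb=64_000),
    NodeCapacity(cpu=8, memory_mb=32_000, gpu=1),
]

ok_request = NodeCapacity(cpu=8, memory_mb=48_000)
# Needs 16 CPUs and a GPU at once; no single node in the pool offers both.
too_big = NodeCapacity(cpu=16, memory_mb=64_000, gpu=1)

print(fits_on_some_node(ok_request, pool))
print(fits_on_some_node(too_big, pool))
```

Under this assumption, the second request is rejected even though the pool as a whole has enough CPUs and a GPU, which is why lowering the request or adding larger nodes are the two ways out.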

What is the maximum number of items that can be processed with foreach?

You can define a foreach over any Python list. The list can potentially contain hundreds of thousands of items.

To guard against accidentally running a foreach over too many items, Metaflow provides a safeguard flag, --max-num-splits, which makes sure that you don't launch thousands of tasks by accident. If you need a wide foreach, you can raise the limit as high as you need, e.g. --max-num-splits=10000.
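Conceptually, the safeguard is a length check applied before any tasks are created. The following is a minimal sketch of that idea, not Metaflow's implementation: `check_foreach_width` is a hypothetical helper, and the default limit used here is illustrative and may differ from Metaflow's actual default.

```python
def check_foreach_width(items, max_num_splits=100):
    """Return the number of splits, or raise if the list is wider than the limit.

    max_num_splits=100 is an illustrative default for this sketch only.
    """
    width = len(items)
    if width > max_num_splits:
        raise ValueError(
            f"foreach over {width} items exceeds the limit of {max_num_splits}; "
            "raise the limit if this width is intended"
        )
    return width


small = list(range(50))
wide = list(range(5_000))

print(check_foreach_width(small))  # within the limit

try:
    check_foreach_width(wide)  # too wide for the illustrative default
except ValueError as err:
    print("rejected:", err)

# Explicitly raising the limit lets the wide list through.
print(check_foreach_width(wide, max_num_splits=10_000))
```

With Metaflow itself, the limit is raised on the command line when starting a run, for example `python myflow.py run --max-num-splits=10000`, where `myflow.py` is a placeholder file name.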

Metaflow doesn't automatically launch all tasks of a foreach in parallel. Another flag, --max-workers, governs how many tasks are launched concurrently. Increasing its value speeds up processing by using more parallelism, but it also puts more load on the cluster, which you can observe in the Status view.
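The effect of a concurrency cap can be illustrated with a bounded worker pool from the Python standard library. This is an analogy only, assuming that --max-workers behaves like a cap on simultaneously running tasks; `run_bounded` is a hypothetical helper, and Metaflow's own scheduler works differently under the hood.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_bounded(items, max_workers):
    """Process items with at most max_workers tasks running at the same time.

    Returns the results in input order and the peak concurrency observed.
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def task(x):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # stand-in for real work
        with lock:
            running -= 1
        return x * x

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(task, items))
    return results, peak


# 20 items are queued, but no more than 4 are processed concurrently.
results, peak = run_bounded(range(20), max_workers=4)
print(len(results), "items processed, peak concurrency:", peak)
```

All 20 items are eventually processed; the cap only controls how many run at once. Raising it shortens the total wall-clock time at the cost of more simultaneous load, which mirrors the trade-off described above.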