Organize Results with Namespaces
Question
How can we use namespaces to keep results of our team's flow runs organized and accessible, no matter who ran the flow?
Solution
Metaflow persists all runs and the data they produce, and you can access this data with the Client API.
Namespaces organize these results and determine which runs the Client API returns.
By default, the active namespace is user:<name>, where <name> is your user name, so the Client API only returns the runs you launched.
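Conceptually, each run is tagged with the namespace of the user who launched it (for example user:alice), and the Client API only returns runs whose tags include the active namespace. The toy model below is plain Python, not Metaflow code: `ToyRun` and `visible_runs` are hypothetical names used only to sketch that filtering rule, including the global namespace (`None`), where nothing is filtered out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ToyRun:
    # Hypothetical stand-in for a stored run: just an id plus its tags
    run_id: str
    tags: Set[str] = field(default_factory=set)


def visible_runs(runs: List[ToyRun], ns: Optional[str]) -> List[ToyRun]:
    """Return the runs a client would 'see' in namespace `ns`.

    ns=None models the global namespace: nothing is filtered out.
    Otherwise a run is visible only if `ns` is one of its tags.
    """
    if ns is None:
        return list(runs)
    return [r for r in runs if ns in r.tags]


runs = [
    ToyRun("1", {"user:alice"}),
    ToyRun("2", {"user:bob"}),
    ToyRun("3", {"user:alice"}),
]

print([r.run_id for r in visible_runs(runs, "user:alice")])  # ['1', '3']
print([r.run_id for r in visible_runs(runs, "user:bob")])    # ['2']
print([r.run_id for r in visible_runs(runs, None)])          # ['1', '2', '3']
```

Alice and Bob can share one storage location, yet each sees only their own runs until they change the active namespace.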
The flow below is used throughout the rest of this post. The important part is the end step, which stores a random choice as an artifact of each TeamCollabFlow run. In your own flows, this could be any artifact you store.
1. Run Flow
from metaflow import FlowSpec, step, current

class TeamCollabFlow(FlowSpec):

    @step
    def start(self):
        # Show who is running the flow and which namespace is active
        print("current.username: {}".format(current.username))
        print("current.namespace: {}".format(current.namespace))
        self.next(self.end)

    @step
    def end(self):
        import random
        # Store a random value as an artifact of this run
        self.choice = random.choice([1, 2, 3, 4, 5])
        print("Random choice was {}".format(self.choice))

if __name__ == "__main__":
    TeamCollabFlow()
Save the code above as team_collab_flow.py and run it:
python team_collab_flow.py run
2a. Access Results from Your Namespace
By default, the Client API pulls data from the current user's namespace. This means you can use any Client API call without worrying about others who run the same flow and store results in the same S3 bucket (or other storage location) as you. You only get results from your namespace unless you explicitly set a different one.
from metaflow import Flow
run = Flow('TeamCollabFlow').latest_successful_run
run_id, choice = run.id, run.data.choice
print("Run with id={} has choice={}".format(run_id,choice))
2b. Share Results with Teammates
If you want a teammate to access a run from your namespace, they first need to switch to your namespace before making the corresponding Client API call. The following example shows the error that occurs after switching to a namespace that doesn't contain the run_id from the previous section. This is what your teammate sees if they try to access your result before switching to your namespace, where the run_id exists.
from metaflow import namespace, get_namespace, Flow
from metaflow.exception import MetaflowNamespaceMismatch

not_my_namespace = 'user:my-teammate'
namespace(not_my_namespace)  # switch to the teammate's default namespace
flow_name = 'TeamCollabFlow'
try:
    run = Flow(flow_name).latest_successful_run
except MetaflowNamespaceMismatch as m:
    # Your runs are tagged with your namespace, so none are visible here
    # (assuming my-teammate has not run this flow)
    print(m)
    print("\tNo {} results in the {} namespace".format(flow_name, get_namespace()))
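Why does the lookup fail? As a rough model, the lookup only considers runs whose tags include the active namespace, and it raises when none of the flow's runs are visible. The toy below is plain Python, not Metaflow's implementation: `NamespaceMismatch`, `latest_visible_run_id`, and the dict-based store are hypothetical, and run status (successful or not) is ignored for brevity.

```python
from typing import Dict, List, Optional


class NamespaceMismatch(Exception):
    """Toy stand-in for metaflow.exception.MetaflowNamespaceMismatch."""


def latest_visible_run_id(store: Dict[str, List[dict]], flow: str,
                          ns: Optional[str]) -> str:
    # Keep only runs tagged with the active namespace (ns=None means global)
    runs = [r for r in store.get(flow, []) if ns is None or ns in r["tags"]]
    if not runs:
        # Nothing from this flow is visible in `ns`, so the lookup fails
        raise NamespaceMismatch(
            "no {} runs visible in namespace {}".format(flow, ns))
    # Runs are assumed to be stored oldest first, so the last one is the latest
    return runs[-1]["id"]


store = {
    "TeamCollabFlow": [
        {"id": "10", "tags": {"user:alice"}},
        {"id": "11", "tags": {"user:alice"}},
    ]
}

print(latest_visible_run_id(store, "TeamCollabFlow", "user:alice"))  # 11
try:
    latest_visible_run_id(store, "TeamCollabFlow", "user:my-teammate")
except NamespaceMismatch as err:
    print(err)  # no TeamCollabFlow runs visible in namespace user:my-teammate
```

Switching the toy's `ns` to user:alice (or to `None`) makes run 11 visible again, which is what the next step does with the real Client API.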
Your teammate can switch to your namespace to access the result. The following snippet shows how to get your namespace string with default_namespace. It returns a string that you or any of your colleagues can pass to namespace before fetching your flow results:
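from metaflow import namespace, default_namespace, Flow

flow_name = 'TeamCollabFlow'
# Run this on your machine: it returns your own namespace, user:<your name>
my_namespace = default_namespace()
# Give the my_namespace string to your colleague, who then activates it:
namespace(my_namespace)
run = Flow(flow_name).latest_successful_run
run_id, choice = run.id, run.data.choice
print("Run with id={} has choice={}".format(run_id, choice))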
Use this at any time to return to your default namespace:
from metaflow import namespace, default_namespace
_ = namespace(default_namespace())
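Because the active namespace is client-wide state, it is easy to forget to switch back after reading a teammate's results. One option is a small context manager that restores the previous namespace on exit. `temporary_namespace` below is a hypothetical helper, not part of Metaflow; it takes the getter and setter as arguments so this sketch can be exercised without Metaflow installed.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Optional


@contextmanager
def temporary_namespace(
    get_ns: Callable[[], Optional[str]],
    set_ns: Callable[[Optional[str]], object],
    ns: Optional[str],
) -> Iterator[None]:
    """Activate `ns` inside a `with` block, then restore the previous namespace.

    The getter and setter are passed in so the helper can be tried without
    Metaflow; with Metaflow you would pass get_namespace and namespace.
    """
    previous = get_ns()
    set_ns(ns)
    try:
        yield
    finally:
        # Runs even if the body raised, e.g. on a namespace mismatch
        set_ns(previous)


# Demo with a dict standing in for the client's active namespace
state = {"ns": "user:me"}


def fake_get() -> Optional[str]:
    return state["ns"]


def fake_set(value: Optional[str]) -> None:
    state["ns"] = value


with temporary_namespace(fake_get, fake_set, "user:my-teammate"):
    print(state["ns"])  # user:my-teammate
print(state["ns"])      # user:me
```

With Metaflow, the call would be `with temporary_namespace(get_namespace, namespace, "user:my-teammate"):` wrapped around the `Flow(...)` lookup; your previous namespace is active again when the block exits, even if the lookup raised.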
2c. Use the Run ID to Access Results in the Global Namespace
This example shows how to access results from any user, across all namespaces in your flow's data storage location. Call namespace(None) to activate the global namespace, then look up the run directly by its ID (run.id from the previous sections).
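from metaflow import namespace, Run

namespace(None)  # global namespace: runs from all users are visible
run = Run('TeamCollabFlow/{}'.format(run_id))  # run_id from the previous sections
print("Run with id={} has choice={}".format(run_id, run.data.choice))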
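`Run` takes a pathspec string: the flow name and run ID joined by `/`. Metaflow's `Step` and `Task` objects accept longer pathspecs that append a step name and then a task ID. The helper below, `make_pathspec`, is hypothetical (not a Metaflow function) and simply sketches how those strings are assembled, so IDs shared with teammates stay in one consistent format.

```python
from typing import Optional, Union


def make_pathspec(flow: str, run_id: Union[str, int],
                  step: Optional[str] = None,
                  task_id: Union[str, int, None] = None) -> str:
    """Join ids into a pathspec of the form Flow/RunId[/Step[/TaskId]]."""
    if task_id is not None and step is None:
        # A task id is only meaningful underneath a specific step
        raise ValueError("a task id needs a step name")
    parts = [flow, str(run_id)]
    if step is not None:
        parts.append(step)
    if task_id is not None:
        parts.append(str(task_id))
    return "/".join(parts)


print(make_pathspec("TeamCollabFlow", "1234"))            # TeamCollabFlow/1234
print(make_pathspec("TeamCollabFlow", "1234", "end"))     # TeamCollabFlow/1234/end
print(make_pathspec("TeamCollabFlow", "1234", "end", 5))  # TeamCollabFlow/1234/end/5
```

For example, `Run(make_pathspec("TeamCollabFlow", run_id))` builds the same string as the `format` call in the snippet before it.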
3. The Production Namespace
Metaflow also maintains a production namespace that is separate from every user namespace. It is used when you schedule production flows to run automatically: when a production scheduler triggers a run, it may not make sense to associate that run with a single user. You can read more about the production namespace in the Metaflow documentation.
4a. Access Results in a Second Flow
This flow shows how to:
- Access data from another flow using the get_flow_data function.
- Use the namespace call to change the active namespace.
- Access results from past runs of other_flow_name.
- Use the default_namespace call to return to the original namespace.
- Print the data from the other flow during the AccessOtherNamespace run.
from metaflow import (Flow, FlowSpec, step, namespace,
                      default_namespace, Parameter)
from metaflow.exception import MetaflowNamespaceMismatch, MetaflowNotFound

def get_flow_data(flow, new_ns, original_ns=default_namespace()):
    # Temporarily switch namespaces to look up the latest successful run
    try:
        namespace(new_ns)
        run = Flow(flow).latest_successful_run
    except (MetaflowNotFound, MetaflowNamespaceMismatch):
        # The flow doesn't exist, or has no runs in new_ns
        return None
    finally:
        # Always switch back to the original namespace, even on failure
        namespace(original_ns)
    return run

class AccessOtherNamespace(FlowSpec):

    other_flow_name = Parameter('other-flow-name',
                                default='TeamCollabFlow')
    other_namespace = Parameter('other-namespace',
                                default=default_namespace())
    msg = "{}.latest_successful_run.data.choice has value {}."

    @step
    def start(self):
        # Access other_flow_name in other_namespace
        run = get_flow_data(
            flow=self.other_flow_name,
            new_ns=self.other_namespace
        )
        if run is None:
            print("Flow {} not found in {} namespace.".format(
                self.other_flow_name,
                self.other_namespace
            ))
        else:
            print(self.msg.format(
                self.other_flow_name,
                run.data.choice,
            ))
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    AccessOtherNamespace()
4b. Run the Second Flow
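python access_namespace_in_flow.py run
To read a teammate's results instead, pass their namespace through the other-namespace parameter:
python access_namespace_in_flow.py run --other-namespace user:my-teammate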