Use Artifacts in Metaflow Join Step
Question
How can I pass data artifacts of a Metaflow flow through a join step? What are my options for merging artifacts?
Solution
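You can call `self.merge_artifacts(inputs)` in the join step to propagate upstream artifacts to the rest of the flow. Its `exclude` argument lets you leave specific upstream artifacts out of the merge. You also need to watch for name collisions. If branches assign different values to the same artifact name, `merge_artifacts` raises an error unless you first set that artifact explicitly in the join step.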
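The merge rules above can be modeled in plain Python to show how exclusion and collisions interact. This is a hypothetical sketch, not Metaflow's implementation. The `merge_artifacts` function, the `MergeConflict` exception, and the dict-based branches are stand-ins assumed for illustration. Values are compared with `==`, which is a simplification of how Metaflow checks artifacts.

```python
"""Plain-Python model of the join-step merge rules (hypothetical, not Metaflow code)."""
from typing import Any, Dict, List, Optional


class MergeConflict(Exception):
    """Raised when branches disagree on an artifact the join step has not set."""


def merge_artifacts(
    current: Dict[str, Any],
    branches: List[Dict[str, Any]],
    exclude: Optional[List[str]] = None,
) -> Dict[str, Any]:
    excluded = set(exclude or [])
    merged = dict(current)
    # Collect every artifact name seen in any branch, in first-seen order.
    names: List[str] = []
    for branch in branches:
        for name in branch:
            if name not in names:
                names.append(name)
    for name in names:
        # Excluded names and names already set in the join step are left alone.
        if name in excluded or name in current:
            continue
        values = [b[name] for b in branches if name in b]
        # Differing values across branches are a collision.
        if any(v != values[0] for v in values[1:]):
            raise MergeConflict(f"artifact {name!r} differs across branches")
        merged[name] = values[0]
    return merged


# Branch outputs mirroring the JoinArtifacts flow; both inherit pre_branch_data.
branch_a = {"pre_branch_data": 0, "x": 1, "a": "a"}
branch_b = {"pre_branch_data": 0, "x": 2, "b": "b"}

# Merging without resolving x first fails.
try:
    merge_artifacts({}, [branch_a, branch_b])
except MergeConflict as err:
    print(err)

# Setting x from branch_a first, then excluding "a", mirrors the join step.
joined = merge_artifacts({"x": branch_a["x"]}, [branch_a, branch_b], exclude=["a"])
print(joined)
```

Setting `x` before the merge is what resolves the collision in this model. The merge skips names that the join step already owns, so it never compares the two branch values of `x`.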
This flow shows how to:
- Access upstream values after branches are joined.
- Resolve a naming collision by selecting the value from a specific branch.
- Exclude an upstream value from the merge.
join_step_artifacts.py
```python
from metaflow import FlowSpec, step


class JoinArtifacts(FlowSpec):

    @step
    def start(self):
        # Set before the fan-out, so both branches inherit it.
        self.pre_branch_data = 0
        self.next(self.branch_a, self.branch_b)

    @step
    def branch_a(self):
        self.x = 1  # define x
        self.a = "a"
        self.next(self.join)

    @step
    def branch_b(self):
        self.x = 2  # define another x!
        self.b = "b"
        self.next(self.join)

    @step
    def join(self, inputs):
        # x collides across branches, so pick which x to propagate
        # before merging; merge_artifacts skips artifacts already set here.
        self.x = inputs.branch_a.x
        # Merge all remaining upstream artifacts except "a".
        self.merge_artifacts(inputs, exclude=["a"])
        self.next(self.end)

    @step
    def end(self):
        print("`pre_branch_data` " +
              f"value is: {self.pre_branch_data}.")
        print(f"`x` value is: {self.x}.")
        print(f"`b` value is: {self.b}.")
        try:
            print(f"`a` value is: {self.a}.")
        except AttributeError:
            # "a" was excluded from the merge, so it never reached this step.
            print("`a` was excluded! \U0001F632")


if __name__ == "__main__":
    JoinArtifacts()
```
```bash
python join_step_artifacts.py run
```
Further Reading
- Inspecting flows and results
- More examples using `@retry` and `@catch` in the Effective Data Science Infrastructure book