Debugging Flows
In this episode, you will see how to use the resume command on the command line when debugging your flows. After the episode, you will be able to debug and resume flows at arbitrary points in the DAG, so you don't need to run time-consuming steps over and over again. The same functionality works even when the steps were run on different computers. In fact, you can resume a Metaflow run on your local machine for a flow that was run automatically on a production scheduler like AWS Step Functions or Argo.
1. Common Resume Scenario
A common scenario for using resume might go something like this:
- You write my_sweet_flow.py.
- You run python my_sweet_flow.py run.
- Oh no, something broke! Analyzing stack trace...
- Found the bug!
- Save my_sweet_flow.py with the fix.
- You resume the flow from the step that produced the bug: python my_sweet_flow.py resume
- Metaflow picks up the state of the last flow execution from the step that failed.
- Note: You can also name a specific step to resume from, like python my_sweet_flow.py resume <DIFFERENT STEP NAME>
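The scenario above comes down to reusing the results of steps that already succeeded and re-executing only from the failed step onward. The following is a toy sketch of that idea in plain Python. It is not how Metaflow implements resume; run_steps, the results dictionary, and the step functions are hypothetical names chosen for illustration.

```python
# Toy illustration of resume semantics; this is NOT Metaflow's implementation.
# run_steps, results, and the step functions below are hypothetical names.

def run_steps(steps, results=None, resume_from=None):
    """Run (name, func) pairs in order, reusing stored results before resume_from.

    Returns (results, failed_step); failed_step is None if every step succeeds.
    """
    results = dict(results or {})
    names = [name for name, _ in steps]
    start = names.index(resume_from) if resume_from else 0
    state = None
    for index, (name, func) in enumerate(steps):
        if index < start and name in results:
            state = results[name]  # reuse the stored result, skip execution
            continue
        try:
            state = func(state)
        except Exception:
            return results, name  # remember where the run broke
        results[name] = state
    return results, None


calls = []  # records which step functions were actually executed


def slow(state):
    calls.append("slow")
    return 41


def buggy(state):
    raise Exception("bug")


def fixed(state):
    return state + 1


# First run: the slow step succeeds, then the second step fails.
results, failed = run_steps([("slow", slow), ("compute", buggy)])

# After the fix: resume from the failed step, reusing the slow step's result.
results, failed = run_steps(
    [("slow", slow), ("compute", fixed)], results, resume_from=failed
)
print(results["compute"], failed, calls)  # prints: 42 None ['slow']
```

The slow function runs only once across both calls, which is the point of resuming instead of rerunning from scratch.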
2. Example
Let's look at an example. In this flow:
- The time_consuming_step mimics some process you'd rather not re-run because of a downstream error. Examples of such processes might be data transformations or model training.
- The error_prone_step raises an Exception that halts your flow.
from metaflow import FlowSpec, step


class DebuggableFlow(FlowSpec):

    @step
    def start(self):
        self.next(self.time_consuming_step)

    @step
    def time_consuming_step(self):
        # Stand-in for slow work such as data transformation or model training.
        import time
        time.sleep(12)
        self.next(self.error_prone_step)

    @step
    def error_prone_step(self):
        # Deliberate failure that halts the flow.
        raise Exception()
        self.next(self.end)

    @step
    def end(self):
        print("Flow is done!")


if __name__ == "__main__":
    DebuggableFlow()
2a. Observe a Failed Task
python debuggable_flow.py run
2b. Fix the Issue
You can resolve the issue by:
- Finding and fixing the bug. In this case:
    - raise Exception()
    + print("Squashed bug")
from metaflow import FlowSpec, step


class DebuggableFlow(FlowSpec):

    @step
    def start(self):
        self.next(self.time_consuming_step)

    @step
    def time_consuming_step(self):
        import time
        time.sleep(12)
        self.next(self.error_prone_step)

    @step
    def error_prone_step(self):
        print("Squashed bug")
        # raise Exception()
        self.next(self.end)

    @step
    def end(self):
        print("Flow is done!")


if __name__ == "__main__":
    DebuggableFlow()
- Saving the flow script
2c. Resume the Flow
python debuggable_flow.py resume
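If you want to launch the resume from a script, or resume from a named step instead of the failed one, the command can be assembled programmatically. This is a minimal sketch; resume_command is a hypothetical helper, not part of Metaflow. It only builds the argument list in the two forms shown in this episode: plain resume, and resume followed by a step name.

```python
import sys


def resume_command(flow_file, step_name=None):
    """Build the argument list for resuming a flow, optionally from a named step.

    Hypothetical helper for illustration; it does not execute anything.
    """
    command = [sys.executable, flow_file, "resume"]
    if step_name is not None:
        command.append(step_name)
    return command


# Resume from the step that failed in the previous run.
print(resume_command("debuggable_flow.py")[1:])
# prints: ['debuggable_flow.py', 'resume']

# Resume from an explicitly named step.
print(resume_command("debuggable_flow.py", "error_prone_step")[1:])
# prints: ['debuggable_flow.py', 'resume', 'error_prone_step']
```

The resulting list can be passed to subprocess.run from the standard library to execute the command with the same Python interpreter.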
Congratulations, you have completed the Introduction to Metaflow tutorial! Now you are ready to operationalize your machine learning workflows with Metaflow.
To keep progressing in your Metaflow journey you can:
- Get to know Outerbounds' view on the machine learning stack.
- Check out the open-source repository.
- Join our Slack community and engage in #ask-metaflow. There is a lot of machine learning wisdom to discover from the community!