Handle Tasks that may Fail
Question
How do I design steps to handle potential task failures at runtime?
Solution
Metaflow has two decorators that address this.
1. Using @retry and @catch
Place Metaflow's @retry decorator above a step definition to rerun the step's task when it fails. The @retry decorator takes an argument called times, an integer in [0, 4] that sets how many times a failed task is retried. It is intended to handle transient failures and is particularly useful when running tasks in the cloud, where machine failures are more common.
You can also apply it from the command line without editing the flow: python flow.py run --with retry. This retries any failed step that has no @retry decorator of its own, three times by default.
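The retry behavior described above can be pictured with a small plain-Python sketch. It is an illustration only, not Metaflow's implementation: run_with_retries and flaky_task are hypothetical names, and the sketch assumes that times counts the extra attempts made after the first failure.

```python
def run_with_retries(task, times=3):
    """Call task(); if it raises, try again up to `times` more times."""
    for attempt in range(times + 1):
        try:
            return task()
        except Exception:
            if attempt == times:
                raise  # retries exhausted: let the failure propagate


attempts = []


def flaky_task():
    """Hypothetical task that fails twice (a transient error) and then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = run_with_retries(flaky_task, times=3)
print(result, "after", len(attempts), "attempts")
```

A transient failure disappears on a later attempt, so the caller never sees it. A deterministic failure, such as a division by zero, fails on every attempt and still propagates once the retries run out.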
The @catch decorator catches exceptions raised in the task. Unlike @retry, @catch is intended for use cases where you want the flow to continue after any exception. It takes an optional argument var, the name of a flow artifact in which the exception is saved so that you can access it later.
When using @catch, design the steps that come after the decorated step to tolerate exceptions in that step, for example by checking the saved exception before using the step's results.
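How the two decorators combine can be sketched in plain Python. This is a simplified model under stated assumptions, not Metaflow's code: run_step and the artifacts dictionary are hypothetical stand-ins, retries are assumed to run before the exception is caught, and the sketch stores the raw exception object, which may differ from what Metaflow actually saves.

```python
def run_step(step_fn, artifacts, times=0, catch_var=None):
    """Run step_fn(artifacts) with up to `times` retries.

    If every attempt fails and catch_var is given, store the last exception
    under that name instead of raising, so that later steps can still run.
    """
    last_exc = None
    for _ in range(times + 1):
        try:
            step_fn(artifacts)
            if catch_var is not None:
                artifacts[catch_var] = None  # success: no exception recorded
            return
        except Exception as exc:
            last_exc = exc
    if catch_var is None:
        raise last_exc  # nothing catches the failure, so it propagates
    artifacts[catch_var] = last_exc


def divide(artifacts):
    """Hypothetical step body: fails when the divisor is zero."""
    artifacts["res"] = 10 / artifacts["divisor"]


failed = {"divisor": 0}
run_step(divide, failed, times=1, catch_var="divide_fail")

succeeded = {"divisor": 2}
run_step(divide, succeeded, times=1, catch_var="divide_fail")

print(repr(failed["divide_fail"]), "res" in failed)
print(succeeded["divide_fail"], succeeded["res"])
```

Note that the failed step never sets res, which is why downstream steps should check the saved exception before reading the step's results.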
2. Run Flow
This flow shows how to:
- Create a foreach branch in start that creates three divide tasks.
- Use @retry to rerun divide when the step code produces an exception.
- Save the exception using @catch.
- In the join task, use the saved exception to store results only if the divide parent task succeeded.

    from metaflow import FlowSpec, step, retry, catch

    class CatchRetryFlow(FlowSpec):

        @step
        def start(self):
            # One divide task is created per divisor; 0 forces a failure.
            self.divisors = [0, 1, 2]
            self.next(self.divide, foreach='divisors')

        # Retry once, then save any remaining exception in divide_fail.
        @catch(var='divide_fail')
        @retry(times=1)
        @step
        def divide(self):
            self.res = 10 / self.input
            self.next(self.join)

        @step
        def join(self, inputs):
            # Keep results only from divide tasks that did not fail.
            self.results = [i.res
                            for i in inputs
                            if not i.divide_fail]
            print('results', self.results)
            self.next(self.end)

        @step
        def end(self):
            print('done!')

    if __name__ == '__main__':
        CatchRetryFlow()

Save the flow as handle_failed_task.py and run it from the command line:

    python handle_failed_task.py run

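The data flow of this example can be traced in plain Python to see what join is expected to collect. This is a sketch without Metaflow: SimpleNamespace objects stand in for the inputs that join receives, and it assumes that divide_fail is None for a successful task and holds a truthy exception object for a failed one.

```python
from types import SimpleNamespace

divisors = [0, 1, 2]

# Mimic the three divide tasks: each records either a result or its exception.
inputs = []
for divisor in divisors:
    task = SimpleNamespace(divide_fail=None)
    try:
        task.res = 10 / divisor
    except ZeroDivisionError as exc:
        task.divide_fail = exc  # saved instead of stopping the loop
    inputs.append(task)

# Same filter as the join step: skip tasks whose exception was saved.
results = [i.res for i in inputs if not i.divide_fail]
print('results', results)  # prints: results [10.0, 5.0]
```

The task with divisor 0 is filtered out, so only the results for divisors 1 and 2 remain.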
Further Reading
- Debugging flows with resume
- Dealing with failures in Metaflow
- More examples in the Effective Data Science Infrastructure book