Testing a flow with pytest
Question
How can I use PyTest with a flow?
Solution
There are two related cases to consider:
- Test the logic within steps.
- Test the flow itself.
1. Testing Logic in Steps
It is a helpful design pattern to move non-orchestration logic out of the flow itself and write unit tests for the component functions. In other words, if a step of your flow contains substantial logic of its own, you can move that logic into plain functions and have the step call them.
Here is a pseudo-code example of a flow you may want to refactor in this way.
class MyFlow(FlowSpec):

    @step
    def start(self):
        # logic A
        # logic B
        # logic C
        self.next(self.next_step)

    # rest of flow
    ...
To refactor, first move the logic into a separate module (here, my_module.py) so that it can be tested independently of the flow:
def do_logic():
    # logic A
    # logic B
    # logic C
    ...
This is the suggested design pattern because now you can unit test this logic in the way you normally would, and then import it in the flow.
class MyFlow(FlowSpec):

    @step
    def start(self):
        from my_module import do_logic
        do_logic()
        self.next(self.next_step)

    # rest of flow
Separating the implementation of the logic from the flow makes code leveraging Metaflow easier to maintain and test. It is a particularly useful design pattern when you have multiple flows and/or steps that import the same logic.
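As a minimal sketch of what such a unit test might look like, the example below uses a hypothetical stand-in for do_logic. The source does not show the real logic A, B, and C, so the signature (a list of values in, an average out) and the test names are assumptions made purely for illustration. In practice the function would live in my_module.py and the tests in a separate test file.

```python
# Hypothetical stand-in for my_module.do_logic. The real logic A/B/C is not
# shown in the source, so this body is only an illustrative assumption.
def do_logic(values):
    cleaned = [v for v in values if v is not None]  # "logic A": drop missing values
    total = sum(cleaned)                            # "logic B": aggregate
    return total / len(cleaned) if cleaned else 0.0  # "logic C": average, 0.0 if empty


# Plain pytest-style tests: no flow and no Metaflow import are needed here.
def test_do_logic_ignores_missing_values():
    assert do_logic([1, None, 3]) == 2.0


def test_do_logic_empty_input():
    assert do_logic([]) == 0.0
```

Because the tests never touch FlowSpec, they run quickly and can be reused by every flow or step that imports the same function.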
2. Testing a Flow
In the second case, suppose you have a flow you would like to write a unit test for.
In this example there is a data artifact x, which is stored in self.x.
from metaflow import FlowSpec, step

class FlowToTest(FlowSpec):

    @step
    def start(self):
        self.x = 0
        self.next(self.end)

    @step
    def end(self):
        self.x += 1

if __name__ == '__main__':
    FlowToTest()
Suppose you want to test that after running the flow the artifact value is what you expect.
assert x == 1 # goal: check this is true using PyTest
To do this you can:
- Switch your Metaflow profile to ensure tests use a separate (local) metadata and datastore.
- Define a test file and use PyTest to test the flow.
2.a (Optional) Switch Metaflow Profiles
Metaflow's default profile lives at ~/.metaflowconfig/config.json. You can create and activate a custom profile that tells Metaflow to use different metadata and data stores. For example, you can define ~/.metaflowconfig/config_test.json like:
{
    "METAFLOW_DEFAULT_DATASTORE": "local"
}
to separate data from test runs from your actual runs. See this guide for more details.
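If you prefer to create the test profile from a script (for example in CI), the sketch below writes the same JSON shown above. The helper name write_test_profile and its parameters are hypothetical and not part of Metaflow. The only things it relies on are the config_<name>.json naming used above and the single key from the example.

```python
import json
from pathlib import Path


def write_test_profile(config_dir, name="test"):
    """Write config_<name>.json into config_dir and return its path.

    Hypothetical helper: it only reproduces the JSON profile shown above.
    """
    config_dir = Path(config_dir).expanduser()
    config_dir.mkdir(parents=True, exist_ok=True)
    profile_path = config_dir / f"config_{name}.json"
    settings = {"METAFLOW_DEFAULT_DATASTORE": "local"}
    profile_path.write_text(json.dumps(settings, indent=4))
    return profile_path
```

Calling write_test_profile("~/.metaflowconfig") would produce the config_test.json file described above. The profile can then be selected by name, for example through the profile="test" argument used in the test script, or by setting the METAFLOW_PROFILE environment variable to test.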
2.b Run PyTest Script
Now you can define a PyTest script that will:
- Run the flow.
- Use Metaflow's Runner API to access the artifact of interest.
- Test the artifact value is as expected.
from metaflow import Runner

def test_flow():
    runner = Runner(flow_file="./simple_flow.py", profile="test")
    result = runner.run()
    run_obj = result.run
    assert run_obj.data.x == 1
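Save the test in a file that pytest can discover (for example, a name starting with test_) and run it from that directory:
pytest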
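As an optional extension, a small helper can check that the run finished successfully before comparing an artifact, which tends to give a clearer failure message than an attribute error on a missing artifact. The helper name assert_artifact is hypothetical. It assumes only that the run object exposes the successful and data attributes of Metaflow's Run object. A stand-in object is used here so the sketch can be tried without executing a flow.

```python
from types import SimpleNamespace


def assert_artifact(run_obj, name, expected):
    """Assert the run succeeded and that artifact `name` equals `expected`.

    Hypothetical helper; run_obj is expected to look like a Metaflow Run
    (it needs `.successful` and `.data.<name>`).
    """
    assert run_obj.successful, "flow run did not finish successfully"
    actual = getattr(run_obj.data, name)
    assert actual == expected, (
        f"artifact {name!r}: expected {expected!r}, got {actual!r}"
    )


# Stand-in for a real run object, used only to demonstrate the helper.
fake_run = SimpleNamespace(successful=True, data=SimpleNamespace(x=1))
assert_artifact(fake_run, "x", 1)
```

In the test above, the final assertion could then be written as assert_artifact(run_obj, "x", 1).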