Testing a flow with pytest

Question

How can I use PyTest with a flow?

Solution

There are two related cases to consider:

  • Test the logic within steps.
  • Test the flow itself.

1. Testing Logic in Steps

It is a helpful design pattern to move non-orchestration logic out of the flow itself and write unit tests for the component functions. In other words, if a step of your flow contains logic like the flow shown next, you can refactor it in the following way.

Here is a pseudo-code example of a flow you may want to refactor in this way.

class MyFlow(FlowSpec):

    @step
    def start(self):
        # logic A
        # logic B
        # logic C
        self.next(self.next_step)

    # rest of flow
    ...

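To refactor, first move the logic into a separate file (my_module.py here, matching the import used in the flow) so that it can be tested independently of the flow:

def do_logic():
    # logic A
    # logic B
    # logic C
    pass  # placeholder so the pseudo-code body is valid Python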

This is the suggested design pattern because now you can unit test this logic in the way you normally would, and then import it in the flow.

class MyFlow(FlowSpec):

    @step
    def start(self):
        from my_module import do_logic
        do_logic()
        self.next(self.next_step)

    # rest of flow

Separating the implementation of the logic from the flow makes code leveraging Metaflow easier to maintain and test. It is a particularly useful design pattern when you have multiple flows and/or steps that import the same logic.
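
As a minimal sketch of this pattern, suppose the extracted logic is a small pure function. The function name scale_values, its behavior, and the test names below are hypothetical and not part of the original flow. In practice the function would live in my_module.py and the tests in a separate test file that pytest collects.

```python
# my_module.py (hypothetical contents): logic that does not depend on Metaflow.
def scale_values(values, factor=2):
    """Return a new list with every value multiplied by factor."""
    return [v * factor for v in values]


# test_my_module.py (hypothetical): plain pytest-style tests, no flow run needed.
def test_scale_values_default_factor():
    assert scale_values([1, 2, 3]) == [2, 4, 6]


def test_scale_values_empty_input():
    assert scale_values([]) == []
```

Inside a step, the flow would then import and call scale_values the same way start imports and calls do_logic above.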

2. Testing a Flow

In the second case, suppose you have a flow you would like to write a unit test for.

In this example there is a data artifact x which is stored in self.x.

simple_flow.py
from metaflow import FlowSpec, step

class FlowToTest(FlowSpec):

    @step
    def start(self):
        self.x = 0
        self.next(self.end)

    @step
    def end(self):
        self.x += 1

if __name__ == '__main__':
    FlowToTest()

Suppose you want to test that after running the flow the artifact value is what you expect.

assert x == 1 # goal: check this is true using PyTest

To do this you can:

  • Switch your Metaflow profile to ensure tests use a separate (local) metadata and datastore.
  • Define a test file and use PyTest to test the flow.

2.a (Optional) Switch Metaflow Profiles

By default, Metaflow looks for its profile at ~/.metaflowconfig/config.json. You can create and activate a custom profile that tells Metaflow to use different metadata and data stores. For example, you can define a profile named test in ~/.metaflowconfig/config_test.json like:

{
    "METAFLOW_DEFAULT_DATASTORE": "local"
}

This keeps data from test runs separate from your actual runs. See this guide for more details.
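
A profile file like the one above can also be created from Python, which can be convenient in test setup. The sketch below is not part of Metaflow: write_profile and activate_profile are hypothetical helper names. It assumes Metaflow's convention of reading config_<profile>.json from its configuration directory (by default ~/.metaflowconfig) and selecting the profile through the METAFLOW_PROFILE environment variable.

```python
import json
import os
from pathlib import Path


def write_profile(config_dir, profile, settings):
    """Write settings to <config_dir>/config_<profile>.json and return the path.

    Hypothetical helper for illustration; Metaflow does not ship this function.
    """
    config_dir = Path(config_dir).expanduser()
    config_dir.mkdir(parents=True, exist_ok=True)
    path = config_dir / f"config_{profile}.json"
    path.write_text(json.dumps(settings, indent=4))
    return path


def activate_profile(profile, environ=os.environ):
    """Select the profile for Metaflow commands started from this process."""
    environ["METAFLOW_PROFILE"] = profile
```

For example, write_profile("~/.metaflowconfig", "test", {"METAFLOW_DEFAULT_DATASTORE": "local"}) would produce the file described above. Passing profile="test" to Runner selects the same profile for a single run without changing the environment of the calling process.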

2.b Run PyTest Script

Now you can define a PyTest script that will:

  • Run the flow.
  • Use Metaflow's Runner API to access the artifact of interest.
  • Test the artifact value is as expected.

test_simple_flow.py
from metaflow import Runner

def test_flow():
    runner = Runner(flow_file="./simple_flow.py", profile="test")
    result = runner.run()
    run_obj = result.run
    assert run_obj.data.x == 1

Run the test from the directory that contains both files:

pytest
============================= test session starts ==============================
platform darwin -- Python 3.12.4, pytest-8.2.2, pluggy-1.5.0
plugins: anyio-4.4.0
collected 1 item

test_simple_flow.py . [100%]

============================== 1 passed in 2.01s ===============================