Load Local Data with IncludeFile
Question
How do I load data from a local directory structure on AWS Batch using Metaflow's IncludeFile?
Solution
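When using Metaflow's @batch decorator as a compute environment for a step, there are several options for accessing data. This page shows how to use metaflow.IncludeFile to access a file on AWS Batch or Kubernetes. IncludeFile reads the local file when the run starts and stores its contents as a flow artifact, so a step running remotely can read the contents without access to your local disk.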
1. Acquire Data
The example accesses this CSV file from a step that runs on AWS Batch:
local_data.csv
1, 2, 3
4, 5, 6
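If you do not already have the file, a minimal sketch like the following creates it next to the flow script. The helper name write_example_csv is hypothetical and not part of Metaflow; the contents simply mirror the two rows shown above.

```python
from pathlib import Path

# Contents of the example file shown above (two rows, three columns).
CSV_TEXT = "1, 2, 3\n4, 5, 6\n"


def write_example_csv(path="local_data.csv"):
    """Write the example CSV to `path` and return the path (hypothetical helper)."""
    target = Path(path)
    target.write_text(CSV_TEXT)
    return target


if __name__ == "__main__":
    # Create the file in the current directory and echo its contents.
    print(write_example_csv().read_text())
```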
2. Run Flow
This flow shows how to:
- Include a local file as a flow artifact, self.data, using IncludeFile.
- Use the artifact to access the contents of the local file on AWS Batch.
local_data_on_batch_include.py
from metaflow import FlowSpec, step, IncludeFile, batch

class IncludeFileFlow(FlowSpec):

    # Read the local file at launch and store its contents as an artifact.
    data = IncludeFile('data',
                       default='./local_data.csv')

    @batch(cpu=1)
    @step
    def start(self):
        # Runs on AWS Batch; self.data holds the file contents.
        print(self.data)
        self.next(self.end)

    @step
    def end(self):
        print('Finished reading the data!')

if __name__ == '__main__':
    IncludeFileFlow()
python local_data_on_batch_include.py run
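The flow above only prints the file contents. Because IncludeFile loads a text file as a string by default, the step can parse that string with the standard library. The sketch below assumes the file holds comma-separated integers, as in local_data.csv; parse_csv_text is a hypothetical helper name, not a Metaflow function.

```python
import csv
import io


def parse_csv_text(text):
    """Parse CSV text into a list of rows of ints (hypothetical helper).

    Assumes every cell is an integer, as in the example local_data.csv.
    """
    # skipinitialspace drops the blank that follows each comma in the file.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    # Skip empty rows, such as a trailing blank line.
    return [[int(cell) for cell in row] for row in reader if row]


if __name__ == "__main__":
    sample = "1, 2, 3\n4, 5, 6\n"
    print(parse_csv_text(sample))
```

Inside the start step, this could be called as parse_csv_text(self.data). Since IncludeFile is exposed as a flow parameter, a different file can also be supplied at launch by passing --data followed by its path to the run command.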