Data Science and Machine Learning How-to Guides
Even seemingly simple ML projects can grow into a set of complex subtasks (such as those illustrated in the figure below), so we are continuously building a library of answers to the questions people commonly face when building end-to-end ML applications.
Here you can find a growing collection of how-to guides that help you build real-life data science and machine learning applications using Metaflow.
Data
Local Data
- How to Load CSV Data in Metaflow Steps
- How to Load Local Data with IncludeFile
- How to Run SQL Query with Pandas
Cloud Data
- How to Chunk a Dataframe to Parquet
- How to Load Parquet Data from S3 to Arrow Table
- How to Load Parquet Data from S3 to Pandas DataFrame
- How to Share Local Data with S3
- How to Run SQL Query with AWS Athena
Core Concepts
Compute
Configuring Remote Instances
- How to Build a Custom Docker Image
- How to Package Files for Remote Compute
- How to Use a Custom Docker Image
Performance Acceleration
Orchestration
Flow Architecture
- How to Access Parent Directories from a Flow
- How to Define Lists as Parameters
- How to Use Artifacts in Metaflow Join Step
- How to Nest Foreach Flows
- How to Store Artifacts across Metaflow Steps
- How to Set Environment Variables with .env File
- How to Set Environment Variables with Metaflow Decorator
Iterative Flow Development
Core Concepts
Versioning
Versioned Flows and Artifacts
- How to Add and Remove Tags
- How to Download Metaflow Task Code Package
- How to Filter Flows on Condition
- How to List Flow Steps with Client API
- How to Pass XGBoost DMatrix Between Metaflow Steps
- How to Decide Whether to Use a Flow's self Keyword
- How to Reuse Parameters Across Flows
Versioned Environments
Experiment Tracking
Core Concepts
Deployment
Alerting
Deploying Models
Deploying Flows
Testing
Modeling
Modeling Frameworks
- How to Use Keras with Metaflow
- How to Use PyTorch with Metaflow
- How to Use Scikit-learn Estimators with Metaflow
- How to Use XGBoost with Metaflow