Use XGBoost with Metaflow
Question
How can I build and fit an XGBoost model in a Metaflow flow?
Solution
There are two common ways to fit XGBoost models, and both work with Metaflow. The XGBoost documentation refers to them as the learning API and the scikit-learn API. This example uses the learning API, but you can build flows with either.
1. Run Flow
The flow shows how to:
- Load training data.
- Instantiate the XGBoost model.
- Train the model with cross-validation.
xgb_learning_api.py
from metaflow import FlowSpec, step


class XGBFlow(FlowSpec):

    @step
    def start(self):
        # Load the iris dataset and store features and labels as artifacts.
        from sklearn import datasets
        self.iris = datasets.load_iris()
        self.X = self.iris['data']
        self.y = self.iris['target']
        self.next(self.train_model)

    @step
    def train_model(self):
        import xgboost as xgb
        # Wrap the data in XGBoost's DMatrix container.
        dtrain = xgb.DMatrix(self.X, label=self.y)
        # Cross-validate with the learning API; results hold per-round metrics.
        self.results = xgb.cv(
            params={'num_class': 3,
                    'objective': 'multi:softmax',
                    'eval_metric': 'mlogloss'},
            dtrain=dtrain,
            verbose_eval=False
        )
        self.next(self.end)

    @step
    def end(self):
        print("Flow is done.")


if __name__ == "__main__":
    XGBFlow()
python xgb_learning_api.py run
2. Access Artifacts Outside of Flow
The following can be run in a Python script or notebook to access the contents of the dataframe that was stored as a flow artifact with self.results:
from metaflow import Flow
run = Flow('XGBFlow').latest_run
run.data.results.head()
| | train-mlogloss-mean | train-mlogloss-std | test-mlogloss-mean | test-mlogloss-std |
|---|---|---|---|---|
| 0 | 0.741877 | 0.001425 | 0.750814 | 0.002562 |
| 1 | 0.533298 | 0.003306 | 0.550585 | 0.001667 |
| 2 | 0.394987 | 0.002554 | 0.421669 | 0.002304 |
| 3 | 0.300281 | 0.002392 | 0.337402 | 0.003478 |
| 4 | 0.231565 | 0.001567 | 0.280347 | 0.004483 |
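Since the artifact holds one row per boosting round, a natural next step is to pick the round with the lowest held-out loss. The sketch below is an illustration only: `best_round` is a hypothetical helper, and the DataFrame is a stand-in built from the five rows shown above rather than loaded from a real run. With a real run, `run.data.results` would take its place.

```python
import pandas as pd


def best_round(results: pd.DataFrame, metric: str = "test-mlogloss-mean") -> int:
    """Return the index of the boosting round with the lowest metric value.

    Hypothetical helper for illustration; not part of Metaflow or XGBoost.
    """
    return int(results[metric].idxmin())


# Stand-in for run.data.results, copied from the table above.
results = pd.DataFrame({
    "train-mlogloss-mean": [0.741877, 0.533298, 0.394987, 0.300281, 0.231565],
    "test-mlogloss-mean": [0.750814, 0.550585, 0.421669, 0.337402, 0.280347],
})

print(best_round(results))  # 4, the last of the five rows shown
```

Here the loss is still falling at the last row shown, so the result only reflects the first five rounds.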