Natural Language Processing - Episode 5
This episode references the Python script nlpflow.py.
In the previous episode, you saw how we trained a model and compared it to a baseline. However, what if your model is worse than the baseline? Is there a way to manage this situation programmatically? An important Metaflow feature that can enable this is tagging. Tagging allows you to categorize and organize flows, which we can use to mark certain models as "production candidates." At the end of this lesson, you will be able to:
- Collaborate on and organize flows with tagging.
- Implement common design patterns for testing machine learning models.
1. What is Tagging?
Tags allow you to express opinions about the results of your and your colleagues' work and, importantly, to change those assessments at any time. In contrast to runs and artifacts, which represent immutable facts (history shouldn't be rewritten), the way you interpret those facts may change over time, and tags capture that changing interpretation. This makes tags ideal for managing which models are promoted to the next step in your modeling workflow.
You can add a tag to a run with only a few lines of code. Below is the snippet we will use to add a tag in our flow:
from metaflow import Flow, current
run = Flow(current.flow_name)[current.run_id]
run.add_tag('deployment_candidate')
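Because tags are opinions rather than facts, you can also query and revise them later through the Metaflow Client API. The snippet below is a sketch: it lists runs of our flow (named NLPFlow, as defined later in this lesson) that currently carry the deployment_candidate tag and removes the tag from a run you no longer trust:
from metaflow import Flow
# list runs of NLPFlow that carry the 'deployment_candidate' tag
candidate_runs = list(Flow('NLPFlow').runs('deployment_candidate'))
print(candidate_runs)
# opinions can change: demote a run by removing its tag
if candidate_runs:
    candidate_runs[0].remove_tag('deployment_candidate')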
2. Write a Flow
In this flow, we modify our end step to apply the tag deployment_candidate if our model passes two tests: (1) a baseline comparison and (2) a smoke test. Concretely, we will add the following to the end step:
- A smoke test that checks the model against very easy examples it should never get wrong. A smoke test is a lightweight way to catch unexpected behaviors in your model, even if your model is beating the baseline.
- A comparison of the model with the baseline. We are going to check if our model's AUC score is better than the baseline. There are more advanced variations on this technique, including using other models as baselines, or requiring that your model beat the baseline by a specific margin (see the sketch after this list). We leave these variations as an exercise for the reader.
- Add a tag if our model passes the smoke test and beats the baseline.
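As an illustration of the margin variation mentioned above, the comparison inside the end step could be tightened along these lines; the 0.02 margin is an arbitrary value chosen for the sketch, not a recommendation:
# require the model to beat the baseline AUC by at least 0.02 (illustrative margin)
MARGIN = 0.02
self.beats_baseline = self.model_rocauc > self.base_rocauc + MARGIN
Here is the complete flow: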
from metaflow import FlowSpec, step, Flow, current

class NLPFlow(FlowSpec):

    @step
    def start(self):
        "Read the data"
        import pandas as pd
        self.df = pd.read_parquet('train.parquet')
        self.valdf = pd.read_parquet('valid.parquet')
        print(f'num of rows: {self.df.shape[0]}')
        self.next(self.baseline, self.train)

    @step
    def baseline(self):
        "Compute the baseline"
        from sklearn.metrics import accuracy_score, roc_auc_score
        baseline_predictions = [1] * self.valdf.shape[0]
        self.base_acc = accuracy_score(
            self.valdf.labels, baseline_predictions)
        self.base_rocauc = roc_auc_score(
            self.valdf.labels, baseline_predictions)
        self.next(self.join)

    @step
    def train(self):
        "Train the model"
        from model import NbowModel
        model = NbowModel(vocab_sz=750)
        model.fit(X=self.df['review'], y=self.df['labels'])
        self.model_dict = model.model_dict  # save model
        self.next(self.join)

    @step
    def join(self, inputs):
        "Compare the model results with the baseline."
        import pandas as pd
        from model import NbowModel
        self.model_dict = inputs.train.model_dict
        self.train_df = inputs.train.df
        self.val_df = inputs.baseline.valdf
        self.base_rocauc = inputs.baseline.base_rocauc
        self.base_acc = inputs.baseline.base_acc
        model = NbowModel.from_dict(self.model_dict)
        self.model_acc = model.eval_acc(
            X=self.val_df['review'], labels=self.val_df['labels'])
        self.model_rocauc = model.eval_rocauc(
            X=self.val_df['review'], labels=self.val_df['labels'])
        print(f'Baseline Accuracy: {self.base_acc:.2%}')
        print(f'Baseline AUC: {self.base_rocauc:.2}')
        print(f'Model Accuracy: {self.model_acc:.2%}')
        print(f'Model AUC: {self.model_rocauc:.2}')
        self.next(self.end)

    @step
    def end(self):
        """Tags model as a deployment candidate
        if it beats the baseline and passes smoke tests."""
        from model import NbowModel
        model = NbowModel.from_dict(self.model_dict)
        self.beats_baseline = self.model_rocauc > self.base_rocauc
        print(f'Model beats baseline (T/F): {self.beats_baseline}')
        # smoke test to make sure model does the right thing.
        _tst_reviews = [
            "poor fit its baggy in places where it isn't supposed to be.",
            "love it, very high quality and great value"
        ]
        _tst_preds = model.predict(_tst_reviews)
        check_1 = _tst_preds[0][0] < .5
        check_2 = _tst_preds[1][0] > .5
        self.passed_smoke_test = check_1 and check_2
        msg = 'Model passed smoke test (T/F): {}'
        print(msg.format(self.passed_smoke_test))
        if self.beats_baseline and self.passed_smoke_test:
            print("\n\nThis flow is ready for deployment! \U0001f6a2\U0001f6a2\U0001f6a2 \n\n")
            run = Flow(current.flow_name)[current.run_id]
            run.add_tag('deployment_candidate')
        else:
            print("\n\nThis flow failed some tests.\n\n")

if __name__ == '__main__':
    NLPFlow()
3. Run the Flow
python nlpflow.py run
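Metaflow's run command also accepts a --tag option if you want to annotate a run with additional labels at launch time; the tag name here is just an illustrative placeholder:
python nlpflow.py run --tag experiment_v1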
Now that we have tagged our model based on explicit standards, we can confidently use it in downstream workflows. In the next lesson, we will explore different ways you can utilize the model you have trained.
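As a quick preview, a downstream workflow or notebook could pick up the latest candidate with a query along these lines; this is a sketch that assumes the NbowModel helper from this episode and that at least one run carries the tag:
from metaflow import Flow
from model import NbowModel
# runs are returned newest first, so take the most recent deployment candidate
candidate_run = list(Flow('NLPFlow').runs('deployment_candidate'))[0]
model = NbowModel.from_dict(candidate_run.data.model_dict)
print(model.predict(["love it, very high quality and great value"]))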