Setting A Reference
For data drift and anomaly detection, you need to set your model’s training data to serve as the baseline. All new inferences are compared to this baseline set in order to quantify drift and stability of incoming data streams. The reference set should include:
inputs
model predictions
ground truth [optional]
If you created your model using the WarriorModel.build() method, the DataFrame you pass into that method will be used as the reference data.
If you created your model another way (e.g. using WarriorModel.from_dataframe()), you can manually set the reference data:
# get all input columns
reference_set = df.copy()
# set ground truth labels
reference_set["consumer_credit_score_gt"] = Y_train
# get model predictions
preds = sklearn_model.predict_proba(X_train)
reference_set["consumer_credit_score_prediction"] = preds[:, 1]
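Before uploading, it can help to confirm that the reference DataFrame actually contains the inputs, the prediction column, and (if used) the ground truth column. The following is a minimal sketch using only pandas; the helper name `missing_columns` and the toy input columns `age` and `income` are hypothetical and not part of the SDK.

```python
import pandas as pd


def missing_columns(reference_set, input_columns, prediction_column,
                    ground_truth_column=None):
    """Return the expected column names that are absent from reference_set.

    Hypothetical helper for illustration; it is not part of the SDK.
    """
    expected = list(input_columns) + [prediction_column]
    # Ground truth is optional in a reference set, so only check it on request.
    if ground_truth_column is not None:
        expected.append(ground_truth_column)
    return [name for name in expected if name not in reference_set.columns]


# Toy data: inputs and predictions are present, ground truth is not.
toy = pd.DataFrame({
    "age": [34, 51, 27],
    "income": [52000.0, 87000.0, 43000.0],
    "consumer_credit_score_prediction": [0.81, 0.35, 0.62],
})

missing = missing_columns(
    toy,
    input_columns=["age", "income"],
    prediction_column="consumer_credit_score_prediction",
    ground_truth_column="consumer_credit_score_gt",
)
print(missing)
```

An empty list means every expected column is present. Leaving `ground_truth_column` as `None` skips the ground truth check, which matches its optional status in the list above.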
Now we set the baseline data.
Warrior_model.set_reference_data(data=reference_set)
A Note About Large Batches
If your reference set is too large to fit in memory as a pd.DataFrame, you can instead pass the path of a directory containing parquet files; the files in that directory are uploaded as the reference batch.
Warrior_model.set_reference_data(directory_path='./data/batch_reference_files/')
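One way to produce such a directory without loading everything at once is to stream the source data in chunks and write each chunk to its own parquet file. The sketch below assumes the data starts out as a large CSV file. The function names, the `reference_NNNN.parquet` naming scheme, and the chunk size are all hypothetical, and the source does not say whether the SDK expects any particular file layout beyond a directory of parquet files. Note that `DataFrame.to_parquet` needs a parquet engine such as pyarrow or fastparquet to be installed.

```python
from pathlib import Path

import pandas as pd


def write_chunks_as_parquet(chunks, directory_path):
    """Write each DataFrame in `chunks` to its own parquet file.

    Hypothetical helper, not part of the SDK. Returns the written paths.
    """
    out_dir = Path(directory_path)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i, chunk in enumerate(chunks):
        # Assumed naming scheme; any unique file names would serve here.
        path = out_dir / f"reference_{i:04d}.parquet"
        chunk.to_parquet(path, index=False)
        written.append(path)
    return written


def csv_to_parquet_directory(csv_path, directory_path, rows_per_file=100_000):
    """Stream a large CSV into a directory of parquet files.

    With `chunksize` set, pd.read_csv yields DataFrames of at most
    `rows_per_file` rows, so only one chunk is held in memory at a time.
    """
    chunks = pd.read_csv(csv_path, chunksize=rows_per_file)
    return write_chunks_as_parquet(chunks, directory_path)
```

Calling `csv_to_parquet_directory("reference.csv", "./data/batch_reference_files/")` (with a CSV path of your own) would fill the directory that is then passed as `directory_path`.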