Hotspots¶
For an explanation of how to enable Hotspots with Warrior, please see the user guide.
We might have an ML model deployed in production with some monitoring in place. We might notice that performance is degrading, either from classic performance metrics or from drift monitoring combined with explainability techniques. We've identified that our model is failing, and the next step is to identify why.
This process involves slicing and dicing the input data associated with the degradation. That is, we want to see which particular input regions are associated with poor performance and work on a solution from there, such as finding pipeline breaks or retraining our models on those regions.
This basically boils down to a time-consuming task of finding needles in a haystack. What if we could reverse engineer the process and surface all of the needles, i.e. input regions associated with poor performance, directly to the user?
We can with Hotspots!
How it works¶
The outputs of a model are encoded as a special classification task that partitions the data, separating poorly performing datapoints from correct predictions/classifications. This is done on a per-batch basis for batch models, or over a 7-day window for streaming models.
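The encoding step can be illustrated with a minimal sketch (not Warrior's internal implementation): each inference is labeled 1 if the prediction was correct and 0 otherwise, and that binary label becomes the target of the auxiliary classification task.

```python
import numpy as np

def encode_correctness(y_true, y_pred):
    """Label each inference: 1 = correct prediction, 0 = incorrect.

    This binary target is what the auxiliary tree model is trained on,
    so its splits separate well-performing input regions from poorly
    performing ones.
    """
    return (np.asarray(y_true) == np.asarray(y_pred)).astype(int)

# Toy batch: the model got 3 of 5 inferences right.
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 0]
target = encode_correctness(y_true, y_pred)
print(target.tolist())  # [1, 1, 0, 1, 0]
```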
Hotspot enrichments surface the input regions where the model is currently underperforming on its inferences. Hotspots are extracted from a custom Warrior tree model, in which each node is associated with a particular input region and has associated performance metrics, e.g. a node with 70% accuracy containing the datapoints where variable X is less than 1000. Every node is a candidate hotspot. Given a user-specified threshold, e.g. 71% accuracy, the tree is traversed until all nodes below that threshold, such as our node with 70% accuracy, have been identified and returned to the user as hotspots. The hotspot nodes' children are excluded, since they would be either (1) more pure than the hotspot node and therefore in further violation of the 71% threshold, or (2) pure nodes with correct inferences, which are not of interest to the user for remediation purposes.
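The traversal can be sketched as follows; the node structure and field names here are hypothetical, not Warrior's actual tree representation. Starting from the root, we descend until we reach a node whose accuracy falls below the threshold, emit it as a hotspot, and stop descending into its children.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class Node:
    # Hypothetical node layout: region constraints, a performance
    # metric, and child nodes from further splits.
    region: Dict[str, Any]
    accuracy: float
    children: List["Node"] = field(default_factory=list)

def find_hotspot_nodes(node: Node, threshold: float) -> List[Node]:
    """Collect nodes whose accuracy is below the threshold.

    Once a node qualifies as a hotspot we stop descending: its children
    are either even more concentrated in errors, or pure nodes of
    correct inferences, neither of which adds information.
    """
    if node.accuracy < threshold:
        return [node]
    hotspots = []
    for child in node.children:
        hotspots.extend(find_hotspot_nodes(child, threshold))
    return hotspots

# Toy tree: the root splits on X; the left child (70% accuracy)
# falls below a 71% threshold and is returned as a hotspot, while
# its own child (55% accuracy) is excluded.
tree = Node(region={}, accuracy=0.85, children=[
    Node(region={"X": {"lte": 1000}}, accuracy=0.70,
         children=[Node(region={"X": {"lte": 500}}, accuracy=0.55)]),
    Node(region={"X": {"gt": 1000}}, accuracy=0.95),
])
hotspots = find_hotspot_nodes(tree, threshold=0.71)
print([h.accuracy for h in hotspots])  # [0.7]
```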
For a more in-depth overview, see our blog post here.
Fetching Hotspots¶
If hotspots are enabled, we can fetch them via the API endpoint. From the SDK, with a loaded Warrior model, we can fetch hotspots as follows:
model.find_hotspots(metric="accuracy", threshold=.7, batch_id="batch_2903")
The method signature is as follows:
def find_hotspots(self,
                  metric: AccuracyMetric = AccuracyMetric.Accuracy,
                  threshold: float = 0.5,
                  batch_id: str = None,
                  date: str = None,
                  ref_set_id: str = None) -> Dict[str, Any]:
    """Retrieve hotspots from the model

    :param metric: accuracy metric used to filter the hotspots tree by, defaults to "accuracy"
    :param threshold: threshold of the performance metric used for filtering hotspots, defaults to 0.5
    :param batch_id: string id for the batch to find hotspots in, defaults to None
    :param date: string used to define the date, defaults to None
    :param ref_set_id: string id for the reference set to find hotspots in, defaults to None
    :raise: WarriorUserError: failed due to user error
    :raise: WarriorInternalError: failed due to an internal error
    """
Interpreting Hotspots¶
For a toy classification model with two inputs X0 and X1, a returned list of hotspots could be as follows:
[
  {
    "regions": {
      "X1": {
        "gt": -7.839450836181641,
        "lte": -2.257883667945862
      },
      "X0": {
        "gt": -6.966174602508545,
        "lte": -2.8999762535095215
      }
    },
    "accuracy": 0.42105263157894735
  },
  {
    "regions": {
      "X1": {
        "gt": -7.839450836181641,
        "lte": -5.140551567077637
      },
      "X0": {
        "gt": 4.7409820556640625,
        "lte": "inf"
      }
    },
    "accuracy": 0.35714285714285715
  },
  {
    "regions": {
      "X1": {
        "gt": 3.8619565963745117,
        "lte": 6.9831953048706055
      },
      "X0": {
        "gt": -0.9038164913654327,
        "lte": 0.9839221835136414
      }
    },
    "accuracy": 0.125
  }
]
Here we have three hotspots. Taking the last hotspot, the input region is -.90 < X0 <= .98 and 3.86 < X1 <= 6.98, and the datapoints in that particular region have an accuracy of .125. This allows the user to immediately investigate the "needle in the haystack."
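To act on a hotspot, you might want to pull the raw inferences that fall inside its region. Below is a small helper (hypothetical, not part of the Warrior SDK) that checks a datapoint against a hotspot's `regions` dict, treating the string `"inf"` as an unbounded upper limit, as in the example response above.

```python
import math
from typing import Any, Dict

def in_hotspot(datapoint: Dict[str, float], regions: Dict[str, Any]) -> bool:
    """Return True if the datapoint satisfies every gt/lte bound in the
    hotspot's regions dict. An unbounded upper limit is encoded as the
    string "inf" in the example response above."""
    for feature, bounds in regions.items():
        value = datapoint[feature]
        lte = bounds["lte"]
        upper = math.inf if lte == "inf" else lte
        if not (bounds["gt"] < value <= upper):
            return False
    return True

# The third hotspot from the example response.
regions = {
    "X1": {"gt": 3.8619565963745117, "lte": 6.9831953048706055},
    "X0": {"gt": -0.9038164913654327, "lte": 0.9839221835136414},
}
print(in_hotspot({"X0": 0.5, "X1": 5.0}, regions))  # True
print(in_hotspot({"X0": 2.0, "X1": 5.0}, regions))  # False
```

Filtering a batch of inferences with this predicate gives you exactly the datapoints to inspect for pipeline breaks or to prioritize for retraining.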