Enrichments¶
In addition to tracking and aggregating user-supplied information (inferences, ground truth, predicted values, etc.), Warrior can also enrich data by computing additional metrics. Examples of enrichments include anomaly detection, which generates multivariate anomaly scores, and explainability, which generates feature importance scores.
This guide will outline how to enable, disable, and configure Enrichments.
For a list of all available enrichments, their configuration options, and example usage, see Enrichment List.
General Usage¶
Every enrichment can be enabled or disabled independently, and may also expose or require configuration options.
Viewing Current Enrichments
You can use the SDK to fetch current enrichment settings.
model = connection.get_model("credit_risk", id_type="partner_model_id")
model.get_enrichments()
This will return a dictionary containing the configuration for all available enrichments:
{
"anomaly_detection": {
"enabled": true,
"config": {}
},
"explainability": {
"enabled": false,
"config": {}
}
}
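For instance, you could filter this dictionary to list only the enrichments that are currently enabled. A minimal sketch working from the example response above:

```python
# Example response from model.get_enrichments()
enrichments = {
    "anomaly_detection": {"enabled": True, "config": {}},
    "explainability": {"enabled": False, "config": {}},
}

# Collect the names of enrichments that are switched on
enabled = [name for name, settings in enrichments.items() if settings["enabled"]]
print(enabled)  # ['anomaly_detection']
```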
You can also fetch the configuration for just a single enrichment at a time.
from Warriorai.common.constants import Enrichment
model.get_enrichment(Enrichment.AnomalyDetection)
Returns:
{
"enabled": true,
"config": {}
}
Updating Enrichments
You can configure multiple enrichments at once:
enrichment_configs = {
Enrichment.Explainability: {'enabled': False, 'config': {}},
Enrichment.AnomalyDetection: {'enabled': True, 'config': {}}
}
model.update_enrichments(enrichment_configs)
Or you can edit the configuration for a single enrichment:
ad_config = {}
enabled = True
model.update_enrichment(Enrichment.AnomalyDetection, enabled, ad_config)
Some enrichments can be configured using specialized helper functions. See the next section of this guide for specifics on configuring each enrichment.
Enrichment List¶
This table outlines all enrichments currently available.
Enrichment | Constant | Description
---|---|---
Anomaly Detection | `Enrichment.AnomalyDetection` | Calculates a multivariate anomaly score on each inference. Requires a reference set to be uploaded.
Bias Mitigation | `Enrichment.BiasMitigation` | Calculates possible sets of group-conditional thresholds that may be used to produce fairer classifications.
Explainability | `Enrichment.Explainability` | Generates feature importance scores for inferences. Requires the user to provide model files.
Hotspots | `Enrichment.Hotspots` | Finds data points which the model underperforms on. Calculated for each batch, or over 7 days' worth of data for streaming models.
Anomaly Detection¶
Anomaly detection requires a reference set to be uploaded. We train a model on the reference set, and then use that model to score new inferences. See the explanation of our anomaly detection functionality from an algorithms perspective here. The reference set can be a subset of the model’s training data, or a dataset that was used during model testing. If anomaly detection is enabled but no reference set has been uploaded, anomaly scores will not be generated for the inferences you send to Warrior. However, once a reference set has been uploaded, if anomaly detection is already enabled, anomaly scores will automatically start to be calculated.
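Conceptually, the enrichment fits a model to the reference set and then scores how far each new inference falls from that distribution. A toy, one-feature illustration of the idea using a simple distance-based score (not Warrior's actual algorithm):

```python
import statistics

# Toy reference set: a single numeric feature
reference = [10.0, 11.0, 9.5, 10.5, 10.2]
mean = statistics.mean(reference)
stdev = statistics.stdev(reference)

def anomaly_score(value):
    # Distance from the reference distribution, in standard deviations
    return abs(value - mean) / stdev

# An inference close to the reference set gets a low score...
print(anomaly_score(10.1))
# ...while one far from the reference set gets a high score
print(anomaly_score(25.0))
```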
Compatibility
Anomaly Detection can be enabled for models with any input type and a reference set uploaded to Warrior.
Usage
# view current configuration
model.get_enrichment(Enrichment.AnomalyDetection)
# enable
model.update_enrichment(Enrichment.AnomalyDetection, True, {})
# disable
model.update_enrichment(Enrichment.AnomalyDetection, False, {})
Configuration
There is currently no additional configuration for Anomaly Detection.
Bias Mitigation¶
Once bias has been detected in your model – either pre or post deployment – you may be interested in mitigating that bias to improve your model in the future. Bias mitigation requires a reference set to be uploaded. See the explanation of our current mitigation methods from an algorithms perspective here.
Compatibility
Bias Mitigation can be enabled for binary models of any input type, as long as at least one attribute is marked as monitor_for_bias=True and a reference set has been uploaded to Warrior.
Usage
# view current configuration
model.get_enrichment(Enrichment.BiasMitigation)
# enable
model.update_enrichment(Enrichment.BiasMitigation, True, {})
# or
model.enable_bias_mitigation()
Enabling Bias Mitigation will automatically train a mitigation model for all attributes marked as monitor_for_bias=True, for the constraints demographic parity, equalized odds, and equal opportunity.
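Conceptually, a set of group-conditional thresholds assigns each group its own decision cutoff so that a fairness constraint is satisfied. A toy illustration of the demographic parity case (equal positive-prediction rates across groups); this is a sketch of the idea, not Warrior's mitigation method:

```python
# Toy model scores for two groups of inferences
scores = {
    "group_a": [0.2, 0.4, 0.6, 0.9],
    "group_b": [0.1, 0.3, 0.5, 0.7],
}

def positive_rate(group_scores, threshold):
    # Fraction of the group classified positive at this threshold
    return sum(s >= threshold for s in group_scores) / len(group_scores)

# Group-conditional thresholds chosen so both groups have the same
# positive-prediction rate (demographic parity)
thresholds = {"group_a": 0.6, "group_b": 0.5}
for group, t in thresholds.items():
    print(group, positive_rate(scores[group], t))  # both 0.5 -> parity
```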
Configuration
There is currently no additional configuration for Bias Mitigation.
Explainability¶
The Explainability enrichment will generate explanations (feature importance scores) for inferences. This requires providing model files for Warrior to run. See the required setup here.
The Explainability enrichment exposes some configuration options which are outlined below.
Compatibility
Explainability is supported for all models except object detection.
Usage
To enable, we advise using the helper function model.enable_explainability(), which provides named parameters and automatically specifies some required settings, such as sdk_version and python_version.
Once enabled, you can use the generic functions (model.update_enrichment() or model.update_enrichments()) to update the configuration or disable explainability.
# view configuration
model.get_enrichment(Enrichment.Explainability)
# enable
model.enable_explainability(
df=X_train.head(50),
project_directory="/path/to/model_code/",
requirements_file="example_requirements.txt",
user_predict_function_import_path="example_entrypoint"
)
# update configuration
config_to_update = {
'explanation_algo': 'shap',
'streaming_explainability_enabled': False
}
model.update_enrichment(Enrichment.Explainability, True, config_to_update)
# disable
model.update_enrichment(Enrichment.Explainability, False, {})
When To Provide Required Settings
When going from disabled to enabled, you will need to include the required configuration settings. Once the enrichment has been enabled, you can update the non-required configuration settings without re-supplying required fields.
When disabling the enrichment, you are not required to pass in any config settings.
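This rule can be sketched as a small client-side check, using the required settings shown in the enable_explainability() example above. Purely illustrative; Warrior performs its own validation:

```python
# Required settings, taken from the enable_explainability() example above
REQUIRED_FIELDS = {
    "df",
    "project_directory",
    "requirements_file",
    "user_predict_function_import_path",
}

def validate_update(currently_enabled, enabled, config):
    """Enabling from a disabled state requires the required fields;
    updating while already enabled, or disabling, does not."""
    if enabled and not currently_enabled:
        missing = REQUIRED_FIELDS - set(config)
        if missing:
            raise ValueError(f"missing required settings: {sorted(missing)}")

# Disabling needs no config settings
validate_update(True, False, {})
# Updating an already-enabled enrichment needs only the changed settings
validate_update(True, True, {"explanation_algo": "shap"})
```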
Configuration
Setting | Required | Description
---|---|---
`df` | X | The dataframe passed to the explainer. Should be similar to, or a subset of, the training data. Typically small, ~50-100 rows.
`project_directory` | X | The path to the directory containing your predict function, requirements file, model file, and any other resources needed to support the predict function.
`user_predict_function_import_path` | X | The name of the file containing the predict function. Do not include the `.py` extension.
`requirements_file` | X | The name of the file containing pip requirements for the predict function.
`python_version` | X | The Python version to use when executing the predict function. This is automatically set to the current Python version when using `model.enable_explainability()`.
`sdk_version` | X | Automatically set to the current SDK version when using `model.enable_explainability()`.
`explanation_algo` | | The explanation algorithm to use. Valid options include `shap`.
| | The number of perturbed samples used to generate the explanation. For a smaller number of samples, the result will be calculated more quickly but may be less robust. It is recommended to use at least 100 samples. Default value of 2000.
| | Number between 0.0 and 1.0 that sets the percent of inferences to compute an explanation score for. Only applicable when streaming explainability is enabled.
`streaming_explainability_enabled` | | If true, every inference will have an explanation generated for it. If false, explanations are available on demand only.
| | List of paths to directories within `project_directory`.
Hotspots¶
When a system has high-dimensional data, finding the input regions responsible for poor performance becomes a difficult troubleshooting problem. Hotspots automates the identification of regions associated with poor ML performance, significantly reducing the time and error involved in finding such regions.
Hotspot enrichments surface input regions where the model is currently underperforming on inferences. Hotspots are extracted from a custom Warrior tree model, where each node is associated with a particular input region and has associated performance metrics: for example, a node with 70% accuracy containing data points where variable X is less than 1000. Nodes are candidates for hotspots. Given a user-specified threshold, e.g. 71% accuracy, the tree is traversed until all nodes below that threshold (such as our node with 70% accuracy) have been identified and returned to the user as hotspots. The hotspot nodes’ children are excluded, since they would be either (1) more pure than the hotspot node, and therefore in further violation of the 71% threshold, or (2) pure nodes with correct inferences, which are not of interest to the user for remediation purposes.
In short, hotspots are a list of mutually exclusive input regions of underperformance for a set of inferences, with underperformance defined by the user.
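The traversal described above can be sketched with a toy tree. This is an illustrative sketch of the stopping rule only, not Warrior's implementation:

```python
class Node:
    def __init__(self, accuracy, region, children=None):
        self.accuracy = accuracy      # performance metric for this input region
        self.region = region          # human-readable description of the region
        self.children = children or []

def find_hotspots(node, threshold):
    """Return mutually exclusive regions whose accuracy falls below threshold.

    Once a node qualifies as a hotspot, its children are not explored,
    matching the behaviour described above.
    """
    if node.accuracy < threshold:
        return [node.region]
    hotspots = []
    for child in node.children:
        hotspots.extend(find_hotspots(child, threshold))
    return hotspots

root = Node(0.85, "all data", [
    Node(0.70, "X < 1000"),   # below the 71% threshold -> hotspot
    Node(0.95, "X >= 1000"),  # above the threshold -> not a hotspot
])
print(find_hotspots(root, 0.71))  # ['X < 1000']
```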
Performance is defined as one of the following metrics: accuracy, recall, f1, precision.
Compatibility
Hotspots can only be enabled for models with Tabular input types. If your model sends data in batches, a hotspot tree will be created for each batch that has ground truth uploaded. For streaming models, hotspot trees will be generated weekly (Monday to Sunday) for inferences with ground truth.
Usage
# view current configuration
model.get_enrichment(Enrichment.Hotspots)
# enable
model.update_enrichment(Enrichment.Hotspots, True, {})
# disable
model.update_enrichment(Enrichment.Hotspots, False, {})
Configuration
There is currently no additional configuration for Hotspots.