Enrichments

In addition to tracking and aggregating user-supplied information (inferences, ground truth, predicted values, etc.), Warrior can also enrich data by computing additional metrics. Examples of enrichments include anomaly detection, which generates multivariate anomaly scores, and explainability, which generates feature importance scores.

This guide will outline how to enable, disable, and configure Enrichments.

For a list of all available enrichments, their configuration options, and example usage, see Enrichment List.

General Usage

Every enrichment can be enabled or disabled independently, and may also expose or require configuration options.

Viewing Current Enrichments

You can use the SDK to fetch current enrichment settings.

model = connection.get_model("credit_risk", id_type="partner_model_id")
model.get_enrichments()

This will return a dictionary containing the configuration for all available enrichments:

{
    "anomaly_detection": {
        "enabled": true,
        "config": {}
    },
    "explainability": {
        "enabled": false,
        "config": {}
    }
}
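
For example, you can filter this dictionary on the enabled flag to see which enrichments are currently switched on. A minimal sketch, assuming the dictionary shape shown above:

# list the names of all currently enabled enrichments
enrichments = model.get_enrichments()
enabled_enrichments = [name for name, settings in enrichments.items() if settings["enabled"]]
print(enabled_enrichments)  # e.g. ['anomaly_detection']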

You can also fetch the configuration for just a single enrichment at a time.

from Warriorai.common.constants import Enrichment

model.get_enrichment(Enrichment.AnomalyDetection)

Returns:

{
    "enabled": true,
    "config": {}
}

Updating Enrichments

You can configure multiple enrichments at once:

enrichment_configs = {
    Enrichment.Explainability: {'enabled': False, 'config': {}},
    Enrichment.AnomalyDetection: {'enabled': True, 'config': {}}
}
model.update_enrichments(enrichment_configs)

Or you can update the configuration for a single enrichment at a time:

ad_config = {}
enabled = True
model.update_enrichment(Enrichment.AnomalyDetection, enabled, ad_config)
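
For example, get_enrichment() and update_enrichment() can be combined to toggle an enrichment while preserving its existing configuration. A minimal sketch, reusing only the fields shown earlier:

# flip anomaly detection on or off, keeping its current config
current = model.get_enrichment(Enrichment.AnomalyDetection)
model.update_enrichment(
    Enrichment.AnomalyDetection,
    not current["enabled"],
    current["config"],
)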

Some enrichments can be configured using specialized helper functions. See the next section of this guide for specifics on configuring each enrichment.
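
For example, the two calls below are equivalent ways to enable Bias Mitigation; both the generic form and the helper form appear again in the Bias Mitigation section of this guide:

# generic form
model.update_enrichment(Enrichment.BiasMitigation, True, {})

# equivalent helper form
model.enable_bias_mitigation()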

Enrichment List

This table outlines all enrichments currently available.

| Enrichment | Constant | Description |
| --- | --- | --- |
| Anomaly Detection | Enrichment.AnomalyDetection | Calculates a multivariate anomaly score for each inference. Requires a reference set to be uploaded. |
| Bias Mitigation | Enrichment.BiasMitigation | Calculates possible sets of group-conditional thresholds that may be used to produce fairer classifications. |
| Explainability | Enrichment.Explainability | Generates feature importance scores for inferences. Requires the user to provide model files. |
| Hotspots | Enrichment.Hotspots | Finds regions of data points on which the model underperforms. Calculated for each batch, or over 7 days' worth of data for streaming models. |


Anomaly Detection

Anomaly detection requires a reference set to be uploaded. Warrior trains a model on the reference set and then uses that model to score new inferences; see the explanation of our anomaly detection functionality from an algorithms perspective here. The reference set can be a subset of the model's training data or a dataset that was used during model testing. If anomaly detection is enabled but no reference set has been uploaded, anomaly scores will not be generated for the inferences you send to Warrior. However, once a reference set has been uploaded, and anomaly detection is already enabled, anomaly scores will automatically start to be calculated.
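
In practice this means the enrichment can be switched on before or after the reference set is uploaded. A minimal sketch (the reference set upload itself is not shown here):

# enable anomaly detection; if no reference set has been uploaded yet,
# no anomaly scores will be produced
model.update_enrichment(Enrichment.AnomalyDetection, True, {})

# ... upload a reference set for the model (not shown in this guide) ...

# once the reference set is available, anomaly scores start being
# calculated automatically; no further enrichment calls are needed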

Compatibility

Anomaly Detection can be enabled for models of any input type, provided a reference set has been uploaded to Warrior.

Usage

# view current configuration
model.get_enrichment(Enrichment.AnomalyDetection)

# enable
model.update_enrichment(Enrichment.AnomalyDetection, True, {})

# disable
model.update_enrichment(Enrichment.AnomalyDetection, False, {})

Configuration

There is currently no additional configuration for Anomaly Detection.


Bias Mitigation

Once bias has been detected in your model, whether pre- or post-deployment, you may be interested in mitigating that bias to improve your model in the future. Bias mitigation requires a reference set to be uploaded. See the explanation of our current mitigation methods from an algorithms perspective here.

Compatibility

Bias Mitigation can be enabled for binary classification models of any input type, as long as at least one attribute is marked as monitor_for_bias=True and a reference set has been uploaded to Warrior.

Usage

# view current configuration
model.get_enrichment(Enrichment.BiasMitigation)

# enable
model.update_enrichment(Enrichment.BiasMitigation, True, {})
# or
model.enable_bias_mitigation()

Enabling Bias Mitigation will automatically train a mitigation model for all attributes marked as monitor_for_bias=True, for each of the following constraints: demographic parity, equalized odds, and equal opportunity.

Configuration

There is currently no additional configuration for Bias Mitigation.


Explainability

The Explainability enrichment will generate explanations (feature importance scores) for inferences. This requires providing model files for Warrior to run. See the required setup here.

The Explainability enrichment exposes some configuration options which are outlined below.

Compatibility

Explainability is supported for all models except object detection.

Usage

To enable Explainability, we advise using the helper function model.enable_explainability(), which provides named parameters and automatically specifies some required settings, such as sdk_version and python_version. Once enabled, you can use the generic functions (model.update_enrichment() or model.update_enrichments()) to update the configuration or to disable explainability.

# view configuration
model.get_enrichment(Enrichment.Explainability)

# enable
model.enable_explainability(
    df=X_train.head(50),
    project_directory="/path/to/model_code/",
    requirements_file="example_requirements.txt",
    user_predict_function_import_path="example_entrypoint"
)

# update configuration
config_to_update = {
    'explanation_algo': 'shap',
    'streaming_explainability_enabled': False
}
model.update_enrichment(Enrichment.Explainability, True, config_to_update)

# disable
model.update_enrichment(Enrichment.Explainability, False, {})

When To Provide Required Settings

When going from disabled to enabled, you will need to include the required configuration settings. Once the enrichment has been enabled, you can update the non-required configuration settings without re-supplying required fields. When disabling the enrichment, you are not required to pass in any config settings.

Configuration

| Setting | Required | Description |
| --- | --- | --- |
| df | Yes | The dataframe passed to the explainer. Should be similar to, or a subset of, the training data. Typically small, ~50-100 rows. |
| project_directory | Yes | The path to the directory containing your predict function, requirements file, model file, and any other resources needed to support the predict function. |
| user_predict_function_import_path | Yes | The name of the file containing the predict function, without the .py extension. Used to import the predict function. |
| requirements_file | Yes | The name of the file containing the pip requirements for the predict function. |
| python_version | Yes | The Python version to use when executing the predict function. Automatically set to the current Python version when using model.enable_explainability(). |
| sdk_version | Yes | The Warriorai version used to make the enable request. Automatically set to the currently installed SDK version when using model.enable_explainability(). |
| explanation_algo | No | The explanation algorithm to use. Valid options are 'lime' or 'shap'. Default value of 'lime'. |
| explanation_nsamples | No | The number of perturbed samples used to generate each explanation. Fewer samples compute more quickly but may be less robust; at least 100 samples are recommended. Default value of 2000. |
| inference_consumer_score_percent | No | Number between 0.0 and 1.0 setting the percentage of inferences to compute an explanation score for. Only applicable when streaming_explainability_enabled is set to true. Default value of 1.0 (all inferences explained). |
| streaming_explainability_enabled | No | If true, every inference will have an explanation generated for it. If false, explanations are available on demand only. |
| ignore_dirs | No | List of paths to directories within project_directory that will not be bundled and included with the predict function. Use to avoid including irrelevant code or files from larger directories. |
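
Putting the optional settings together, a configuration update might look like the sketch below. The values are purely illustrative, not recommendations:

# update the optional Explainability settings on an already-enabled enrichment
optional_config = {
    'explanation_algo': 'shap',
    'explanation_nsamples': 500,
    'inference_consumer_score_percent': 0.25,
    'streaming_explainability_enabled': True,
    'ignore_dirs': ['tests', 'notebooks'],
}
model.update_enrichment(Enrichment.Explainability, True, optional_config)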


Hotspots

When a system has high-dimensional data, finding the input regions responsible for poor performance makes troubleshooting a difficult problem. Hotspots automates the identification of regions associated with poor ML performance, significantly reducing the time and error involved in finding such regions.

The Hotspots enrichment surfaces input regions where the model is currently underperforming on its inferences. Hotspots are extracted from a custom Warrior tree model in which each node corresponds to a particular input region and carries associated performance metrics; for example, a node with 70% accuracy covering the data points where variable X is less than 1000. Every node is a candidate hotspot. Given a user-specified threshold, e.g. 71% accuracy, the tree is traversed until all nodes below that threshold, such as our node with 70% accuracy, have been identified and returned to the user as hotspots. The children of hotspot nodes are excluded, since they would be either (1) more pure than the hotspot node and therefore in further violation of the threshold, or (2) pure nodes with correct inferences, which are not of interest to the user for remediation purposes.

In short, hotspots are a list of mutually exclusive input regions of underperformance for a set of inferences, with underperformance defined by the user.

Performance is defined as one of the following metrics: accuracy, recall, f1, or precision.
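
To make the selection rule concrete, the following is a minimal conceptual sketch of the traversal described above; it is not the Warrior implementation, and the Node class and find_hotspot_nodes function are hypothetical names used only for illustration:

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Node:
    # per-node performance metrics, e.g. {"accuracy": 0.70}
    metrics: Dict[str, float]
    children: List["Node"] = field(default_factory=list)

def find_hotspot_nodes(node: Node, metric: str = "accuracy", threshold: float = 0.71) -> List[Node]:
    # a node below the threshold is returned as a hotspot, and its
    # children are not explored any further
    if node.metrics[metric] < threshold:
        return [node]
    # otherwise keep traversing; hotspots may appear deeper in the tree
    hotspots: List[Node] = []
    for child in node.children:
        hotspots.extend(find_hotspot_nodes(child, metric, threshold))
    return hotspots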

Compatibility

Hotspots can only be enabled for models with Tabular input types. If your model sends data in batches, a hotspot tree will be created for each batch that has ground truth uploaded. For streaming models, hotspot trees will be generated on a weekly basis (Monday to Sunday) for inferences with ground truth.

Usage

# view current configuration
model.get_enrichment(Enrichment.Hotspots)

# enable
model.update_enrichment(Enrichment.Hotspots, True, {})

# disable
model.update_enrichment(Enrichment.Hotspots, False, {})

Configuration

There is currently no additional configuration for Hotspots.