Algorithm overview

This page provides a high-level overview of the algorithms and their features currently implemented in Alibi.

Model Explanations

These algorithms provide instance-specific (sometimes also called “local”) explanations of ML model predictions. Given a single instance and a model prediction they aim to answer the question “Why did my model make this prediction?” The following table summarizes the capabilities of the current algorithms:

Explainer

Classification

Regression

Categorical features

Tabular

Text

Images

Needs training set

Anchors

For Tabular

CEM

Optional

Counterfactual Instances

No

Prototype Counterfactuals

Optional

Anchor explanations: produce an “anchor” - a small subset of features and their ranges that will almost always result in the same model prediction. Documentation, tabular example, text classification, image classification.

Contrastive explanation method (CEM): produce a pertinent positive (PP) and a pertinent negative (PN) instance. The PP instance finds the features that should me minimally and sufficiently present to predict the same class as the original prediction (a PP acts as the “most compact” representation of the instance to keep the same prediction). The PN instance identifies the features that should be minimally and necessarily absent to maintain the original prediction (a PN acts as the closest instance that would result in a different prediction). Documentation, tabular example, image classification.

Counterfactual instances: generate counterfactual examples using a simple loss function. Documentation, image classification.

Prototype Counterfactuals: generate counterfactuals guided by nearest class prototypes other than the class predicted on the original instance. It can use both an encoder or k-d trees to define the prototypes. This method can speed up the search, especially for black box models, and create interpretable counterfactuals. Documentation, tabular example, image classification.

Model Confidence

These algorihtms provide instance-specific scores measuring the model confidence for making a particular prediction.

Algorithm

Classification

Regression

Categorical features

Tabular

Text

Images

Needs training set

Trust Scores

1

2

Yes

Linearity Measure

Optional

Trust scores: produce a “trust score” of a classifier’s prediction. The trust score is the ratio between the distance to the nearest class different from the predicted class and the distance to the predicted class, higher scores correspond to more trustworthy predictions. Documentation, tabular example, image classification

Linearity measure: produces a score quantifying how linear the model is around a test instance. The linearity score measures the model linearity around a test instance by feeding the model linear superpositions of inputs and comparing the outputs with the linear combination of outputs from predictions on single inputs. Tabular example, image classification

1

Depending on model

2

May require dimensionality reduction