# Algorithm overview

This page provides a high-level overview of the algorithms and their features currently implemented in Alibi.

## Model Explanations

These algorithms provide **instance-specific** (sometimes also called **local**) explanations of ML model
predictions. Given a single instance and a model prediction they aim to answer the question “Why did
my model make this prediction?” Most of the following algorithms work with **black-box** models meaning that the
only requirement is to have access to a prediction function (which could be an API endpoint for a model in production). For an extended discussion see White-box and black-box models.

The following table summarizes the capabilities of the current algorithms:

Method |
Models |
Exp. types |
Classification |
Regression |
Tabular |
Text |
Image |
Cat. data |
Train |
Dist. |
---|---|---|---|---|---|---|---|---|---|---|

BB |
global |
✔ |
✔ |
✔ |
✔ |
|||||

BB |
local |
✔ |
✔ |
✔ |
✔ |
✔ |
For Tabular |
|||

BB* TF/Keras |
local |
✔ |
✔ |
✔ |
Optional |
|||||

BB* TF/Keras |
local |
✔ |
✔ |
✔ |
No |
|||||

BB* TF/Keras |
local |
✔ |
✔ |
✔ |
✔ |
Optional |
||||

BB |
local |
✔ |
✔ |
✔ |
✔ |
✔ |
||||

TF/Keras |
local |
✔ |
✔ |
✔ |
✔ |
✔ |
✔ |
Optional |
||

BB |
local global |
✔ |
✔ |
✔ |
✔ |
✔ |
✔ |
|||

WB |
local global |
✔ |
✔ |
✔ |
✔ |
Optional |
||||

WB |
local |
✔ |
✔ |
✔ |
✔ |
✔ |
✔ |
✔ |
||

BB WB |
global |
✔ |
✔ |
✔ |
✔ |
✔ |

Key:

**BB**- black-box (only require a prediction function)**BB***- black-box but assume model is differentiable**WB**- requires white-box model access. There may be limitations on models supported**TF/Keras**- TensorFlow models via the Keras API**Local**- instance specific explanation, why was this prediction made?**Global**- explains the model with respect to a set of instances**Cat. data**- support for categorical features**Train**- whether a training set is required to fit the explainer**Dist.**- whether a batch of explanations can be executed in parallel

**Accumulated Local Effects (ALE)**: calculates first-order feature effects on the model with
respect to a dataset. Intended for use on tabular datasets, currently supports numerical features.
Documentation, regression example,
classification example.

**Anchor Explanations**: produce an “anchor” - a small subset of features and their ranges that will
almost always result in the same model prediction. Documentation,
tabular example,
text classification,
image classification.

**Contrastive Explanation Method (CEM)**: produce a pertinent positive (PP) and a pertinent negative
(PN) instance. The PP instance finds the features that should be minimally and sufficiently present
to predict the same class as the original prediction (a PP acts as the “most compact” representation
of the instance to keep the same prediction). The PN instance identifies the features that should be
minimally and necessarily absent to maintain the original prediction (a PN acts as the closest
instance that would result in a different prediction). Documentation,
tabular example, image classification.

**Counterfactual Explanations**: generate counterfactual examples using a simple loss function.
Documentation, image classification.

**Counterfactual Explanations Guided by Prototypes**: generate counterfactuals guided by nearest class
prototypes other than the class predicted on the original instance. It can use both an encoder or k-d trees
to define the prototypes. This method can speed up the search, especially for black box models, and create
interpretable counterfactuals. Documentation,
tabular example,
tabular example with categorical features,
image classification.

**Model-agnostic Counterfactual Explanations with RL**: transform the optimization procedure into an end-to-end
learnable process, allowing to generate batches of counterfactual instances in a single forward pass. The method
is model-agnostic (does not assume differentiability) and relies only on feedback from model predictions, allows for
generating target-conditional counterfactual instances, flexible feature range constraints for numerical and categorical
attributes, including the immutability of protected features (e.g. gender, race) and can be easily extended to other
data modalities such as images. Documentation,
tabular_example,
image_classification.

**Integrated gradients**: attribute an importance score to each element of the input or an internal layer of the the model

with respect to a given baseline. The attributions are calculated as the path integral of the model gradients along a
straight line from the baseline to the input.
Documentation,
MNIST example,
Imagenet example,
IMDB example,
Transformers example.

**Kernel Shapley Additive Explanations (Kernel SHAP)**: attribute the change of a model output with respect
to a given baseline (e.g., average over a reference set) to each of the input features. This is achieved for
each feature in turn, by averaging the difference in the model output observed when the feature whose contribution
is to be estimated is part of a group of “present” input features and the value observed when the feature is excluded
from said group. The features that are not “present” (i.e., are missing) are replaced with values from a background
dataset. This algorithm can be used to explain regression models and it is optimised to distribute batches of explanations.Documentation,
continuous data,
more continuous data,
categorical data,
distributed_batch_explanations.

**Tree Shapley Additive Explanations (Tree SHAP)**: attribute the change of a model output with respect to a baseline
(e.g., average over a reference set or inferred from node data) to each of the input features. Similar to Kernel SHAP,
the shap value of each feature is computed by averaging the difference of the model output observed when the feature
is part of a group of “present” features and when the feature is excluded from said group, over all possible subsets
of “present” features. Different estimation procedures for the effect of selecting different subsets of “present”
features on the model output give rise to the interventional feature perturbation and the path-dependent feature
perturbation variants of Tree SHAP. This algorithm can be used to explain regression models.
Documentation,
interventional feature perturbation Tree SHAP,
path-dependent feature perturbation Tree SHAP.

**Similarity explanations**: present instances in the training set that are similar to the instance of interest
according to a kernel metric. The implemented kernels are gradient-based, meaning that the similarity between 2
instances is based on the gradients of the loss function with respect to the model’s parameters calculated at each
of the instances.
Documentation,
MNIST example,
Imagenet example,
20 news groups example.

**Partial Dependence**: computes the marginal effect that one or multiple features have on the predicted outcome of a
model with respect to a dataset. Intended for use on tabular datasets, black-box and white-box (scikit-learn) models,
supporting numerical and categorical features.
Documentation,
Bike rental.

## Model Confidence

These algorithms provide **instance-specific** scores measuring the model confidence for making a
particular prediction.

Method |
Models |
Classification |
Regression |
Tabular |
Text |
Images |
Categorical Features |
Train set required |
---|---|---|---|---|---|---|---|---|

BB |
✔ |
✔ |
✔1 |
✔2 |
Yes |
|||

BB |
✔ |
✔ |
✔ |
✔ |
Optional |

**Trust scores**: produce a “trust score” of a classifier’s prediction. The trust score is the ratio
between the distance to the nearest class different from the predicted class and the distance to the
predicted class, higher scores correspond to more trustworthy predictions.
Documentation,
tabular example,
image classification.

**Linearity measure**: produces a score quantifying how linear the model is around a test instance.
The linearity score measures the model linearity around a test instance by feeding the model linear
superpositions of inputs and comparing the outputs with the linear combination of outputs from
predictions on single inputs.
Documentation,
Tabular example,
image classification.

## Prototypes

These algorithms provide a **distilled** view of the dataset and help construct a 1-KNN **interpretable** classifier.

Method |
Classification |
Regression |
Tabular |
Text |
Images |
Categorical Features |
Train set labels |
---|---|---|---|---|---|---|---|

✔ |
✔ |
✔ |
✔ |
✔ |
Optional |

**ProtoSelect**: produces a condensed view of the training dataset and facilitates the construction of an interpretable
classification model through 1-KNN. Every class *k* of the training dataset is summarised by a prototype set
constructed to encourage the following three properties: i) covers as many training points as possible of the
class *k*; ii) covers as few training points as possible of classes different from *k*; iii) is sparse - contains as
few prototypes as possible. The method can be applied to any data modality as long as there is a meaningful way of
defining a “distance” between data points which can often be done using a domain-specific pre-processing function.
Documentation,
Tabular and image example.