



The anchor algorithm is based on the Anchors: High-Precision Model-Agnostic Explanations paper by Ribeiro et al. and builds on the open source code from the paper’s first author.

The algorithm provides model-agnostic (black box) and human interpretable explanations suitable for classification models applied to images, text and tabular data. The idea behind anchors is to explain the behaviour of complex models with high-precision rules called anchors. These anchors are locally sufficient conditions to ensure a certain prediction with a high degree of confidence.

Anchors address a key shortcoming of local explanation methods like LIME, which approximate the local behaviour of the model with a linear proxy. It is unclear, however, to what extent such an explanation holds up in the region around the instance to be explained, since both the model and the data can exhibit non-linear behaviour in the neighborhood of the instance. Relying on the linear proxy can easily lead to overconfidence in the explanation and misleading conclusions on unseen but similar instances. The anchor algorithm tackles this issue by incorporating coverage, the region where the explanation applies, into the optimization problem. A simple example from sentiment classification illustrates this (Figure 1). Depending on the sentence, LIME interprets the occurrence of the word "not" as either positive or negative for the sentiment, so an explanation based on "not" alone is clearly very local. Anchors, however, aim to maximize the coverage and require "not" to occur together with "good" or "bad" to ensure negative or positive sentiment, respectively.


Ribeiro et al., Anchors: High-Precision Model-Agnostic Explanations, 2018

As highlighted by the above example, an anchor explanation consists of if-then rules, called the anchors, which are sufficient to guarantee the prediction locally and try to maximize the area for which the explanation holds. This means that as long as the anchor holds, the prediction should remain the same regardless of the values of the features not present in the anchor. Going back to the sentiment example: as long as "not good" is present, the sentiment is negative, regardless of the other words in the movie review.
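To make the if-then reading concrete, the sketch below measures precision and coverage of two candidate rules on a handful of made-up reviews. The keyword classifier `toy_predict`, the helper `rule_stats` and the reviews are hypothetical illustrations, not part of the library.

```python
def toy_predict(text):
    """Hypothetical sentiment model that only looks at 'not', 'good' and 'bad'."""
    words = text.lower().split()
    negated = "not" in words
    if "good" in words:
        return "negative" if negated else "positive"
    if "bad" in words:
        return "positive" if negated else "negative"
    return "positive"  # arbitrary default when no sentiment word occurs


def rule_stats(anchor, label, texts):
    """Precision and coverage of the rule: IF all anchor words occur THEN label."""
    covered = [t for t in texts if all(w in t.lower().split() for w in anchor)]
    coverage = len(covered) / len(texts)
    hits = sum(toy_predict(t) == label for t in covered)
    precision = hits / len(covered) if covered else 0.0
    return precision, coverage


reviews = [
    "the plot is not good",
    "this film is not good at all",
    "the acting is good",
    "the ending is not bad",
    "the script is bad",
    "not what I expected",
]

print(rule_stats(("not", "good"), "negative", reviews))
print(rule_stats(("not",), "negative", reviews))
```

On these reviews the two-word rule is right every time it applies, whereas "not" alone applies to more reviews but is right only half of the time. This is the trade-off between precision and coverage that the anchor search has to balance.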


Text

For text classification, an interpretable anchor consists of the words that need to be present to ensure a prediction, regardless of the other words in the input. The words that are not present in a candidate anchor can be sampled in 2 ways:

  • Replace word token by UNK token.

  • Replace word token by sampled token from a corpus with the same POS tag and probability proportional to the similarity in the embedding space. By sampling similar words, we keep more context than simply using the UNK token.
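The two sampling strategies can be sketched as follows. This is a simplified illustration: the `SIMILAR` table stands in for a corpus lookup by POS tag and embedding similarity, similar words are drawn uniformly rather than in proportion to their similarity, and the function name `perturb` is hypothetical.

```python
import random

# Hypothetical stand-in for corpus words that share the POS tag of the original word.
SIMILAR = {"good": ["great", "nice", "fine"], "book": ["novel", "story", "volume"]}


def perturb(words, anchor_idx, sample_proba=0.5, use_unk=True, rng=None):
    """Return a perturbed copy of words; positions in anchor_idx are never changed."""
    rng = rng or random.Random()
    out = []
    for i, word in enumerate(words):
        if i in anchor_idx or rng.random() >= sample_proba:
            out.append(word)  # anchor words and unselected words are kept
        elif use_unk:
            out.append("UNK")  # strategy 1: mask the word
        else:
            # strategy 2: swap in a similar word (uniformly here, for brevity)
            out.append(rng.choice(SIMILAR.get(word, [word])))
    return out


words = "This is a good book .".split()
rng = random.Random(0)
print(perturb(words, anchor_idx={3}, rng=rng))
print(perturb(words, anchor_idx={3}, use_unk=False, rng=rng))
```

In both modes the word at the anchor position ("good") survives every perturbation, which is what allows the precision of the candidate anchor to be estimated from the model's predictions on the samples.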

Tabular Data

Anchors are also suitable for tabular data with both categorical and continuous features. The continuous features are discretized into quantiles (e.g. deciles), so they become more interpretable. The features in a candidate anchor are kept constant (same category or bin for discretized features) while we sample the other features from a training set. As a result, anchors for tabular data need access to training data. Let’s illustrate this with an example. Say we want to predict whether a person makes less or more than £50,000 per year based on the person’s characteristics including age (continuous variable) and marital status (categorical variable). The following would then be a potential anchor: Hugo makes more than £50,000 because he is married and his age is between 35 and 45 years.
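A minimal sketch of this sampling step, assuming the features are already encoded as integers (categories or bin indices). The function name `sample_with_anchor` and the toy data are hypothetical, and the library's own sampler may differ in detail.

```python
import numpy as np


def sample_with_anchor(instance, anchor_cols, X_train, n_samples, rng):
    """Sample rows from X_train and fix the anchor columns to the instance's values."""
    rows = rng.integers(0, X_train.shape[0], size=n_samples)
    samples = X_train[rows]  # fancy indexing returns a copy
    samples[:, anchor_cols] = instance[anchor_cols]
    return samples


# Hypothetical encoded data: columns are [age bin, marital status, education].
X_train = np.array([[0, 1, 2], [3, 0, 1], [2, 1, 0], [1, 0, 2], [3, 1, 1]])
instance = np.array([2, 1, 0])  # e.g. age bin 2 and marital status 1 ("married")

rng = np.random.default_rng(0)
samples = sample_with_anchor(instance, [0, 1], X_train, n_samples=4, rng=rng)
print(samples)  # columns 0 and 1 are pinned; education comes from the training rows
```

The fraction of such samples that receive the same prediction as the instance is the estimated precision of the candidate anchor "age bin and marital status".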


Images

Similar to LIME, images are first segmented into superpixels, maintaining local image structure. The interpretable representation then consists of the presence or absence of each superpixel in the anchor. It is crucial to generate meaningful superpixels in order to arrive at interpretable explanations. The algorithm supports a number of standard image segmentation algorithms (felzenszwalb, slic and quickshift) and allows the user to provide a custom segmentation function.

The superpixels not present in a candidate anchor can be masked in 2 ways:

  • Take the average value of that superpixel.

  • Use the pixel values of a superimposed picture over the masked superpixels.
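Both masking options can be sketched with NumPy. The helper `mask_superpixels` is a hypothetical illustration that assumes an image of shape (height, width, channels) and an integer label per pixel in `segments`.

```python
import numpy as np


def mask_superpixels(image, segments, anchor_segments, background=None):
    """Keep the anchor superpixels and mask all others."""
    masked = image.astype(float)  # astype returns a new array
    for seg in np.unique(segments):
        if seg in anchor_segments:
            continue  # superpixels in the anchor stay untouched
        region = segments == seg
        if background is None:
            # option 1: per-channel average of the superpixel
            masked[region] = image[region].mean(axis=0)
        else:
            # option 2: pixels of a superimposed picture
            masked[region] = background[region]
    return masked


image = np.arange(4 * 4 * 3, dtype=float).reshape(4, 4, 3)
segments = np.zeros((4, 4), dtype=int)
segments[:, 2:] = 1  # left half is superpixel 0, right half is superpixel 1

masked = mask_superpixels(image, segments, anchor_segments={0})
print(masked[0])  # first image row: two original pixels, two averaged pixels
```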


Ribeiro et al., Anchors: High-Precision Model-Agnostic Explanations, 2018

Efficiently Computing Anchors

An anchor must yield the same prediction as the original instance with a minimal confidence of, for example, 95%. If multiple candidate anchors satisfy this constraint, we go with the anchor that has the largest coverage. Because the number of potential anchors is exponential in the number of features, an exhaustive search is infeasible and we need a faster approximate solution.

The anchors are constructed bottom-up in combination with beam search. We start with an empty anchor and extend it with one additional condition in each iteration until the minimal confidence constraint is satisfied. If multiple valid anchors are found, the one with the largest coverage is returned.

In order to select the best candidate anchors for the beam width efficiently during each iteration, we formulate the problem as a pure exploration multi-armed bandit problem. This limits the number of model prediction calls which can be a computational bottleneck.
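The bottom-up search can be sketched in a few lines. This is a deliberately simplified illustration, not the library's implementation: the precision of each candidate is estimated from a fixed number of samples instead of a multi-armed bandit, and all names (`precision`, `coverage`, `find_anchor`) and the toy model are hypothetical.

```python
import random


def precision(anchor, x, predict, data, n, rng):
    """Share of perturbed samples (anchor features fixed to x) predicted like x."""
    target = predict(x)
    hits = 0
    for _ in range(n):
        z = list(rng.choice(data))  # draw a reference row ...
        for f in anchor:
            z[f] = x[f]             # ... and pin the anchor features
        hits += predict(z) == target
    return hits / n


def coverage(anchor, x, data):
    """Share of reference rows that agree with x on every anchor feature."""
    return sum(all(row[f] == x[f] for f in anchor) for row in data) / len(data)


def find_anchor(x, predict, data, threshold=0.95, beam_size=2, n=500, seed=0):
    rng = random.Random(seed)
    beam = [()]  # start from the empty anchor
    for _ in range(len(x)):
        # extend every anchor in the beam by one feature it does not contain yet
        candidates = sorted({tuple(sorted(a + (f,)))
                             for a in beam for f in range(len(x)) if f not in a})
        scored = [(precision(c, x, predict, data, n, rng), c) for c in candidates]
        valid = [c for p, c in scored if p >= threshold]
        if valid:  # among valid anchors, prefer the largest coverage
            return max(valid, key=lambda c: coverage(c, x, data))
        scored.sort(reverse=True)
        beam = [c for _, c in scored[:beam_size]]  # keep the most precise candidates
    return tuple(range(len(x)))  # fall back to the full instance


# Toy model: predicts 1 only if features 0 and 1 are both 1.
predict = lambda z: int(z[0] == 1 and z[1] == 1)
data = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
print(find_anchor([1, 1, 0], predict, data))
```

For this toy model no single feature reaches the threshold, so the search moves on to pairs and returns the anchor made of features 0 and 1. Replacing the fixed sample size `n` with a bandit that stops sampling as soon as the best candidates are identified is what keeps the number of model calls low in practice.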

For more details, we refer the reader to the original paper.


While each data type has specific requirements to initialize the explainer and return explanations, the underlying algorithm to construct the anchors is the same.

In order to efficiently generate anchors, the following hyperparameters need to be set to sensible values when calling the explain method:

  • threshold: the previously discussed minimal confidence level. threshold defines the minimum fraction of samples for a candidate anchor that need to lead to the same prediction as the original instance. A higher value gives more confidence in the anchor, but also leads to more computation time. The default value is 0.95.

  • tau: determines when we assume convergence for the multi-armed bandit. A bigger value for tau means faster convergence but also looser anchor conditions. By default equal to 0.15.

  • beam_size: the size of the beam width. A bigger beam width can lead to a better overall anchor at the expense of more computation time.

  • batch_size: the batch size used for sampling. A bigger batch size gives more confidence in the anchor, again at the expense of computation time since it involves more model prediction calls. The default value is 100.

  • coverage_samples: number of samples used to compute the coverage of the anchor. By default set to 10000.



Text

Since the explainer works on black box models, only access to a predict function is needed. The model below is a simple logistic regression trained on movie reviews with negative or positive sentiment and pre-processed with a CountVectorizer:

predict_fn = lambda x: clf.predict(vectorizer.transform(x))

If we choose to sample similar words from a corpus, we first need to load a spaCy model:

import spacy
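# The module path of this import is missing in the source; alibi.utils.download is an assumption.
from alibi.utils.download import spacy_model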

model = 'en_core_web_md'
nlp = spacy.load(model)

We can now initialize our explainer:

explainer = AnchorText(nlp, predict_fn)


Let’s define the instance we want to explain and verify that the sentiment prediction on the original instance is positive:

text = 'This is a good book .'
class_names = ['negative', 'positive']
pred = class_names[predict_fn([text])[0]]

Now we can explain the instance:

explanation = explainer.explain(text, threshold=0.95, use_similarity_proba=False,
                                use_unk=True, sample_proba=0.5)

We set the confidence threshold at 95%. Setting use_unk to True means that words outside of the candidate anchor are replaced with UNK tokens with a sample probability equal to sample_proba. Instead of using UNK tokens, we can sample from the top_n words in the corpus that are most similar to the original word by setting use_unk to False.

explanation = explainer.explain(text, threshold=0.95, use_unk=False, sample_proba=0.5, top_n=100)

It is also possible to sample words from the corpus with probability proportional to their similarity to the original word by setting use_similarity_proba to True and use_unk to False. We can put more weight on similar words by decreasing the temperature argument. The following explanation perturbs original tokens with probability equal to sample_proba. The perturbed tokens are then sampled from the top_n most similar tokens in the corpus, with sample probability proportional to the word similarity with the original token.

explanation = explainer.explain(text, threshold=0.95, use_similarity_proba=True, use_unk=False,
                                sample_proba=0.5, top_n=20, temperature=0.2)
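The effect of temperature can be illustrated with a small sketch. It assumes that the sampling weights are a softmax of similarity divided by temperature. This is a common choice but an assumption here, and the library's exact formula may differ. The candidate words and similarity scores are made up.

```python
import math
import random


def similarity_weights(similarities, temperature=1.0):
    """Softmax of similarity / temperature (assumed form of the weighting)."""
    scaled = [s / temperature for s in similarities]
    top = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical neighbours of "good" with made-up similarity scores.
candidates = ["great", "nice", "fine"]
similarities = [0.9, 0.7, 0.5]

for temperature in (1.0, 0.2):
    weights = similarity_weights(similarities, temperature)
    print(temperature, [round(w, 2) for w in weights])

rng = random.Random(0)
weights = similarity_weights(similarities, temperature=0.2)
print(rng.choices(candidates, weights=weights, k=5))
```

Under this assumption, lowering the temperature shifts probability mass towards the most similar word, so the perturbed sentences stay closer in meaning to the original.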

The explain method returns an Explanation object with the following attributes:

  • anchor: a list of words in the anchor.

  • precision: the fraction of sampled instances where the anchor holds that yield the same prediction as the original instance. The precision will always be \(\geq\) threshold for a valid anchor.

  • coverage: the fraction of sampled instances the anchor applies to.

The raw attribute is a dictionary which also contains example instances where the anchor holds and the prediction is the same as on the original instance, as well as examples where the anchor holds but the prediction changed to give the user a sense of where the anchor fails. raw also stores information on the anchor, precision and coverage of partial anchors. This allows the user to track the improvement in for instance the precision as more features (words in the case of text) are added to the anchor.

Tabular Data

Initialization and fit

To initialize the explainer, we provide a predict function, a list with the feature names to make the anchors easy to understand, and an optional mapping from the encoded categorical features to a description of the category. An example for categorical_names would be category_map = {0: ['married', 'divorced'], 3: ['high school diploma', "master's degree"]}. Each key in category_map refers to the column index in the input for the relevant categorical variable, while the values are lists with the options for each categorical variable. To make it easy, we provide a utility function gen_category_map to generate this map automatically from a Pandas dataframe:

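# The module path of this import is missing in the source; alibi.utils.data is an assumption.
from alibi.utils.data import gen_category_map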
category_map = gen_category_map(df)

Then initialize the explainer:

predict_fn = lambda x: clf.predict(preprocessor.transform(x))
explainer = AnchorTabular(predict_fn, feature_names, categorical_names=category_map)

Tabular data requires a fit step to map the ordinal features into quantiles and therefore needs access to a representative set of the training data. disc_perc is a list with percentiles used for binning, and X_train stands for the training data:

explainer.fit(X_train, disc_perc=[25, 50, 75])
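What such a percentile-based binning does can be shown with plain NumPy. This is a sketch of the idea only: the ages are made up, and whether the library treats bin edges as inclusive on the right is an assumption.

```python
import numpy as np

# Hypothetical training values of a continuous feature such as age.
ages = np.array([22, 25, 31, 35, 38, 41, 45, 52, 58, 63])
disc_perc = [25, 50, 75]

edges = np.percentile(ages, disc_perc)       # bin edges at the requested percentiles
bins = np.digitize(ages, edges, right=True)  # bin index per value, 0 to len(disc_perc)

for age, b in zip(ages, bins):
    print(age, "-> bin", b)
```

With three percentiles every value falls into one of four bins, and an anchor condition on this feature then reads as a range such as "age between two neighbouring edges".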


Let’s check the prediction of the model on the original instance and explain:

class_names = ['<=50K', '>50K']
pred = class_names[explainer.predict_fn(X)[0]]
explanation = explainer.explain(X, threshold=0.95)

The returned Explanation object contains the same attributes as the text explainer, so you could explain a prediction as follows:

Prediction:  <=50K
Anchor: Marital Status = Never-Married AND Relationship = Own-child
Precision: 1.00
Coverage: 0.13
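A report like the one above could be produced along these lines. The sketch uses a stand-in object instead of a real explanation, and it assumes that for tabular data the anchor attribute is a list of readable conditions. Both the stand-in and the helper `format_explanation` are hypothetical.

```python
from types import SimpleNamespace

# Stand-in for the returned Explanation object, filled with the values shown above.
explanation = SimpleNamespace(
    anchor=["Marital Status = Never-Married", "Relationship = Own-child"],
    precision=1.0,
    coverage=0.13,
)


def format_explanation(pred, explanation):
    """Build a short, human-readable report from the explanation attributes."""
    return "\n".join([
        f"Prediction:  {pred}",
        f"Anchor: {' AND '.join(explanation.anchor)}",
        f"Precision: {explanation.precision:.2f}",
        f"Coverage: {explanation.coverage:.2f}",
    ])


print(format_explanation("<=50K", explanation))
```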



Images

Besides the predict function, we also need to specify either a built in or custom superpixel segmentation function. The built in methods are felzenszwalb, slic and quickshift. It is important to create sensible superpixels in order to speed up convergence and generate interpretable explanations. Tuning the hyperparameters of the segmentation method is recommended.

explainer = AnchorImage(predict_fn, image_shape, segmentation_fn='slic',
                        segmentation_kwargs={'n_segments': 15, 'compactness': 20, 'sigma': .5})

Example of superpixels generated for the Persian cat picture using the slic method:

[Images: the original Persian cat picture and its slic superpixel segmentation]

The following function would be an example of a custom segmentation function dividing the image into rectangles.

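import numpy as np

def superpixel(image, size=(4, 7)):
    # label every pixel with the index of the size[0] x size[1] rectangle it falls in
    segments = np.zeros([image.shape[0], image.shape[1]])
    row_idx, col_idx = np.where(segments == 0)  # coordinates of all pixels
    for i, j in zip(row_idx, col_idx):
        # (rectangles per row) * (rectangle row) + (rectangle column)
        segments[i, j] = int((image.shape[1]/size[1]) * (i//size[0]) + j//size[1])
    return segments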
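For larger images the per-pixel loop can become slow. A vectorized variant with the same rectangle layout might look as follows. The name `superpixel_grid` is hypothetical, and the variant rounds the number of rectangles per row up, so it only matches the loop version exactly when the image width is a multiple of size[1].

```python
import numpy as np


def superpixel_grid(image, size=(4, 7)):
    """Label each pixel with the index of the size[0] x size[1] rectangle it lies in."""
    rows = np.arange(image.shape[0]) // size[0]  # rectangle row of every pixel row
    cols = np.arange(image.shape[1]) // size[1]  # rectangle column of every pixel column
    n_cols = -(-image.shape[1] // size[1])       # rectangles per row, rounded up
    return rows[:, None] * n_cols + cols[None, :]


image = np.zeros((8, 14, 3))
segments = superpixel_grid(image)
print(segments.shape, np.unique(segments))
```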

The images_background parameter allows the user to provide images used to superimpose on the masked superpixels, not present in the candidate anchor, instead of taking the average value of the masked superpixel. The superimposed images need to have the same shape as the explained instance.


We can then explain the instance in the usual way:

explanation = explainer.explain(image, p_sample=.5)

p_sample determines the fraction of superpixels that are either changed to the average superpixel value or that are superimposed.

The Explanation object again contains information about the anchor’s precision, coverage and examples where the anchor does or does not hold. On top of that, it also contains a masked image with only the anchor superpixels visible under the anchor attribute (see image below) as well as the image’s superpixels under segments.