# alibi.explainers package

The ‘alibi.explainers’ module includes feature importance, counterfactual and anchor-based explainers.

class alibi.explainers.ALE(predictor, feature_names=None, target_names=None)[source]
__init__(predictor, feature_names=None, target_names=None)[source]

Accumulated Local Effects for tabular datasets. Current implementation supports first order feature effects of numerical features.

Parameters
Return type

None

build_explanation(ale_values, ale0, constant_value, feature_values, feature_deciles)[source]

Helper method to build the Explanation object.

Return type

Explanation

explain(X, min_bin_points=4)[source]

Calculate the ALE curves for each feature with respect to the dataset X.

Parameters
• X (ndarray) – An NxF tabular dataset used to calculate the ALE curves. This is typically the training dataset or a representative sample.

• min_bin_points (int) – Minimum number of points each discretized interval should contain to ensure more precise ALE estimation.

Return type

Explanation

Returns

An Explanation object containing the data and the metadata of the calculated ALE curves.
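To make the idea behind first-order ALE concrete, here is a minimal sketch for a single numerical feature. It is not the library's implementation: the names `ale_first_order`, `predict` and `n_bins` are hypothetical, plain Python lists stand in for the NxF ndarray, and the `min_bin_points` logic is not reproduced.

```python
def ale_first_order(predict, X, feature, n_bins=4):
    """Illustrative first-order ALE for one numerical feature.

    predict: hypothetical callable mapping one row (list of numbers) to a number.
    X: list of rows; plain lists stand in for the NxF ndarray.
    """
    values = sorted(row[feature] for row in X)
    n = len(values)
    # Bin edges at roughly equally spaced quantiles; duplicate edges are dropped.
    edges = sorted({values[(i * (n - 1)) // n_bins] for i in range(n_bins + 1)})

    effects, counts = [], []
    for k, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        # The first interval is closed on the left so the minimum is included.
        rows = [r for r in X if lo < r[feature] <= hi or (k == 0 and r[feature] == lo)]
        diffs = []
        for r in rows:
            upper, lower = list(r), list(r)
            upper[feature], lower[feature] = hi, lo
            # Local effect: prediction change when moving across the interval.
            diffs.append(predict(upper) - predict(lower))
        effects.append(sum(diffs) / len(diffs) if diffs else 0.0)
        counts.append(len(rows))

    # Accumulate the averaged local effects across intervals.
    ale = [0.0]
    for e in effects:
        ale.append(ale[-1] + e)

    # Centre the curve using the count-weighted mean of interval midpoints.
    total = sum(counts)
    offset = (
        sum(c * (a0 + a1) / 2 for c, a0, a1 in zip(counts, ale[:-1], ale[1:])) / total
        if total
        else 0.0
    )
    return edges, [a - offset for a in ale]


# Usage with a hypothetical linear model: the ALE curve of feature 0 has slope 2.
X = [[0, 5], [1, 3], [2, 8], [3, 1], [4, 2]]
linear = lambda r: 2 * r[0] + r[1]
edges, ale = ale_first_order(linear, X, feature=0)
```

For a linear model the curve is a straight line whose slope equals the feature's coefficient, which makes this a convenient sanity check.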

class alibi.explainers.AnchorTabular(predictor, feature_names, categorical_names=None, seed=None)[source]
__init__(predictor, feature_names, categorical_names=None, seed=None)[source]
Parameters
Return type

None

add_names_to_exp(explanation)[source]

Add feature names to explanation dictionary.

Parameters

explanation (dict) – Dict with anchors and additional metadata.

Return type

None

build_explanation(X, result, predicted_label, params)[source]

Preprocess search output and return an explanation object containing metadata.

Parameters
• X (ndarray) – Instance to be explained.

• result (dict) – Dictionary with explanation search output and metadata.

• predicted_label (int) – Label of the instance to be explained (inferred if not given).

• params (dict) – Parameters passed to explain

Return type

Explanation

Returns

explain(X, threshold=0.95, delta=0.1, tau=0.15, batch_size=100, coverage_samples=10000, beam_size=1, stop_on_first=False, max_anchor_size=None, min_samples_start=100, n_covered_ex=10, binary_cache_size=10000, cache_margin=1000, verbose=False, verbose_every=1, **kwargs)[source]

Explain prediction made by classifier on instance X.

Parameters
• X (ndarray) – Instance to be explained.

• threshold (float) – Minimum precision threshold.

• delta (float) – Used to compute beta.

• tau (float) – Margin between lower confidence bound and minimum precision or upper bound.

• batch_size (int) – Batch size used for sampling.

• coverage_samples (int) – Number of samples used to estimate coverage during the anchor search.

• beam_size (int) – The number of anchors extended at each step of new anchors construction.

• stop_on_first (bool) – If True, the beam search algorithm will return the first anchor that satisfies the probability constraint.

• max_anchor_size (Optional[int]) – Maximum number of features in result.

• min_samples_start (int) – Min number of initial samples.

• n_covered_ex (int) – How many examples where anchors apply to store for each anchor sampled during search (both examples where prediction on samples agrees/disagrees with desired_label are stored).

• binary_cache_size (int) – The result search pre-allocates binary_cache_size batches for storing the binary arrays returned during sampling.

• cache_margin (int) – When only max(cache_margin, batch_size) positions in the binary cache remain empty, a new cache of the same size is pre-allocated to continue buffering samples.

• verbose (bool) – Display updates during the anchor search iterations.

• verbose_every (int) – Frequency of displayed iterations during anchor search process.

Return type

Explanation

Returns

explanation – Dictionary containing the result explaining the instance with additional metadata.

fit(train_data, disc_perc=(25, 50, 75), **kwargs)[source]

Fit discretizer to train data to bin numerical features into ordered bins and compute statistics for numerical features. Create a mapping between the bin numbers of each discretised numerical feature and the row id in the training set where it occurs.

Parameters
Return type

AnchorTabular
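The binning step described for fit can be sketched as follows. This is an illustration only, under the assumption that percentiles are computed with linear interpolation; `percentile`, `fit_bins` and `to_bin` are hypothetical helpers, not part of the library.

```python
from bisect import bisect_left


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def fit_bins(column, disc_perc=(25, 50, 75)):
    """Return the bin edges of one numerical feature at the given percentiles."""
    ordered = sorted(column)
    return [percentile(ordered, q) for q in disc_perc]


def to_bin(value, edges):
    """Ordered bin index: 0 up to the first edge, len(edges) above the last."""
    return bisect_left(edges, value)


ages = [18, 22, 25, 31, 38, 44, 52, 60, 71]
edges = fit_bins(ages)
bins = [to_bin(a, edges) for a in ages]

# Mapping from bin number to the training row ids that fall into it.
rows_by_bin = {}
for row_id, b in enumerate(bins):
    rows_by_bin.setdefault(b, []).append(row_id)
```

With three percentiles each numerical feature ends up in one of four ordered bins, and the bin-to-row mapping allows training rows that agree with an anchor to be looked up quickly.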

class alibi.explainers.DistributedAnchorTabular(predictor, feature_names, categorical_names=None, seed=None)[source]
explain(X, threshold=0.95, delta=0.1, tau=0.15, batch_size=100, coverage_samples=10000, beam_size=1, stop_on_first=False, max_anchor_size=None, min_samples_start=1, n_covered_ex=10, binary_cache_size=10000, cache_margin=1000, verbose=False, verbose_every=1, **kwargs)[source]

Explains the prediction made by a classifier on instance X. Sampling is done in parallel over a number of cores specified in kwargs[‘ncpu’].

Parameters

See superclass implementation.

Return type

Explanation

Returns

See superclass implementation.

fit(train_data, disc_perc=(25, 50, 75), **kwargs)[source]

Creates a list of handles to parallel processes that are used for submitting sampling tasks.

Parameters

See superclass implementation.

Return type

AnchorTabular

class alibi.explainers.AnchorText(nlp, predictor, seed=None)[source]
UNK = 'UNK'
__init__(nlp, predictor, seed=None)[source]

Initialize anchor text explainer.

Parameters
• nlp (spacy.language.Language) – spaCy object.

• predictor (Callable) – A callable that takes a tensor of N data points as inputs and returns N outputs.

• seed (int) – If set, ensures identical random streams.

Return type

None

build_explanation(text, result, predicted_label, params)[source]

Uses the metadata returned by the anchor search algorithm together with the instance to be explained to build an explanation object.

Parameters
Return type

Explanation

compare_labels(samples)[source]

Compute the agreement between a classifier prediction on an instance to be explained and the prediction on a set of samples which have a subset of features fixed to a given value (aka compute the precision of anchors).

Parameters

samples (ndarray) – Samples whose labels are to be compared with the instance label.

Return type

ndarray

Returns

A boolean array indicating whether the prediction was the same as the instance label.
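A minimal sketch of the agreement check and of the precision it yields. The helper names `labels_agree` and `precision` and the toy `predictor` are hypothetical; unlike the documented method, this version takes the predictor and the instance label as explicit arguments.

```python
def labels_agree(predictor, samples, instance_label):
    """Flag each sample whose predicted label equals the instance label."""
    return [predictor(s) == instance_label for s in samples]


def precision(agreement):
    """Fraction of samples on which the prediction is unchanged."""
    return sum(agreement) / len(agreement)


# Hypothetical classifier: predicts 1 when the text contains "good".
predictor = lambda text: int("good" in text)
samples = ["good film", "good UNK", "UNK film", "good plot"]
agreement = labels_agree(predictor, samples, instance_label=1)
anchor_precision = precision(agreement)
```

The mean of the boolean array is the empirical precision that the search compares against threshold.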

explain(text, use_unk=True, use_similarity_proba=False, sample_proba=0.5, top_n=100, temperature=1.0, threshold=0.95, delta=0.1, tau=0.15, batch_size=100, coverage_samples=10000, beam_size=1, stop_on_first=True, max_anchor_size=None, min_samples_start=100, n_covered_ex=10, binary_cache_size=10000, cache_margin=1000, verbose=False, verbose_every=1, **kwargs)[source]

Explain instance and return anchor with metadata.

Parameters
• text (str) – Text instance to be explained.

• use_unk (bool) – If True, perturbation distribution will replace words randomly with UNKs. If False, words will be replaced by similar words using word embeddings.

• use_similarity_proba (bool) – Sample according to a similarity score with the corpus embeddings. use_unk needs to be False in order for this to be used.

• sample_proba (float) – Sample probability if use_similarity_proba is False.

• top_n (int) – Number of similar words to sample for perturbations, only used if use_unk=False.

• temperature (float) – Sample weight hyperparameter if use_similarity_proba equals True.

• threshold (float) – Minimum precision threshold.

• delta (float) – Used to compute beta.

• tau (float) – Margin between lower confidence bound and minimum precision or upper bound.

• batch_size (int) – Batch size used for sampling.

• coverage_samples (int) – Number of samples used to estimate coverage during the anchor search.

• beam_size (int) – Number of options kept after each stage of anchor building.

• stop_on_first (bool) – If True, the beam search algorithm will return the first anchor that satisfies the probability constraint.

• max_anchor_size (Optional[int]) – Maximum number of features to include in an anchor.

• min_samples_start (int) – Number of samples used for anchor search initialisation.

• n_covered_ex (int) – How many examples where anchors apply to store for each anchor sampled during search (both examples where prediction on samples agrees/disagrees with predicted label are stored).

• binary_cache_size (int) – The anchor search pre-allocates binary_cache_size batches for storing the boolean arrays returned during sampling.

• cache_margin (int) – When only max(cache_margin, batch_size) positions in the binary cache remain empty, a new cache of the same size is pre-allocated to continue buffering samples.

• kwargs (Any) – Other keyword arguments passed to the anchor beam search and the text sampling and perturbation functions.

• verbose (bool) – Display updates during the anchor search iterations.

• verbose_every (int) – Frequency of displayed iterations during anchor search process.

Return type

Explanation

Returns

explanation – Dictionary containing the anchor explaining the instance with additional metadata.

find_similar_words()[source]

This function queries a spaCy nlp model to find n similar words with the same part of speech for each word in the instance to be explained. For each word the search procedure returns a dictionary containing an np.array of words (‘words’) and an np.array of word similarities (‘similarities’).

Return type

None

perturb_sentence(present, n, sample_proba=0.5, forbidden=frozenset({}), forbidden_tags=frozenset({'PRP$'}), forbidden_words=frozenset({'be'}), temperature=1.0, pos=frozenset({'ADJ', 'ADP', 'ADV', 'DET', 'NOUN', 'VERB'}), use_similarity_proba=True)[source]

Perturb the text instance to be explained.

Parameters
Return type

Tuple[ndarray, ndarray]

Returns

• raw_data – Array of perturbed text instances.

• data – Matrix with 1s and 0s indicating whether a word in the text has not been perturbed for each sample.
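The UNK-style perturbation can be sketched as below. This is a simplified illustration, not the library's code: `perturb_with_unk` is a hypothetical helper, sentences are split on whitespace instead of being tokenized with spaCy, and each non-anchor word is masked independently with probability `sample_proba`.

```python
import random


def perturb_with_unk(words, present, n, sample_proba=0.5, seed=0):
    """Return n perturbed sentences and the matching 0/1 matrix.

    present: positions of the anchor words, which are never perturbed.
    """
    rng = random.Random(seed)
    raw_data, data = [], []
    for _ in range(n):
        sample, mask = [], []
        for i, word in enumerate(words):
            # Anchor words are always kept; others are masked at random.
            keep = i in present or rng.random() >= sample_proba
            sample.append(word if keep else "UNK")
            mask.append(1 if keep else 0)
        raw_data.append(" ".join(sample))
        data.append(mask)
    return raw_data, data


words = "this is a good book".split()
raw_data, data = perturb_with_unk(words, present=[3], n=20)
```

Each row of `data` has a 1 where the word was left unchanged, matching the description of the returned matrix.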

sampler(anchor, num_samples, compute_labels=True)[source]

Generate perturbed samples while maintaining features in positions specified in anchor unchanged.

Parameters
Return type

Union[List[Union[ndarray, float, int]], List[ndarray]]

Returns

• If compute_labels=True, a list containing the following is returned:

  • covered_true: perturbed examples where the anchor applies and the model prediction on perturbation is the same as the instance prediction

  • covered_false: perturbed examples where the anchor applies and the model prediction is NOT the same as the instance prediction

  • labels: num_samples ints indicating whether the prediction on the perturbed sample matches (1) the label of the instance to be explained or not (0)

  • data: Matrix with 1s and 0s indicating whether a word in the text has been perturbed for each sample

  • 1.0: indicates exact coverage is not computed for this algorithm

  • anchor[0]: position of anchor in the batch request

• Otherwise, a list containing the data matrix only is returned.

set_data_type(use_unk)[source]

Working with numpy arrays of strings requires setting the data type to avoid truncating examples. This function estimates the longest sentence expected during the sampling process, which is used to set the number of characters for the samples and examples arrays. This depends on the perturbation method used for sampling.

Parameters

use_unk (bool) – See explain method.

Return type

None

set_sampler_perturbation(use_unk, perturb_opts, top_n)[source]

Initialises the explainer by setting the perturbation function and parameters necessary to sample according to the perturbation method.

Parameters
• use_unk (bool) – see explain method

• perturb_opts (dict) –

A dict with keys:

  • 'top_n': the maximum number of alternatives to sample from for replacement

  • 'use_similarity_proba': if True, the probability of selecting a replacement word is proportional to the similarity between the word and the word to be replaced

  • 'sample_proba': given a feature and n sentences, this parameter is the mean of a Bernoulli distribution used to decide how many sentences will have that feature perturbed

  • 'temperature': a temperature used to calibrate the softmax distribution over the sampling weights

• top_n (int) – Number of similar words to sample for perturbations, only used if use_unk=False.

Return type

None
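The role of temperature can be illustrated with a softmax over word similarities. This is a sketch under the assumption that similarities are divided by the temperature before the softmax; `sampling_weights` is a hypothetical helper and the library's exact scaling may differ.

```python
import math


def sampling_weights(similarities, temperature=1.0):
    """Softmax over similarities; lower temperature gives a sharper distribution."""
    scaled = [s / temperature for s in similarities]
    top = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


sims = [0.9, 0.6, 0.3]
flat = sampling_weights(sims, temperature=1.0)
sharp = sampling_weights(sims, temperature=0.1)
```

A small temperature concentrates the probability mass on the most similar replacement word, while a large one approaches uniform sampling over the candidates.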

set_words_and_pos(text)[source]

Process the sentence to be explained into spaCy token objects, a list of words, punctuation marks and a list of positions in input sentence.

Parameters

text (str) – The instance to be explained.

Return type

None

class alibi.explainers.AnchorImage(predictor, image_shape, segmentation_fn='slic', segmentation_kwargs=None, images_background=None, seed=None)[source]
__init__(predictor, image_shape, segmentation_fn='slic', segmentation_kwargs=None, images_background=None, seed=None)[source]

Initialize anchor image explainer.

Parameters
Return type

None

build_explanation(image, result, predicted_label, params)[source]

Uses the metadata returned by the anchor search algorithm together with the instance to be explained to build an explanation object.

Parameters
• image (ndarray) – Instance to be explained.

• result (dict) – Dictionary containing the search anchor and metadata.

• predicted_label (int) – Label of the instance to be explained.

• params (dict) – Parameters passed to explain

Return type

Explanation

compare_labels(samples)[source]

Compute the agreement between a classifier prediction on an instance to be explained and the prediction on a set of samples which have a subset of perturbed superpixels.

Parameters

samples (ndarray) – Samples whose labels are to be compared with the instance label.

Return type

ndarray

Returns

A boolean array indicating whether the prediction was the same as the instance label.

explain(image, p_sample=0.5, threshold=0.95, delta=0.1, tau=0.15, batch_size=100, coverage_samples=10000, beam_size=1, stop_on_first=False, max_anchor_size=None, min_samples_start=100, n_covered_ex=10, binary_cache_size=10000, cache_margin=1000, verbose=False, verbose_every=1, **kwargs)[source]

Explain instance and return anchor with metadata.

Parameters
• image (ndarray) – Image to be explained.

• p_sample (float) – Probability for a pixel to be represented by the average value of its superpixel.

• threshold (float) – Minimum precision threshold.

• delta (float) – Used to compute beta.

• tau (float) – Margin between lower confidence bound and minimum precision or upper bound.

• batch_size (int) – Batch size used for sampling.

• coverage_samples (int) – Number of samples used to estimate coverage during the anchor search.

• beam_size (int) – The number of anchors extended at each step of new anchors construction.

• stop_on_first (bool) – If True, the beam search algorithm will return the first anchor that satisfies the probability constraint.

• max_anchor_size (Optional[int]) – Maximum number of features in result.

• min_samples_start (int) – Min number of initial samples.

• n_covered_ex (int) – How many examples where anchors apply to store for each anchor sampled during search (both examples where prediction on samples agrees/disagrees with desired_label are stored).

• binary_cache_size (int) – The result search pre-allocates binary_cache_size batches for storing the binary arrays returned during sampling.

• cache_margin (int) – When only max(cache_margin, batch_size) positions in the binary cache remain empty, a new cache of the same size is pre-allocated to continue buffering samples.

• verbose (bool) – Display updates during the anchor search iterations.

• verbose_every (int) – Frequency of displayed iterations during anchor search process.

Return type

Explanation

Returns

explanation – Dictionary containing the anchor explaining the instance with additional metadata.

generate_superpixels(image)[source]

Generates superpixels from (i.e., segments) an image.

Parameters

image (ndarray) – A grayscale or RGB image.

Return type

ndarray

Returns

A [H, W] array of integers. Each integer is a segment (superpixel) label.

overlay_mask(image, segments, mask_features, scale=(0, 255))[source]

Parameters
• image (ndarray) – Image to be explained.

• segments (ndarray) – Superpixels

• mask_features (list) – List with superpixels present in mask.

• scale (tuple) – Pixel scale for masked image.

Return type

ndarray

Returns

perturbation(anchor, num_samples)[source]

Perturbs an image by altering the values of selected superpixels. If a dataset of image backgrounds is provided to the explainer, then the superpixels are replaced with the equivalent superpixels from the background image. Otherwise, the superpixels are replaced by their average value.

Parameters
• anchor (tuple) – Contains the superpixels whose values are not going to be perturbed.

• num_samples (int) – Number of perturbed samples to be returned.

Return type

Tuple[ndarray, ndarray]

Returns

• imgs – A [num_samples, H, W, C] array of perturbed images.

• segments_mask – A [num_samples, M] binary mask, where M is the number of image superpixels segments. 1 indicates the values in that particular superpixels are not perturbed.
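The average-value branch of the perturbation can be sketched as follows. It is a simplified illustration: `perturb_superpixels` is a hypothetical helper, the image is a grayscale nested list (so samples have shape [num_samples, H, W] with no channel axis), and the background-image branch is not covered.

```python
import random


def perturb_superpixels(image, segments, anchor, num_samples, p_sample=0.5, seed=0):
    """Replace randomly chosen non-anchor superpixels with their mean value."""
    rng = random.Random(seed)
    labels = sorted({s for row in segments for s in row})

    # Mean pixel value of every superpixel.
    sums = {lab: [0.0, 0] for lab in labels}
    for img_row, seg_row in zip(image, segments):
        for value, seg in zip(img_row, seg_row):
            sums[seg][0] += value
            sums[seg][1] += 1
    means = {lab: total / count for lab, (total, count) in sums.items()}

    imgs, segments_mask = [], []
    for _ in range(num_samples):
        # 1 = superpixel left unchanged, 0 = superpixel replaced by its mean.
        mask = [1 if (lab in anchor or rng.random() >= p_sample) else 0 for lab in labels]
        keep = dict(zip(labels, mask))
        imgs.append(
            [
                [value if keep[seg] else means[seg] for value, seg in zip(img_row, seg_row)]
                for img_row, seg_row in zip(image, segments)
            ]
        )
        segments_mask.append(mask)
    return imgs, segments_mask


image = [[1, 3, 10, 20], [5, 7, 30, 40]]
segments = [[0, 0, 1, 1], [0, 0, 1, 1]]
imgs, segments_mask = perturb_superpixels(image, segments, anchor=(0,), num_samples=10)
```

Superpixels listed in the anchor are never altered, so every row of the mask has a 1 in their column.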

sampler(anchor, num_samples, compute_labels=True)[source]

Sample images from a perturbation distribution by masking randomly chosen superpixels from the original image and replacing them with pixel values from superimposed images if background images are provided to the explainer. Otherwise, the superpixels from the original image are replaced with their average values.

Parameters
Return type

Union[List[Union[ndarray, float, int]], List[ndarray]]

Returns

• If compute_labels=True, a list containing the following is returned:

  • covered_true: perturbed examples where the anchor applies and the model prediction on the perturbed sample is the same as the instance prediction

  • covered_false: perturbed examples where the anchor applies and the model prediction on the perturbed sample is NOT the same as the instance prediction

  • labels: num_samples ints indicating whether the prediction on the perturbed sample matches (1) the label of the instance to be explained or not (0)

  • data: Matrix with 1s and 0s indicating whether the values in a superpixel will remain unchanged (1) or will be perturbed (0), for each sample

  • 1.0: indicates exact coverage is not computed for this algorithm

  • anchor[0]: position of anchor in the batch request

• Otherwise, a list containing the data matrix only is returned.

class alibi.explainers.CEM(predict, mode, shape, kappa=0.0, beta=0.1, feature_range=(-10000000000.0, 10000000000.0), gamma=0.0, ae_model=None, learning_rate_init=0.01, max_iterations=1000, c_init=10.0, c_steps=10, eps=(0.001, 0.001), clip=(-100.0, 100.0), update_num_grad=1, no_info_val=None, write_dir=None, sess=None)[source]
__init__(predict, mode, shape, kappa=0.0, beta=0.1, feature_range=(-10000000000.0, 10000000000.0), gamma=0.0, ae_model=None, learning_rate_init=0.01, max_iterations=1000, c_init=10.0, c_steps=10, eps=(0.001, 0.001), clip=(-100.0, 100.0), update_num_grad=1, no_info_val=None, write_dir=None, sess=None)[source]

Initialize contrastive explanation method. Paper: https://arxiv.org/abs/1802.07623

Parameters
• predict (Union[Callable, tensorflow.keras.Model, 'keras.Model']) – Keras or TensorFlow model or any other model’s prediction function returning class probabilities

• mode (str) – Find pertinent negatives (‘PN’) or pertinent positives (‘PP’)

• shape (tuple) – Shape of input data starting with batch size

• kappa (float) – Confidence parameter for the attack loss term

• beta (float) – Regularization constant for L1 loss term

• feature_range (tuple) – Tuple with min and max ranges to allow for perturbed instances. Min and max ranges can be floats or numpy arrays with dimension (1x nb of features) for feature-wise ranges

• gamma (float) – Regularization constant for optional auto-encoder loss term

• ae_model (Union[tensorflow.keras.Model, 'keras.Model']) – Optional auto-encoder model used for loss regularization

• learning_rate_init (float) – Initial learning rate of optimizer

• max_iterations (int) – Maximum number of iterations for finding a PN or PP

• c_init (float) – Initial value to scale the attack loss term

• c_steps (int) – Number of iterations to adjust the constant scaling the attack loss term

• eps (tuple) – If numerical gradients are used to compute dL/dx = (dL/dp) * (dp/dx), then eps[0] is used to calculate dL/dp and eps[1] is used for dp/dx. eps[0] and eps[1] can be a combination of float values and numpy arrays. For eps[0], the array dimension should be (1x nb of prediction categories) and for eps[1] it should be (1x nb of features)

• clip (tuple) – Tuple with min and max clip ranges for both the numerical gradients and the gradients obtained from the TensorFlow graph

• update_num_grad (int) – If numerical gradients are used, they will be updated every update_num_grad iterations

• no_info_val (Union[float, ndarray]) – Global or feature-wise value considered as containing no information

• write_dir (str) – Directory to write tensorboard files to

• sess (tensorflow.compat.v1.Session) – Optional Tensorflow session that will be used if passed instead of creating or inferring one internally

Return type

None

attack(X, Y, verbose=False)[source]

Find pertinent negative or pertinent positive for instance X using a fast iterative shrinkage-thresholding algorithm (FISTA).

Parameters
• X (ndarray) – Instance to attack

• Y (ndarray) – Labels for X

• verbose (bool) – Print intermediate results of optimization if True

Return type

Tuple[ndarray, Tuple[ndarray, ndarray]]

Returns

Overall best attack and gradients for that attack.

explain(X, Y=None, verbose=False)[source]

Explain instance and return PP or PN with metadata.

Parameters
• X (ndarray) – Instances to attack

• Y (Optional[ndarray]) – Labels for X

• verbose (bool) – Print intermediate results of optimization if True

Return type

Explanation

Returns

fit(train_data, no_info_type='median')[source]

Get ‘no information’ values from the training data.

Parameters
• train_data (ndarray) – Representative sample from the training data

• no_info_type (str) – Median or mean value by feature supported

Return type

CEM

get_gradients(X, Y)[source]

Compute numerical gradients of the attack loss term: dL/dx = (dL/dP)*(dP/dx) with L = loss_attack_s; P = predict; x = adv_s

Parameters
• X (ndarray) – Instance around which gradient is evaluated

• Y (ndarray) – One-hot representation of instance labels

Return type

ndarray

Returns

loss_fn(pred_proba, Y)[source]

Compute the attack loss.

Parameters
• pred_proba (ndarray) – Prediction probabilities of an instance

• Y (ndarray) – One-hot representation of instance labels

Return type

ndarray

Returns

Loss of the attack.
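The attack loss can be sketched as a hinge-style term in the spirit of the contrastive explanation paper linked above. This is an assumption about the form of the loss rather than a copy of the library's code; `attack_loss` is a hypothetical standalone function that takes `mode` and `kappa` explicitly.

```python
def attack_loss(pred_proba, Y, mode="PN", kappa=0.0):
    """Hinge-style attack loss for one instance.

    pred_proba: class probabilities; Y: one-hot label of the original class.
    """
    target = sum(p * y for p, y in zip(pred_proba, Y))
    other = max(p for p, y in zip(pred_proba, Y) if y == 0)
    if mode == "PP":
        # Pertinent positive: the original class should stay on top.
        return max(0.0, other - target + kappa)
    if mode == "PN":
        # Pertinent negative: another class should overtake the original one.
        return max(0.0, target - other + kappa)
    raise ValueError("mode must be 'PN' or 'PP'")


pn_loss = attack_loss([0.75, 0.25], [1, 0], mode="PN")
pp_loss = attack_loss([0.75, 0.25], [1, 0], mode="PP")
```

A larger kappa demands a wider probability gap before the loss reaches zero.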

perturb(X, eps, proba=False)[source]

Apply perturbation to instance or prediction probabilities. Used for numerical calculation of gradients.

Parameters
• X (ndarray) – Array to be perturbed

• eps (Union[float, ndarray]) – Size of perturbation

• proba (bool) – If True, the net effect of the perturbation needs to be 0 to keep the sum of the probabilities equal to 1

Return type

Tuple[ndarray, ndarray]

Returns

Instances where a positive and negative perturbation is applied.
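The positively and negatively perturbed copies feed a central-difference estimate of the gradient. The sketch below shows that idea on a plain list of features; `perturb_pair` and `numerical_gradient` are hypothetical helpers, and the probability-preserving case (proba=True) is not covered.

```python
def perturb_pair(x, eps):
    """One positively and one negatively perturbed copy of x per feature."""
    pos = [[v + (eps if j == i else 0.0) for j, v in enumerate(x)] for i in range(len(x))]
    neg = [[v - (eps if j == i else 0.0) for j, v in enumerate(x)] for i in range(len(x))]
    return pos, neg


def numerical_gradient(f, x, eps=1e-3):
    """Central differences: df/dx_i is approximately (f(x + eps) - f(x - eps)) / (2 * eps)."""
    pos, neg = perturb_pair(x, eps)
    return [(f(p) - f(n)) / (2 * eps) for p, n in zip(pos, neg)]


# Hypothetical function with known gradient (2 * x0, 3).
f = lambda x: x[0] ** 2 + 3 * x[1]
grad = numerical_gradient(f, [2.0, 1.0])
```

Applying the same scheme once to the loss with respect to the probabilities and once to the probabilities with respect to the input gives the two factors of the chain rule dL/dx = (dL/dp) * (dp/dx), each with its own step size.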

class alibi.explainers.CounterFactual(predict_fn, shape, distance_fn='l1', target_proba=1.0, target_class='other', max_iter=1000, early_stop=50, lam_init=0.1, max_lam_steps=10, tol=0.05, learning_rate_init=0.1, feature_range=(-10000000000.0, 10000000000.0), eps=0.01, init='identity', decay=True, write_dir=None, debug=False, sess=None)[source]
__init__(predict_fn, shape, distance_fn='l1', target_proba=1.0, target_class='other', max_iter=1000, early_stop=50, lam_init=0.1, max_lam_steps=10, tol=0.05, learning_rate_init=0.1, feature_range=(-10000000000.0, 10000000000.0), eps=0.01, init='identity', decay=True, write_dir=None, debug=False, sess=None)[source]

Initialize counterfactual explanation method based on Wachter et al. (2017)

Parameters
• predict_fn (Union[Callable, tensorflow.keras.Model, 'keras.Model']) – Keras or TensorFlow model or any other model’s prediction function returning class probabilities

• shape (Tuple[int, …]) – Shape of input data starting with batch size

• distance_fn (str) – Distance function to use in the loss term

• target_proba (float) – Target probability for the counterfactual to reach

• target_class (Union[str, int]) – Target class for the counterfactual to reach, one of ‘other’, ‘same’ or an integer denoting desired class membership for the counterfactual instance

• max_iter (int) – Maximum number of iterations to run the gradient descent for (inner loop)

• early_stop (int) – Number of steps after which to terminate gradient descent if all or none of found instances are solutions

• lam_init (float) – Initial regularization constant for the prediction part of the Wachter loss

• max_lam_steps (int) – Maximum number of times to adjust the regularization constant (outer loop) before terminating the search

• tol (float) – Tolerance for the counterfactual target probability

• learning_rate_init (float) – Initial learning rate for each outer loop of lambda

• feature_range (Union[Tuple, str]) – Tuple with min and max ranges to allow for perturbed instances. Min and max ranges can be floats or numpy arrays with dimension (1 x nb of features) for feature-wise ranges

• eps (Union[float, ndarray]) – Gradient step sizes used in calculating numerical gradients, defaults to a single value for all features, but can be passed an array for feature-wise step sizes

• init (str) – Initialization method for the search of counterfactuals, currently must be ‘identity’

• decay (bool) – Flag to decay learning rate to zero for each outer loop over lambda

• write_dir (str) – Directory to write Tensorboard files to

• debug (bool) – Flag to write Tensorboard summaries for debugging

• sess (tensorflow.compat.v1.Session) – Optional Tensorflow session that will be used if passed instead of creating or inferring one internally

Return type

None
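How lam_init, target_proba, tol and the 'l1' distance interact can be sketched with a simplified loss. This assumes a squared prediction term scaled by lambda plus an L1 distance; `wachter_loss` and `is_counterfactual` are hypothetical helpers and the library's actual loss may include additional scaling.

```python
def wachter_loss(pred_proba, x_cf, x_orig, target_proba=1.0, lam=0.1):
    """lam * (prediction - target)^2 + L1 distance to the original instance."""
    prediction_term = lam * (pred_proba - target_proba) ** 2
    distance_term = sum(abs(a - b) for a, b in zip(x_cf, x_orig))
    return prediction_term + distance_term


def is_counterfactual(pred_proba, target_proba=1.0, tol=0.05):
    """A candidate counts as a solution when it is within tol of the target."""
    return abs(pred_proba - target_proba) <= tol


loss = wachter_loss(0.5, x_cf=[1.0, 2.0], x_orig=[1.0, 1.0])
```

The outer loop adjusts lambda to trade off reaching the target probability against staying close to the original instance.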

explain(X)[source]

Explain an instance and return the counterfactual with metadata.

Parameters

X (ndarray) – Instance to be explained

Return type

Explanation

Returns

fit(X, y)[source]

Fit method - currently unused as the counterfactual search is fully unsupervised.

Return type

CounterFactual

class alibi.explainers.CounterFactualProto(predict, shape, kappa=0.0, beta=0.1, feature_range=(-10000000000.0, 10000000000.0), gamma=0.0, ae_model=None, enc_model=None, theta=0.0, cat_vars=None, ohe=False, use_kdtree=False, learning_rate_init=0.01, max_iterations=1000, c_init=10.0, c_steps=10, eps=(0.001, 0.001), clip=(-1000.0, 1000.0), update_num_grad=1, write_dir=None, sess=None)[source]
__init__(predict, shape, kappa=0.0, beta=0.1, feature_range=(-10000000000.0, 10000000000.0), gamma=0.0, ae_model=None, enc_model=None, theta=0.0, cat_vars=None, ohe=False, use_kdtree=False, learning_rate_init=0.01, max_iterations=1000, c_init=10.0, c_steps=10, eps=(0.001, 0.001), clip=(-1000.0, 1000.0), update_num_grad=1, write_dir=None, sess=None)[source]

Initialize prototypical counterfactual method.

Parameters
• predict (Union[Callable, tensorflow.keras.Model, 'keras.Model']) – Keras or TensorFlow model or any other model’s prediction function returning class probabilities

• shape (tuple) – Shape of input data starting with batch size

• kappa (float) – Confidence parameter for the attack loss term

• beta (float) – Regularization constant for L1 loss term

• feature_range (tuple) – Tuple with min and max ranges to allow for perturbed instances. Min and max ranges can be floats or numpy arrays with dimension (1x nb of features) for feature-wise ranges

• gamma (float) – Regularization constant for optional auto-encoder loss term

• ae_model (Union[tensorflow.keras.Model, 'keras.Model']) – Optional auto-encoder model used for loss regularization

• enc_model (Union[tensorflow.keras.Model, 'keras.Model']) – Optional encoder model used to guide instance perturbations towards a class prototype

• theta (float) – Constant for the prototype search loss term

• cat_vars (dict) – Dict with as keys the categorical columns and as values the number of categories per categorical variable.

• ohe (bool) – Whether the categorical variables are one-hot encoded (OHE) or not. If not OHE, they are assumed to have ordinal encodings.

• use_kdtree (bool) – Whether to use k-d trees for the prototype loss term if no encoder is available

• learning_rate_init (float) – Initial learning rate of optimizer

• max_iterations (int) – Maximum number of iterations for finding a counterfactual

• c_init (float) – Initial value to scale the attack loss term

• c_steps (int) – Number of iterations to adjust the constant scaling the attack loss term

• eps (tuple) – If numerical gradients are used to compute dL/dx = (dL/dp) * (dp/dx), then eps[0] is used to calculate dL/dp and eps[1] is used for dp/dx. eps[0] and eps[1] can be a combination of float values and numpy arrays. For eps[0], the array dimension should be (1x nb of prediction categories) and for eps[1] it should be (1x nb of features)

• clip (tuple) – Tuple with min and max clip ranges for both the numerical gradients and the gradients obtained from the TensorFlow graph

• update_num_grad (int) – If numerical gradients are used, they will be updated every update_num_grad iterations

• write_dir (str) – Directory to write tensorboard files to

• sess (tensorflow.compat.v1.Session) – Optional Tensorflow session that will be used if passed instead of creating or inferring one internally

Return type

None

attack(X, Y, target_class=None, k=None, k_type='mean', threshold=0.0, verbose=False, print_every=100, log_every=100)[source]

Find a counterfactual (CF) for instance X using a fast iterative shrinkage-thresholding algorithm (FISTA).

Parameters
• X (ndarray) – Instance to attack

• Y (ndarray) – Labels for X as one-hot-encoding

• target_class (Optional[list]) – List of target classes used to find the closest prototype. If None, the nearest prototype other than that of the class predicted for the instance is used.

• k (Optional[int]) – Number of nearest instances used to define the prototype for a class. Defaults to using all instances belonging to the class if an encoder is used and to 1 for k-d trees.

• k_type (str) – Use either the average encoding of the k nearest instances in a class (k_type=’mean’) or the k-nearest encoding in the class (k_type=’point’) to define the prototype of that class. Only relevant if an encoder is used to define the prototypes.

• threshold (float) – Threshold level for the ratio between the distance of the counterfactual to the prototype of the predicted class for the original instance over the distance to the prototype of the predicted class for the counterfactual. If the trust score is below the threshold, the proposed counterfactual does not meet the requirements.

• verbose (bool) – Print intermediate results of optimization if True

• print_every (int) – Print frequency if verbose is True

• log_every (int) – Tensorboard log frequency if write directory is specified

Return type

Tuple[ndarray, Tuple[ndarray, ndarray]]

Returns

Overall best attack and gradients for that attack.
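The k and k_type options can be illustrated with a small prototype computation in encoding space. This is a sketch, not the library's code: `class_prototype` is a hypothetical helper, Euclidean distance is an assumption, and encodings are plain lists.

```python
import math


def class_prototype(encoded_x, class_encodings, k=None, k_type="mean"):
    """Prototype of one class relative to an encoded instance.

    k=None uses every encoding of the class.
    """
    ranked = sorted(class_encodings, key=lambda e: math.dist(e, encoded_x))
    nearest = ranked[: len(ranked) if k is None else k]
    if k_type == "mean":
        # Average encoding of the k nearest instances.
        return [sum(col) / len(nearest) for col in zip(*nearest)]
    if k_type == "point":
        # Encoding of the k-th nearest instance.
        return list(nearest[-1])
    raise ValueError("k_type must be 'mean' or 'point'")


encodings = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]]
proto_mean = class_prototype([0.5, 0.0], encodings, k=2, k_type="mean")
proto_point = class_prototype([0.5, 0.0], encodings, k=2, k_type="point")
proto_all = class_prototype([0.5, 0.0], encodings)
```

A small k keeps the prototype close to the instance being explained, whereas using all encodings gives a global class centre.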

explain(X, Y=None, target_class=None, k=None, k_type='mean', threshold=0.0, verbose=False, print_every=100, log_every=100)[source]

Explain instance and return counterfactual with metadata.

Parameters
• X (ndarray) – Instances to attack

• Y (Optional[ndarray]) – Labels for X as one-hot-encoding

• target_class (Optional[list]) – List of target classes used to find the closest prototype. If None, the nearest prototype other than that of the class predicted for the instance is used.

• k (Optional[int]) – Number of nearest instances used to define the prototype for a class. Defaults to using all instances belonging to the class if an encoder is used and to 1 for k-d trees.

• k_type (str) – Use either the average encoding of the k nearest instances in a class (k_type=’mean’) or the k-nearest encoding in the class (k_type=’point’) to define the prototype of that class. Only relevant if an encoder is used to define the prototypes.

• threshold (float) – Threshold level for the ratio between the distance of the counterfactual to the prototype of the predicted class for the original instance over the distance to the prototype of the predicted class for the counterfactual. If the trust score is below the threshold, the proposed counterfactual does not meet the requirements.

• verbose (bool) – Print intermediate results of optimization if True

• print_every (int) – Print frequency if verbose is True

• log_every (int) – Tensorboard log frequency if write directory is specified

Return type

Explanation

Returns

fit(train_data, trustscore_kwargs=None, d_type='abdm', w=None, disc_perc=(25, 50, 75), standardize_cat_vars=False, smooth=1.0, center=True, update_feature_range=True)[source]

Get prototypes for each class using the encoder or k-d trees. The prototypes are used for the encoder loss term or to calculate the optional trust scores.

Parameters
• train_data (ndarray) – Representative sample from the training data.

• trustscore_kwargs (Optional[dict]) – Optional arguments to initialize the trust scores method.

• d_type (str) – Pairwise distance metric used for categorical variables. Currently, ‘abdm’, ‘mvdm’ and ‘abdm-mvdm’ are supported. ‘abdm’ infers context from the other variables while ‘mvdm’ uses the model predictions. ‘abdm-mvdm’ is a weighted combination of the two metrics.

• w (Optional[float]) – Weight on ‘abdm’ (between 0. and 1.) distance if d_type equals ‘abdm-mvdm’.

• disc_perc (Sequence[Union[int, float]]) – List with percentiles used in binning of numerical features used for the ‘abdm’ and ‘abdm-mvdm’ pairwise distance measures.

• standardize_cat_vars (bool) – Standardize numerical values of categorical variables if True.

• smooth (float) – Smoothing exponent between 0 and 1 for the distances. Lower values smooth the difference in distance metric between different features.

• center (bool) – Whether to center the scaled distance measures. If False, the min distance for each feature except for the feature with the highest raw max distance will be the lower bound of the feature range, but the upper bound will be below the max feature range.

• update_feature_range (bool) – Update feature range with scaled values.

Return type

CounterFactualProto

get_gradients(X, Y, grads_shape, cat_vars_ord=None)[source]

Compute numerical gradients of the attack loss term: dL/dx = (dL/dP)*(dP/dx) with L = loss_attack_s; P = predict; x = adv_s

Parameters
• X (ndarray) – Instance around which gradient is evaluated

• Y (ndarray) – One-hot representation of instance labels

• grads_shape (tuple) – Shape of gradients.

• cat_vars_ord (Optional[dict]) – Dict whose keys are the categorical columns and whose values are the number of categories per categorical variable.

Return type

ndarray

Returns

loss_fn(pred_proba, Y)[source]

Compute the attack loss.

Parameters
• pred_proba (ndarray) – Prediction probabilities of an instance

• Y (ndarray) – One-hot representation of instance labels

Return type

ndarray

Returns

Loss of the attack.
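The chain rule quoted for get_gradients, dL/dx = (dL/dP)*(dP/dx), can be illustrated with central finite differences. This is only a sketch under stated assumptions: predict, loss and numerical_gradient are hypothetical stand-ins written for this example, and the loss is a toy function, not the attack loss used by the library.

```python
import numpy as np

def predict(x):
    """Hypothetical black-box model: softmax over a fixed linear map."""
    W = np.array([[1.0, -0.5], [0.2, 0.8], [-0.3, 0.1]])  # (n_classes, n_features)
    z = W @ x
    e = np.exp(z - z.max())
    return e / e.sum()

def loss(p, y):
    """Hypothetical stand-in loss: probability mass on the labelled class."""
    return float(np.sum(p * y))

def numerical_gradient(x, y, eps=1e-5):
    """Approximate dL/dx as (dL/dP) @ (dP/dx) using central differences."""
    p = predict(x)

    # dL/dP: perturb each prediction probability in turn.
    dl_dp = np.zeros_like(p)
    for k in range(p.size):
        step = np.zeros_like(p)
        step[k] = eps
        dl_dp[k] = (loss(p + step, y) - loss(p - step, y)) / (2 * eps)

    # dP/dx: perturb each input feature in turn.
    dp_dx = np.zeros((p.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        dp_dx[:, j] = (predict(x + step) - predict(x - step)) / (2 * eps)

    return dl_dp @ dp_dx  # shape: (n_features,)

x = np.array([0.3, -1.2])
y = np.array([0.0, 1.0, 0.0])
grad = numerical_gradient(x, y)
```

Splitting the gradient this way only needs calls to the predictor, which is why it suits black-box models where no analytic gradient is available.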

score(X, adv_class, orig_class, eps=1e-10)[source]
Parameters
• X (ndarray) – Instance to encode and calculate distance metrics for

• adv_class (int) – Predicted class on the perturbed instance

• orig_class (int) – Predicted class on the original instance

• eps (float) – Small number to avoid dividing by 0

Return type

float

Returns

Ratio between the distance to the prototype of the predicted class for the original instance and the distance to the prototype of the predicted class for the perturbed instance.
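The ratio returned by score can be sketched as follows. The helper name prototype_distance_ratio, the Euclidean distance and the exact placement of eps are assumptions made for illustration; the library may encode the instance and measure distance differently.

```python
import math

def prototype_distance_ratio(x_enc, proto_orig, proto_adv, eps=1e-10):
    """Distance to the original-class prototype over distance to the
    perturbed-class prototype (hypothetical helper, Euclidean distance)."""
    dist_orig = math.dist(x_enc, proto_orig)
    dist_adv = math.dist(x_enc, proto_adv)
    # eps guards against division by zero when x_enc coincides with proto_adv.
    return dist_orig / (dist_adv + eps)

# The instance is 1 unit from the original-class prototype
# and 2 units from the perturbed-class prototype.
ratio = prototype_distance_ratio([1.0, 0.0], [0.0, 0.0], [1.0, 2.0])
```

A ratio below the threshold passed to explain means the proposed counterfactual is rejected.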

class alibi.explainers.KernelShap(predictor, link='identity', feature_names=None, categorical_names=None, task='classification', seed=None)[source]
__init__(predictor, link='identity', feature_names=None, categorical_names=None, task='classification', seed=None)[source]

A wrapper around the shap.KernelExplainer class. It extends the current shap library functionality by allowing the user to specify variable groups in order to treat one-hot encoded categorical variables as one during sampling. The user can also specify whether to aggregate the shap value estimates for the encoded levels of categorical variables as an optional argument to explain, if grouping arguments are not passed to fit.

Parameters
• predictor (Callable) – A callable that takes as input a samples x features array and returns a samples x n_outputs array of model outputs. The n_outputs should represent model output in margin space. If the model outputs probabilities, then the link should be set to ‘logit’ to ensure correct force plots.

• link (str) –

Valid values are ‘identity’ or ‘logit’. A generalized linear model link to connect the feature importance values to the model output. Since the feature importance values, $$\phi$$, sum up to the model output, it often makes sense to connect them to the output with a link function where $$link(output - expected\_value) = sum(\phi)$$. Therefore, for a model which outputs probabilities, link=’logit’ makes the feature effects have log-odds (evidence) units and link=’identity’ means that the feature effects have probability units. Please see this example for an in-depth discussion about the semantics of explaining the model in the probability or margin space.

• feature_names (Union[List[str], Tuple[str], None]) – Used to infer group names when categorical data is treated by grouping and group_names input to fit is not specified, assuming it has the same length as the groups argument of fit method. It is also used to compute the names field, which appears as a key in each of the values of explanation.data[‘raw’][‘importances’].

• categorical_names (Optional[Dict[int, List[str]]]) – Keys are feature column indices in the background_data matrix (see fit). Each value contains strings with the names of the categories for the feature. Used to select the method for background data summarisation (if specified, subsampling is performed as opposed to k-means clustering). In the future it may be used for visualisation.

• task (str) – Can have values ‘classification’ and ‘regression’. It is only used to set the contents of explanation.data[‘raw’][‘prediction’]

• seed (Optional[int]) – Fixes the random number stream, which influences which subsets are sampled during shap value estimation.
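The difference between probability units and log-odds units for the link argument can be made concrete with a small numeric sketch. The baseline probability and the attribution values below are made up for illustration, and the sketch assumes that attributions add up in link space starting from the link-transformed baseline.

```python
import math

def logit(p):
    """Map a probability to log-odds (the 'logit' link)."""
    return math.log(p / (1.0 - p))

def sigmoid(z):
    """Inverse of the logit link: map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

expected_probability = 0.30   # hypothetical average model output
phi = [0.9, -0.4, 1.1]        # hypothetical attributions in log-odds units

# With link='logit', effects are summed on the log-odds scale ...
log_odds = logit(expected_probability) + sum(phi)
# ... and mapped back to a probability only at the end.
probability = sigmoid(log_odds)
```

Because the sum happens on the whole real line, the result always maps back to a valid probability, which is not guaranteed when probability-unit effects are added directly.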

build_explanation(X, shap_values, expected_value, **kwargs)[source]

Create an explanation object. If output summarisation is required and all inputs necessary for this operation are passed, the raw shap values are summed first so that a single shap value is returned for each categorical variable, as opposed to a shap value per dimension of categorical variable encoding.

Parameters
• X (Union[ndarray, DataFrame, spmatrix]) – Instances to be explained.

• shap_values (List[ndarray]) – Each entry is a n_instances x n_features array, and the length of the list equals the dimensionality of the predictor output. The rows of each array correspond to the shap values for the instances with the corresponding row index in X. The length of the list equals the number of model outputs.

• expected_value (List[float]) – A list containing the expected value of the prediction for each class. Its length should be equal to that of shap_values.

Return type

Explanation

Returns

explanation – An explanation object containing the shap values and prediction in the data field, along with a meta field containing additional data. See usage examples for details.

explain(X, summarise_result=False, cat_vars_start_idx=None, cat_vars_enc_dim=None, **kwargs)[source]

Explains the instances in the array X.

Parameters
• X (Union[ndarray, DataFrame, spmatrix]) – Instances to be explained.

• summarise_result (bool) – Specifies whether the shap values corresponding to dimensions of encoded categorical variables should be summed so that a single shap value is returned for each categorical variable. Both the start indices of the categorical variables (cat_vars_start_idx) and the encoding dimensions (cat_vars_enc_dim) have to be specified

• cat_vars_start_idx (Optional[Sequence[int]]) – The start indices of the categorical variables. If specified, cat_vars_enc_dim should also be specified.

• cat_vars_enc_dim (Optional[Sequence[int]]) – The length of the encoding dimension for each categorical variable. If specified cat_vars_start_idx should also be specified.

• kwargs

Keyword arguments specifying explain behaviour. Valid arguments are:

• nsamples: controls the number of predictor calls and therefore runtime.

• l1_reg: the algorithm is exponential in the feature dimension. If set to auto the algorithm will first run a feature selection algorithm to select the top features, provided the fraction of sampled sets of missing features is less than 0.2 from the number of total subsets. The Akaike Information Criterion is used in this case. See our examples for more details about available settings for this parameter. Note that by first running a feature selection step, the shapley values of the remainder of the features will be different to those estimated from the entire set.

For more details, please see the shap library documentation.

Return type

Explanation

Returns

explanation – An explanation object containing the algorithm results.
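The effect of summarise_result together with cat_vars_start_idx and cat_vars_enc_dim can be sketched with plain NumPy. The helper summarise_encoded is hypothetical and written for this example only; it is not the function the library uses internally.

```python
import numpy as np

def summarise_encoded(shap_values, cat_vars_start_idx, cat_vars_enc_dim):
    """Sum shap values over the encoded columns of each categorical variable.

    shap_values: (n_instances, n_features) array for one model output.
    Returns an array with one column per original (un-encoded) variable.
    """
    widths = dict(zip(cat_vars_start_idx, cat_vars_enc_dim))
    columns, col = [], 0
    while col < shap_values.shape[1]:
        width = widths.get(col, 1)  # numerical features occupy one column
        columns.append(shap_values[:, col:col + width].sum(axis=1))
        col += width
    return np.stack(columns, axis=1)

# One instance, five columns: a numerical feature, a categorical variable
# one-hot encoded over columns 1-3, and another numerical feature.
values = np.array([[0.1, 0.2, -0.1, 0.4, 0.3]])
summary = summarise_encoded(values, cat_vars_start_idx=[1], cat_vars_enc_dim=[3])
```

Here the three encoded columns collapse into a single value, so the summary has three columns instead of five.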

fit(background_data, summarise_background=False, n_background_samples=300, group_names=None, groups=None, weights=None, **kwargs)[source]

This takes a background dataset (usually a subsample of the training set) as an input along with several user specified options and initialises a KernelShap explainer. The runtime of the algorithm depends on the number of samples in this dataset and on the number of features in the dataset. To reduce the size of the dataset, the summarise_background option and n_background_samples should be used. To reduce the feature dimensionality, encoded categorical variables can be treated as one during the feature perturbation process; this decreases the effective feature dimensionality, can reduce the variance of the shap values estimation and reduces slightly the number of calls to the predictor. Further runtime savings can be achieved by changing the nsamples parameter in the call to explain. Runtime reduction comes with an accuracy trade-off, so it is better to experiment with a runtime reduction method and understand results stability before using the system.

Parameters
• background_data (Union[ndarray, spmatrix, DataFrame, Data]) – Data used to estimate feature contributions and baseline values for force plots. The rows of the background data should represent samples and the columns features.

• summarise_background (Union[bool, str]) – A large background dataset impacts the runtime and memory footprint of the algorithm. By setting this argument to True, only n_background_samples from the provided data are selected. If group_names or groups arguments are specified, the algorithm assumes that the data contains categorical variables so the records are selected uniformly at random. Otherwise, shap.kmeans (a wrapper around sklearn k-means implementation) is used for selection. If set to ‘auto’, a default of KERNEL_SHAP_BACKGROUND_THRESHOLD samples is selected.

• n_background_samples (int) – The number of samples to keep in the background dataset if summarise_background=True.

• groups (Optional[List[Union[Tuple[int], List[int]]]]) – A list containing sub-lists specifying the indices of features belonging to the same group.

• group_names (Union[List[str], Tuple[str], None]) – If specified, this array is used to treat groups of features as one during feature perturbation. This feature can be useful, for example, to treat encoded categorical variables as one and can result in computational savings (this may require adjusting the nsamples parameter).

• weights (Union[List[float], Tuple[float], ndarray, None]) – A sequence or array of weights. This is used only if grouping is specified and assigns a weight to each point in the dataset.

• kwargs – Expected keyword arguments include keep_index (bool) and should be used if a data frame containing an index column is passed to the algorithm.

Return type

KernelShap
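A groups list of the shape described above can be derived from the layout of the encoded columns. This is a minimal sketch: build_groups and the example group names are hypothetical, and it assumes each categorical variable occupies a contiguous block of columns.

```python
def build_groups(n_features, cat_vars_start_idx, cat_vars_enc_dim):
    """Return one list of column indices per variable (hypothetical helper)."""
    widths = dict(zip(cat_vars_start_idx, cat_vars_enc_dim))
    groups, col = [], 0
    while col < n_features:
        width = widths.get(col, 1)  # numerical features form singleton groups
        groups.append(list(range(col, col + width)))
        col += width
    return groups

# Six columns: one numerical feature, a 3-level categorical variable
# starting at column 1 and a 2-level categorical variable at column 4.
groups = build_groups(6, cat_vars_start_idx=[1, 4], cat_vars_enc_dim=[3, 2])
group_names = ["age", "occupation", "sex"]  # illustrative names only
```

The resulting groups and group_names have the same length, which is what fit expects when both are passed.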

class alibi.explainers.TreeShap(predictor, model_output='raw', feature_names=None, categorical_names=None, task='classification', seed=None)[source]
__init__(predictor, model_output='raw', feature_names=None, categorical_names=None, task='classification', seed=None)[source]

A wrapper around the shap.TreeExplainer class. It adds the following functionality:

1. Input summarisation options to allow control over background dataset size and hence runtime

2. Output summarisation for sklearn models with one-hot encoded categorical variables.

Users are strongly encouraged to familiarise themselves with the algorithm by reading the method overview in the documentation.

Parameters
• predictor (Any) – A fitted model to be explained. XGBoost, LightGBM, CatBoost and most tree-based scikit-learn models are supported. In the future, Pyspark could also be supported. Please open an issue if this is a use case for you.

• model_output (str) –

Supported values are: ‘raw’, ‘probability’, ‘probability_doubled’, ‘log_loss’:

• ’raw’: the raw output of the model, which varies by task, is explained. This option should always be used if fit is called without arguments. It should also be set to compute shap interaction values. For regression models it is the standard output, for binary classification in XGBoost it is the log odds ratio.

• ’probability’: the probability output is explained. This option should only be used if fit was called with the background_data argument set. The effect of specifying this parameter is that the shap library will use this information to transform the shap values computed in margin space (aka using the raw output) to shap values that sum to the probability output by the model plus the model expected output probability. This requires knowledge of the type of output for predictor which is inferred by the shap library from the model type (e.g., most sklearn models with the exception of sklearn.tree.DecisionTreeClassifier, sklearn.ensemble.RandomForestClassifier, sklearn.ensemble.ExtraTreesClassifier output logits) or on the basis of the mapping implemented in the shap.TreeEnsemble constructor. Only trees that output log odds and probabilities are supported currently.

• ’probability_doubled’: used for binary classification problems in situations where the model outputs the logits/probabilities for the positive class but shap values for both outcomes are desired. This option should be used only if fit was called with the background_data argument set. In this case the expected value for the negative class is 1 - expected_value for positive class and the shap values for the negative class are the negative values of the positive class shap values. As before, the explanation happens in the margin space, and the shap values are subsequently adjusted. The same considerations as for probability apply for this output type too.

• ’log_loss’: logarithmic loss is explained. This option should be used only if fit was called with the background_data argument set and requires specifying labels, y, when calling explain. If the objective is squared error, then the transformation $$(output - y)^2$$ is applied. For binary cross-entropy objective, the transformation $$log(1 + exp(output)) - y * output$$ with $$y \in \{0, 1\}$$ is applied. Currently only binary cross-entropy and squared error losses can be explained.

• feature_names (Union[List[str], Tuple[str], None]) – Used to compute the names field, which appears as a key in each of the values of the importances sub-field of the response raw field.

• categorical_names (Optional[Dict[int, List[str]]]) – Keys are feature column indices. Each value contains strings with the names of the categories for the feature. Used to select the method for background data summarisation (if specified, subsampling is performed as opposed to kmeans clustering). In the future it may be used for visualisation.

• task (str) – Can have values ‘classification’ and ‘regression’. It is only used to set the contents of the prediction field in the data[‘raw’] response field.

Notes

Tree SHAP is an additive attribution method so it is best suited to explaining output in margin space (the entire real line). For discussion related to explaining models in output vs probability space, please consult this resource.
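The two transformations listed for model_output=’log_loss’ can be written out directly. This is a sketch of the formulas only, with hypothetical function names; it shows that the binary cross-entropy expression in terms of the raw (margin) output equals the usual negative log-likelihood of the corresponding probability.

```python
import math

def squared_error_loss(output, y):
    """(output - y)^2, for a squared error objective."""
    return (output - y) ** 2

def binary_cross_entropy_loss(output, y):
    """log(1 + exp(output)) - y * output, with output in margin space
    and y in {0, 1}."""
    return math.log(1.0 + math.exp(output)) - y * output

def negative_log_likelihood(output, y):
    """The same loss written with the probability p = sigmoid(output)."""
    p = 1.0 / (1.0 + math.exp(-output))
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

margin = 0.7  # hypothetical raw model output (log-odds)
loss_pos = binary_cross_entropy_loss(margin, 1)
loss_neg = binary_cross_entropy_loss(margin, 0)
```

Working with the margin form avoids computing the probability explicitly, which keeps the explained quantity defined on the raw output of the trees.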

build_explanation(X, shap_output, expected_value, **kwargs)[source]

Create an explanation object. If output summarisation is required and all inputs necessary for this operation are passed, the raw shap values are summed first so that a single shap value is returned for each categorical variable, as opposed to a shap value per dimension of categorical variable encoding. Similarly, the shap interaction values are summed such that they represent the interaction between categorical variables as opposed to levels of categorical variables. If the interaction option has been specified during explain, this method computes the shap values given the interactions prior to creating the response.

Parameters
• X (Union[ndarray, DataFrame, Pool]) – Instances to be explained.

• shap_output (List[ndarray]) – If explain is called with interactions=True then the list contains tensors of dimensionality n_instances x n_features x n_features of shap interaction values. Otherwise, it contains tensors of dimension n_instances x n_features representing shap values. The length of the list equals the number of model outputs.

• expected_value (List[float]) – A list containing the expected value of the prediction for each class. Its length is equal to that of shap_output.

Return type

Explanation

Returns

explanation – An Explanation object containing the shap values and prediction in the data field, along with a meta field containing additional data. See usage examples here for details.

explain(X, y=None, interactions=False, approximate=False, check_additivity=True, tree_limit=None, summarise_result=False, cat_vars_start_idx=None, cat_vars_enc_dim=None, **kwargs)[source]

Explains the instances in X. y should be passed if the model loss function is to be explained, which can be useful in order to understand how various features affect model performance over time. This is only possible if the explainer has been fitted with a background dataset and requires setting model_output=’log_loss’.

Parameters
• X (Union[ndarray, DataFrame, Pool]) – Instances to be explained.

• y (Optional[ndarray]) – Labels corresponding to rows of X. Should be passed only if a background dataset was passed to the fit method.

• interactions (bool) – If True, the shap value for every feature of every instance in X is decomposed into X.shape[1] - 1 shap value interactions and one main effect. This is only supported if fit is called with background_dataset=None.

• approximate (bool) –

If True, an approximation to the shap values that does not account for feature order is computed. This was proposed by Ando Saabas here. Check this resource for more details. This option is currently only supported for xgboost and sklearn models.

• check_additivity (bool) – If True, output correctness is ensured if model_output=’raw’ has been passed to the constructor.

• tree_limit (Optional[int]) – Explain the output of a subset of the first tree_limit trees in an ensemble model.

• summarise_result (bool) – This should be set to True only when some of the columns in X represent encoded dimensions of a categorical variable and one single shap value per categorical variable is desired. Both cat_vars_start_idx and cat_vars_enc_dim should be specified as detailed below to allow this.

• cat_vars_start_idx (Optional[Sequence[int]]) – The start indices of the categorical variables.

• cat_vars_enc_dim (Optional[Sequence[int]]) – The length of the encoding dimension for each categorical variable.

Return type

Explanation
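The decomposition described for interactions=True, one main effect plus X.shape[1] - 1 interaction terms per feature, means that shap values can be recovered by summing the interaction tensor over its last axis. The numbers below are made up for illustration; only the NumPy calls are real.

```python
import numpy as np

# Hypothetical interaction values for one instance with three features,
# shape (n_instances, n_features, n_features). The matrix is symmetric:
# entry [i, j] is the interaction between features i and j.
interactions = np.array([[
    [0.30, 0.05, -0.10],
    [0.05, -0.20, 0.02],
    [-0.10, 0.02, 0.40],
]])

# Diagonal entries are the main effects of each feature.
main_effects = np.diagonal(interactions, axis1=1, axis2=2)

# Summing a row adds the main effect and the two interaction terms,
# giving the shap value of that feature.
shap_values = interactions.sum(axis=2)
```

This is the kind of reduction build_explanation refers to when it computes shap values from the interactions before creating the response.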

fit(background_data=None, summarise_background=False, n_background_samples=1000, **kwargs)[source]

This function instantiates an explainer which can then be used to explain instances using the explain method. If no background dataset is passed, the explainer uses the path-dependent feature perturbation algorithm to explain the values. As such, only the model raw output can be explained and this should be reflected by passing model_output=’raw’ when instantiating the explainer. If a background dataset is passed, the interventional feature perturbation algorithm is used. Using this algorithm, probability outputs can also be explained. Additionally, if the model_output=’log_loss’ option is passed to the explainer constructor, then the model loss function can be explained by passing the labels as the y argument to the explain method. A limited number of loss functions are supported, as detailed in the constructor documentation.

Parameters
• background_data (Union[ndarray, DataFrame, None]) – Data used to estimate feature contributions and baseline values for force plots. The rows of the background data should represent samples and the columns features.

• summarise_background (Union[bool, str]) – A large background dataset may impact the runtime and memory footprint of the algorithm. By setting this argument to True, only n_background_samples from the provided data are selected. If the categorical_names argument has been passed to the constructor, subsampling of the data is used. Otherwise, shap.kmeans (a wrapper around sklearn.kmeans implementation) is used for selection. If set to ‘auto’, a default of TREE_SHAP_BACKGROUND_WARNING_THRESHOLD samples is selected.

• n_background_samples (int) – The number of samples to keep in the background dataset if summarise_background=True.

Return type

TreeShap

alibi.explainers.plot_ale(exp, features='all', targets='all', n_cols=3, sharey='all', constant=False, ax=None, line_kw=None, fig_kw=None)[source]

Plot ALE curves on matplotlib axes.

Parameters
• exp – An Explanation object produced by a call to the ALE.explain method.

• features – A list of features for which to plot the ALE curves or all for all features. Can be a mix of integers denoting feature index or strings denoting entries in exp.feature_names. Defaults to ‘all’.

• targets – A list of targets for which to plot the ALE curves or all for all targets. Can be a mix of integers denoting target index or strings denoting entries in exp.target_names. Defaults to ‘all’.

• n_cols – Number of columns to organize the resulting plot into.

• sharey – A parameter specifying whether the y-axis of the ALE curves should be on the same scale for several features. Possible values are all, row, None.

• constant – A parameter specifying whether the constant zeroth order effects should be added to the ALE first order effects.

• ax – A matplotlib axes object or a numpy array of matplotlib axes to plot on.

• line_kw – Keyword arguments passed to the plt.plot function.

• fig_kw – Keyword arguments passed to the fig.set function.

Returns

An array of matplotlib axes with the resulting ALE plots.

class alibi.explainers.IntegratedGradients(model, layer=None, method='gausslegendre', n_steps=50, internal_batch_size=100)[source]
__init__(model, layer=None, method='gausslegendre', n_steps=50, internal_batch_size=100)[source]

An implementation of the integrated gradients method for Tensorflow and Keras models.

For details of the method see the original paper: https://arxiv.org/abs/1703.01365 .

Parameters
• model (Union[tensorflow.keras.Model, keras.Model]) – Tensorflow or Keras model.

• layer (Union[None, tensorflow.keras.layers.Layer, keras.layers.Layer]) – Layer with respect to which the gradients are calculated. If not provided, the gradients are calculated with respect to the input.

• method (str) – Method for the integral approximation. Methods available: “riemann_left”, “riemann_right”, “riemann_middle”, “riemann_trapezoid”, “gausslegendre”.

• n_steps (int) – Number of steps in the path integral approximation from the baseline to the input instance.

• internal_batch_size (Union[None, int]) – Batch size for the internal batching.

Return type

None
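The role of method and n_steps can be illustrated on a toy function with a known gradient. The sketch below approximates the path integral with Gauss-Legendre quadrature using NumPy only; f, grad_f and integrated_gradients are hypothetical names for this example and the code does not call the library or any Tensorflow model.

```python
import numpy as np

def f(x):
    """Toy model: sum of cubes of the input features."""
    return float(np.sum(x ** 3))

def grad_f(x):
    """Analytic gradient of f, applied elementwise."""
    return 3.0 * x ** 2

def integrated_gradients(x, baseline, n_steps=5):
    """Approximate (x - baseline) * integral_0^1 grad_f(baseline + a (x - baseline)) da."""
    # Gauss-Legendre nodes and weights live on [-1, 1]; rescale them to [0, 1].
    nodes, weights = np.polynomial.legendre.leggauss(n_steps)
    alphas = 0.5 * (nodes + 1.0)
    weights = 0.5 * weights

    # Points along the straight path from the baseline to the input.
    path = baseline + alphas[:, None] * (x - baseline)

    # Weighted average of the gradients along the path.
    avg_grad = np.sum(weights[:, None] * grad_f(path), axis=0)
    return (x - baseline) * avg_grad

x = np.array([1.0, 2.0])
baseline = np.zeros_like(x)
attributions = integrated_gradients(x, baseline)
```

For this polynomial the quadrature is exact with very few steps, and the attributions sum to f(x) - f(baseline). For real networks more steps are generally needed, which is what n_steps controls.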

build_explanation(X, baselines, target, attributions)[source]
Return type

Explanation

explain(X, baselines=None, target=None)[source]

Calculates the attributions for each input feature or element of layer and returns an Explanation object.

Parameters
• X (ndarray) – Instance for which integrated gradients attributions are computed.

• baselines (Union[None, int, float, ndarray]) – Baselines (starting point of the path integral) for each instance. If the passed value is an np.ndarray, it must have the same shape as X. If not provided, all feature values for the baselines are set to 0.

• target (Union[None, int, list, ndarray]) – Defines which element of the model output is considered to compute the gradients. It can be a list of integers or a numeric value. If a numeric value is passed, the gradients are calculated for the same element of the output for all data points. It must be provided if the model output dimension is higher than 1. For regression models whose output is a scalar, target should not be provided. For classification models target can be either the true classes or the classes predicted by the model.

Return type

Explanation

Returns

Explanation object including meta and data attributes with integrated gradients attributions for each feature.