alibi.explainers package¶
The ‘alibi.explainers’ module includes feature importance, counterfactual and anchor-based explainers.

class alibi.explainers.AnchorTabular(predictor, feature_names, categorical_names=None, seed=None)[source]¶
Bases: alibi.api.interfaces.Explainer, alibi.api.interfaces.FitMixin

__init__(predictor, feature_names, categorical_names=None, seed=None)[source]¶

Parameters
predictor (Callable) – A callable that takes a tensor of N data points as inputs and returns N outputs.
feature_names (list) – List with feature names.
categorical_names (Optional[dict]) – Dictionary where keys are feature columns and values are the categories for the feature.
seed (Optional[int]) – Used to set the random number generator for repeatability purposes.

Return type
None
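The predictor only needs to satisfy the batch contract described above (N data points in, N outputs out). A minimal sketch with a hypothetical softmax model standing in for a trained classifier:

```python
import numpy as np

def predictor(X: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a trained model: maps a batch of
    N data points to N rows of class probabilities."""
    W = np.arange(6, dtype=float).reshape(3, 2)           # toy weights
    logits = X @ W                                        # (N, 2) raw scores
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)               # softmax per row

X = np.random.rand(5, 3)        # a batch of N=5 points with 3 features
probs = predictor(X)
assert probs.shape == (5, 2)    # N outputs for N inputs
```

Any callable with this shape — a scikit-learn `predict_proba`, a wrapped neural network — can be passed as predictor.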

add_names_to_exp(explanation)[source]¶
Add feature names to explanation dictionary.

Parameters
explanation (dict) – Dict with anchors and additional metadata.

Return type
None

build_explanation(X, result, predicted_label, params)[source]¶
Preprocess search output and return an explanation object containing metadata.

Returns
Dictionary containing a human-readable explanation, metadata, and precision/coverage info.

explain(X, threshold=0.95, delta=0.1, tau=0.15, batch_size=100, coverage_samples=10000, beam_size=1, stop_on_first=False, max_anchor_size=None, min_samples_start=100, n_covered_ex=10, binary_cache_size=10000, cache_margin=1000, verbose=False, verbose_every=1, **kwargs)[source]¶
Explain the prediction made by a classifier on instance X.

Parameters
X (ndarray) – Instance to be explained.
threshold (float) – Minimum precision threshold.
delta (float) – Used to compute beta.
tau (float) – Margin between lower confidence bound and minimum precision or upper bound.
batch_size (int) – Batch size used for sampling.
coverage_samples (int) – Number of samples used to estimate coverage during result search.
beam_size (int) – The number of anchors extended at each step of new anchor construction.
stop_on_first (bool) – If True, the beam search algorithm will return the first anchor that satisfies the probability constraint.
max_anchor_size (Optional[int]) – Maximum number of features in the result.
min_samples_start (int) – Minimum number of initial samples.
n_covered_ex (int) – How many examples where anchors apply to store for each anchor sampled during search (examples where the prediction on samples agrees and disagrees with desired_label are both stored).
binary_cache_size (int) – The result search preallocates binary_cache_size batches for storing the binary arrays returned during sampling.
cache_margin (int) – When only max(cache_margin, batch_size) positions in the binary cache remain empty, a new cache of the same size is preallocated to continue buffering samples.
verbose (bool) – Display updates during the anchor search iterations.
verbose_every (int) – Frequency of displayed iterations during the anchor search process.

Returns
explanation – Dictionary containing the result explaining the instance with additional metadata.

fit(train_data, disc_perc=(25, 50, 75), **kwargs)[source]¶
Fit the discretizer to the training data, binning numerical features into ordered bins and computing statistics for them. Creates a mapping between the bin numbers of each discretised numerical feature and the row ids in the training set where they occur.
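The percentile binning that fit performs can be sketched with plain numpy (this mirrors the idea only, not alibi’s internal discretizer):

```python
import numpy as np

rng = np.random.default_rng(0)
train_col = rng.normal(size=1000)                 # one numerical feature

disc_perc = (25, 50, 75)                          # default percentiles
bin_edges = np.percentile(train_col, disc_perc)   # 3 edges -> 4 ordered bins
bins = np.digitize(train_col, bin_edges)          # bin number per training row

# mapping: bin number -> row ids in the training set where it occurs
bin_to_rows = {b: np.flatnonzero(bins == b) for b in np.unique(bins)}
assert sum(len(v) for v in bin_to_rows.values()) == 1000
```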


class alibi.explainers.DistributedAnchorTabular(predictor, feature_names, categorical_names=None, seed=None)[source]¶
Bases: alibi.explainers.anchor_tabular.AnchorTabular

explain(X, threshold=0.95, delta=0.1, tau=0.15, batch_size=100, coverage_samples=10000, beam_size=1, stop_on_first=False, max_anchor_size=None, min_samples_start=1, n_covered_ex=10, binary_cache_size=10000, cache_margin=1000, verbose=False, verbose_every=1, **kwargs)[source]¶
Explains the prediction made by a classifier on instance X. Sampling is done in parallel over a number of cores specified in kwargs[‘ncpu’].

Parameters
See superclass implementation.

Returns
See superclass implementation.


class alibi.explainers.AnchorText(nlp, predictor, seed=None)[source]¶
Bases: alibi.api.interfaces.Explainer

UNK = 'UNK'¶

build_explanation(text, result, predicted_label, params)[source]¶
Uses the metadata returned by the anchor search algorithm together with the instance to be explained to build an explanation object.

compare_labels(samples)[source]¶
Compute the agreement between the classifier prediction on the instance to be explained and the predictions on a set of samples which have a subset of features fixed to a given value (i.e., compute the precision of anchors).

Parameters
samples (ndarray) – Samples whose labels are to be compared with the instance label.

Return type
ndarray

Returns
A boolean array indicating whether the prediction was the same as the instance label.

explain(text, use_unk=True, use_similarity_proba=False, sample_proba=0.5, top_n=100, temperature=1.0, threshold=0.95, delta=0.1, tau=0.15, batch_size=100, coverage_samples=10000, beam_size=1, stop_on_first=True, max_anchor_size=None, min_samples_start=100, n_covered_ex=10, binary_cache_size=10000, cache_margin=1000, verbose=False, verbose_every=1, **kwargs)[source]¶
Explain instance and return anchor with metadata.

Parameters
text (str) – Text instance to be explained.
use_unk (bool) – If True, the perturbation distribution will replace words randomly with UNKs. If False, words will be replaced by similar words using word embeddings.
use_similarity_proba (bool) – Sample according to a similarity score with the corpus embeddings; use_unk needs to be False in order for this to be used.
sample_proba (float) – Sample probability if use_similarity_proba is False.
top_n (int) – Number of similar words to sample for perturbations, only used if use_proba=True.
temperature (float) – Sample weight hyperparameter if use_similarity_proba equals True.
threshold (float) – Minimum precision threshold.
delta (float) – Used to compute beta.
tau (float) – Margin between lower confidence bound and minimum precision or upper bound.
batch_size (int) – Batch size used for sampling.
coverage_samples (int) – Number of samples used to estimate coverage during anchor search.
beam_size (int) – Number of options kept after each stage of anchor building.
stop_on_first (bool) – If True, the beam search algorithm will return the first anchor that satisfies the probability constraint.
max_anchor_size (Optional[int]) – Maximum number of features to include in an anchor.
min_samples_start (int) – Number of samples used for anchor search initialisation.
n_covered_ex (int) – How many examples where anchors apply to store for each anchor sampled during search (examples where the prediction on samples agrees and disagrees with the predicted label are both stored).
binary_cache_size (int) – The anchor search preallocates binary_cache_size batches for storing the boolean arrays returned during sampling.
cache_margin (int) – When only max(cache_margin, batch_size) positions in the binary cache remain empty, a new cache of the same size is preallocated to continue buffering samples.
kwargs (Any) – Other keyword arguments passed to the anchor beam search and the text sampling and perturbation functions.
verbose (bool) – Display updates during the anchor search iterations.
verbose_every (int) – Frequency of displayed iterations during the anchor search process.

Returns
explanation – Dictionary containing the anchor explaining the instance with additional metadata.
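The use_unk=True perturbation distribution can be sketched in plain numpy: each word outside the anchor is replaced by 'UNK' with probability sample_proba, while anchor words are kept fixed (the anchor positions below are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
words = "this is a very good book".split()
anchor = {3, 4}              # hypothetical anchor: keep "very good" fixed
sample_proba = 0.5

def perturb(words, anchor, n):
    """Return n perturbed copies of the sentence."""
    out = []
    for _ in range(n):
        mask = rng.random(len(words)) < sample_proba   # which words to replace
        out.append(" ".join(
            "UNK" if (mask[i] and i not in anchor) else w
            for i, w in enumerate(words)))
    return out

samples = perturb(words, anchor, 5)
assert all("very good" in s for s in samples)   # anchor words always survive
```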

find_similar_words()[source]¶
This function queries a spaCy nlp model to find n similar words with the same part of speech for each word in the instance to be explained. For each word the search procedure returns a dictionary containing an np.array of words (‘words’) and an np.array of word similarities (‘similarities’).

Return type
None

perturb_sentence(present, n, sample_proba=0.5, forbidden=frozenset({}), forbidden_tags=frozenset({'PRP$'}), forbidden_words=frozenset({'be'}), temperature=1.0, pos=frozenset({'ADJ', 'ADP', 'ADV', 'DET', 'NOUN', 'VERB'}), use_similarity_proba=True, **kwargs)[source]¶
Perturb the text instance to be explained.

Parameters
present (tuple) – Word indices in the text for the words in the proposed anchor.
n (int) – Number of samples used when sampling from the corpus.
sample_proba (float) – Sample probability for a word if use_similarity_proba is False.
forbidden (frozenset) – Forbidden lemmas.
forbidden_tags (frozenset) – Forbidden POS tags.
forbidden_words (frozenset) – Forbidden words.
pos (frozenset) – POS that can be changed during perturbation.
use_similarity_proba (bool) – Whether to sample according to a similarity score with the corpus embeddings.
temperature (float) – Sample weight hyperparameter if use_similarity_proba equals True.

Return type
Tuple[ndarray, ndarray]

Returns
raw_data – Array of perturbed text instances.
data – Matrix with 1s and 0s indicating whether a word in the text has not been perturbed for each sample.

sampler(anchor, num_samples, compute_labels=True)[source]¶
Generate perturbed samples while keeping the features in the positions specified in the anchor unchanged.

Parameters
anchor (Tuple[int, tuple]) – int: the position of the anchor in the input batch; tuple: the anchor itself, a list of words to be kept unchanged.
num_samples (int) – Number of generated perturbed samples.
compute_labels (bool) – If True, an array of comparisons between predictions on perturbed samples and the instance to be explained is returned.

Returns
If compute_labels=True, a list containing the following is returned:
- covered_true: perturbed examples where the anchor applies and the model prediction on the perturbation is the same as the instance prediction
- covered_false: perturbed examples where the anchor applies and the model prediction is NOT the same as the instance prediction
- labels: num_samples ints indicating whether the prediction on the perturbed sample matches (1) the label of the instance to be explained or not (0)
- data: matrix with 1s and 0s indicating whether a word in the text has been perturbed for each sample
- 1.0: indicates exact coverage is not computed for this algorithm
- anchor[0]: position of the anchor in the batch request
Otherwise, a list containing the data matrix only is returned.

set_data_type(use_unk)[source]¶
Working with numpy arrays of strings requires setting the data type to avoid truncating examples. This function estimates the longest sentence expected during the sampling process, which is used to set the number of characters for the samples and examples arrays. This depends on the perturbation method used for sampling.

Parameters
use_unk (bool) – See explain method.

Return type
None

set_sampler_perturbation(use_unk, perturb_opts)[source]¶
Initialises the explainer by setting the perturbation function and parameters necessary to sample according to the perturbation method.

Parameters
use_unk (bool) – See explain method.
perturb_opts (dict) – A dict with keys:
- ‘top_n’: the maximum number of alternatives to sample from for replacement
- ‘use_similarity_proba’: if True, the probability of selecting a replacement word is proportional to the similarity between the word and the word to be replaced
- ‘sample_proba’: given a feature and n sentences, this parameter is the mean of a Bernoulli distribution used to decide how many sentences will have that feature perturbed
- ‘temperature’: a temperature used to calibrate the softmax distribution over the sampling weights

Return type
None


class alibi.explainers.AnchorImage(predictor, image_shape, segmentation_fn='slic', segmentation_kwargs=None, images_background=None, seed=None)[source]¶
Bases: alibi.api.interfaces.Explainer

__init__(predictor, image_shape, segmentation_fn='slic', segmentation_kwargs=None, images_background=None, seed=None)[source]¶
Initialize anchor image explainer.

Parameters
predictor (Callable) – A callable that takes a tensor of N data points as inputs and returns N outputs.
image_shape (tuple) – Shape of the image to be explained.
segmentation_fn (Any) – Any of the built-in segmentation function strings: ‘felzenszwalb’, ‘slic’ or ‘quickshift’, or a custom segmentation function (callable) which returns an image mask with labels for each superpixel. See http://scikit-image.org/docs/dev/api/skimage.segmentation.html for more info.
segmentation_kwargs (Optional[dict]) – Keyword arguments for the built-in segmentation functions.
images_background (Optional[ndarray]) – Images to overlay superpixels on.
seed (Optional[int]) – If set, ensures different runs with the same input will yield the same explanation.

Return type
None

build_explanation(image, result, predicted_label, params)[source]¶
Uses the metadata returned by the anchor search algorithm together with the instance to be explained to build an explanation object.

compare_labels(samples)[source]¶
Compute the agreement between the classifier prediction on the instance to be explained and the predictions on a set of samples which have a subset of superpixels perturbed.

Parameters
samples (ndarray) – Samples whose labels are to be compared with the instance label.

Return type
ndarray

Returns
A boolean array indicating whether the prediction was the same as the instance label.

explain(image, p_sample=0.5, threshold=0.95, delta=0.1, tau=0.15, batch_size=100, coverage_samples=10000, beam_size=1, stop_on_first=False, max_anchor_size=None, min_samples_start=100, n_covered_ex=10, binary_cache_size=10000, cache_margin=1000, verbose=False, verbose_every=1, **kwargs)[source]¶
Explain instance and return anchor with metadata.

Parameters
image (ndarray) – Image to be explained.
p_sample (float) – Probability for a pixel to be represented by the average value of its superpixel.
threshold (float) – Minimum precision threshold.
delta (float) – Used to compute beta.
tau (float) – Margin between lower confidence bound and minimum precision or upper bound.
batch_size (int) – Batch size used for sampling.
coverage_samples (int) – Number of samples used to estimate coverage during result search.
beam_size (int) – The number of anchors extended at each step of new anchor construction.
stop_on_first (bool) – If True, the beam search algorithm will return the first anchor that satisfies the probability constraint.
max_anchor_size (Optional[int]) – Maximum number of features in the result.
min_samples_start (int) – Minimum number of initial samples.
n_covered_ex (int) – How many examples where anchors apply to store for each anchor sampled during search (examples where the prediction on samples agrees and disagrees with desired_label are both stored).
binary_cache_size (int) – The result search preallocates binary_cache_size batches for storing the binary arrays returned during sampling.
cache_margin (int) – When only max(cache_margin, batch_size) positions in the binary cache remain empty, a new cache of the same size is preallocated to continue buffering samples.
verbose (bool) – Display updates during the anchor search iterations.
verbose_every (int) – Frequency of displayed iterations during the anchor search process.

Returns
explanation – Dictionary containing the anchor explaining the instance with additional metadata.

generate_superpixels(image)[source]¶
Generates superpixels from (i.e., segments) an image.

Parameters
image (ndarray) – A grayscale or RGB image.

Return type
ndarray

Returns
A [H, W] array of integers. Each integer is a segment (superpixel) label.

overlay_mask(image, segments, mask_features, scale=(0, 255))[source]¶
Overlay image with mask described by the mask features.

perturbation(anchor, num_samples)[source]¶
Perturbs an image by altering the values of selected superpixels. If a dataset of image backgrounds is provided to the explainer, then the superpixels are replaced with the equivalent superpixels from the background image. Otherwise, the superpixels are replaced by their average value.

Return type
Tuple[ndarray, ndarray]

Returns
imgs – A [num_samples, H, W, C] array of perturbed images.
segments_mask – A [num_samples, M] binary mask, where M is the number of image superpixel segments. 1 indicates the values in that particular superpixel are not perturbed.
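The average-value replacement described above can be sketched with numpy; the segmentation and keep-mask here are hypothetical (two hand-made superpixels), not the output of a real segmentation function:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((4, 4, 3))                        # H=4, W=4, C=3
segments = np.repeat(np.arange(2), 8).reshape(4, 4)  # 2 superpixels: top/bottom

def perturb(image, segments, keep):
    """Replace each superpixel not flagged in `keep` by its mean colour."""
    out = image.copy()
    for s in np.unique(segments):
        if not keep[s]:                              # perturbed superpixel
            out[segments == s] = image[segments == s].mean(axis=0)
    return out

pert = perturb(image, segments, keep=np.array([1, 0]))
# the kept superpixel is untouched
assert np.allclose(pert[segments == 0], image[segments == 0])
```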

sampler(anchor, num_samples, compute_labels=True)[source]¶
Sample images from a perturbation distribution by masking randomly chosen superpixels from the original image and replacing them with pixel values from superimposed images if background images are provided to the explainer. Otherwise, the superpixels from the original image are replaced with their average values.

Parameters
anchor (Tuple[int, tuple]) – int: order of the anchor in the batch; tuple: features (= superpixels) present in the proposed anchor.
num_samples (int) – Number of samples used.
compute_labels (bool) – If True, an array of comparisons between predictions on perturbed samples and the instance to be explained is returned.

Returns
If compute_labels=True, a list containing the following is returned:
- covered_true: perturbed examples where the anchor applies and the model prediction on the perturbed sample is the same as the instance prediction
- covered_false: perturbed examples where the anchor applies and the model prediction on the perturbed sample is NOT the same as the instance prediction
- labels: num_samples ints indicating whether the prediction on the perturbed sample matches (1) the label of the instance to be explained or not (0)
- data: matrix with 1s and 0s indicating whether the values in a superpixel will remain unchanged (1) or will be perturbed (0), for each sample
- 1.0: indicates exact coverage is not computed for this algorithm
- anchor[0]: position of the anchor in the batch request
Otherwise, a list containing the data matrix only is returned.


class alibi.explainers.CEM(predict, mode, shape, kappa=0.0, beta=0.1, feature_range=(-10000000000.0, 10000000000.0), gamma=0.0, ae_model=None, learning_rate_init=0.01, max_iterations=1000, c_init=10.0, c_steps=10, eps=(0.001, 0.001), clip=(-100.0, 100.0), update_num_grad=1, no_info_val=None, write_dir=None, sess=None)[source]¶
Bases: alibi.api.interfaces.Explainer, alibi.api.interfaces.FitMixin

__init__(predict, mode, shape, kappa=0.0, beta=0.1, feature_range=(-10000000000.0, 10000000000.0), gamma=0.0, ae_model=None, learning_rate_init=0.01, max_iterations=1000, c_init=10.0, c_steps=10, eps=(0.001, 0.001), clip=(-100.0, 100.0), update_num_grad=1, no_info_val=None, write_dir=None, sess=None)[source]¶
Initialize contrastive explanation method. Paper: https://arxiv.org/abs/1802.07623

Parameters
predict (Union[Callable, tensorflow.keras.Model, keras.Model]) – Keras or TensorFlow model or any other model’s prediction function returning class probabilities.
mode (str) – Find pertinent negatives (‘PN’) or pertinent positives (‘PP’).
shape (tuple) – Shape of input data starting with batch size.
kappa (float) – Confidence parameter for the attack loss term.
beta (float) – Regularization constant for the L1 loss term.
feature_range (tuple) – Tuple with min and max ranges to allow for perturbed instances. Min and max ranges can be floats or numpy arrays with dimension (1 x nb of features) for feature-wise ranges.
gamma (float) – Regularization constant for the optional autoencoder loss term.
ae_model (Union[tensorflow.keras.Model, keras.Model]) – Optional autoencoder model used for loss regularization.
learning_rate_init (float) – Initial learning rate of the optimizer.
max_iterations (int) – Maximum number of iterations for finding a PN or PP.
c_init (float) – Initial value to scale the attack loss term.
c_steps (int) – Number of iterations to adjust the constant scaling the attack loss term.
eps (tuple) – If numerical gradients are used to compute dL/dx = (dL/dp) * (dp/dx), then eps[0] is used to calculate dL/dp and eps[1] is used for dp/dx. eps[0] and eps[1] can be a combination of float values and numpy arrays. For eps[0], the array dimension should be (1 x nb of prediction categories) and for eps[1] it should be (1 x nb of features).
clip (tuple) – Tuple with min and max clip ranges for both the numerical gradients and the gradients obtained from the TensorFlow graph.
update_num_grad (int) – If numerical gradients are used, they will be updated every update_num_grad iterations.
no_info_val (Union[float, ndarray]) – Global or feature-wise value considered as containing no information.
write_dir (str) – Directory to write tensorboard files to.
sess (tensorflow.compat.v1.Session) – Optional TensorFlow session that will be used if passed instead of creating or inferring one internally.

Return type
None

attack(X, Y, verbose=False)[source]¶
Find a pertinent negative or pertinent positive for instance X using a fast iterative shrinkage-thresholding algorithm (FISTA).

explain(X, Y=None, verbose=False)[source]¶
Explain instance and return PP or PN with metadata.

Returns
explanation – Dictionary containing the PP or PN with additional metadata.

get_gradients(X, Y)[source]¶
Compute numerical gradients of the attack loss term: dL/dx = (dL/dP)*(dP/dx) with L = loss_attack_s; P = predict; x = adv_s.

Parameters
X (ndarray) – Instance around which the gradient is evaluated.
Y (ndarray) – One-hot representation of the instance labels.

Return type
ndarray

Returns
Array with gradients.
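The numerical gradients mentioned here follow the standard central-difference scheme driven by the eps parameter; a generic sketch on a toy loss (not alibi's implementation):

```python
import numpy as np

def num_grad(f, x: np.ndarray, eps: float = 1e-3) -> np.ndarray:
    """Central-difference estimate of df/dx for a scalar-valued f."""
    g = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d.flat[i] = eps
        g.flat[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

f = lambda x: float((x ** 2).sum())       # toy loss with known gradient 2x
x = np.array([1.0, -2.0, 3.0])
assert np.allclose(num_grad(f, x), 2 * x, atol=1e-6)
```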

loss_fn(pred_proba, Y)[source]¶
Compute the attack loss.

Parameters
pred_proba (ndarray) – Prediction probabilities of an instance.
Y (ndarray) – One-hot representation of the instance labels.

Return type
ndarray

Returns
Loss of the attack.
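For pertinent negatives, the CEM paper formulates the attack loss as a hinge on the gap between the original class score and the best competing class, controlled by kappa. A sketch under that formulation (function name and shapes are illustrative, not alibi's internals):

```python
import numpy as np

def attack_loss_pn(pred_proba: np.ndarray, y_onehot: np.ndarray,
                   kappa: float = 0.0) -> float:
    """Hinge loss for a pertinent negative: zero once the original
    class score falls at least kappa below the best other class."""
    orig = float(pred_proba[y_onehot.astype(bool)])           # original class score
    other = float(pred_proba[~y_onehot.astype(bool)].max())   # best competing class
    return max(orig - other + kappa, 0.0)

p = np.array([0.7, 0.2, 0.1])
y = np.array([1.0, 0.0, 0.0])
loss = attack_loss_pn(p, y)       # 0.7 - 0.2 = 0.5: not yet a PN
```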


class alibi.explainers.CounterFactual(predict_fn, shape, distance_fn='l1', target_proba=1.0, target_class='other', max_iter=1000, early_stop=50, lam_init=0.1, max_lam_steps=10, tol=0.05, learning_rate_init=0.1, feature_range=(-10000000000.0, 10000000000.0), eps=0.01, init='identity', decay=True, write_dir=None, debug=False, sess=None)[source]¶
Bases: alibi.api.interfaces.Explainer

__init__(predict_fn, shape, distance_fn='l1', target_proba=1.0, target_class='other', max_iter=1000, early_stop=50, lam_init=0.1, max_lam_steps=10, tol=0.05, learning_rate_init=0.1, feature_range=(-10000000000.0, 10000000000.0), eps=0.01, init='identity', decay=True, write_dir=None, debug=False, sess=None)[source]¶
Initialize counterfactual explanation method based on Wachter et al. (2017).

Parameters
predict_fn (Union[Callable, tensorflow.keras.Model, keras.Model]) – Keras or TensorFlow model or any other model’s prediction function returning class probabilities.
shape (Tuple[int, …]) – Shape of input data starting with batch size.
distance_fn (str) – Distance function to use in the loss term.
target_proba (float) – Target probability for the counterfactual to reach.
target_class (Union[str, int]) – Target class for the counterfactual to reach, one of ‘other’, ‘same’ or an integer denoting desired class membership for the counterfactual instance.
max_iter (int) – Maximum number of iterations to run the gradient descent for (inner loop).
early_stop (int) – Number of steps after which to terminate gradient descent if all or none of the found instances are solutions.
lam_init (float) – Initial regularization constant for the prediction part of the Wachter loss.
max_lam_steps (int) – Maximum number of times to adjust the regularization constant (outer loop) before terminating the search.
tol (float) – Tolerance for the counterfactual target probability.
learning_rate_init – Initial learning rate for each outer loop of lambda.
feature_range (Union[Tuple, str]) – Tuple with min and max ranges to allow for perturbed instances. Min and max ranges can be floats or numpy arrays with dimension (1 x nb of features) for feature-wise ranges.
eps (Union[float, ndarray]) – Gradient step sizes used in calculating numerical gradients; defaults to a single value for all features, but can be passed an array for feature-wise step sizes.
init (str) – Initialization method for the search of counterfactuals, currently must be ‘identity’.
decay (bool) – Flag to decay learning rate to zero for each outer loop over lambda.
write_dir (str) – Directory to write Tensorboard files to.
debug (bool) – Flag to write Tensorboard summaries for debugging.
sess (tensorflow.compat.v1.Session) – Optional TensorFlow session that will be used if passed instead of creating or inferring one internally.

Return type
None

explain(X)[source]¶
Explain an instance and return the counterfactual with metadata.

Parameters
X (ndarray) – Instance to be explained.

Returns
explanation – Dictionary containing the counterfactual with additional metadata.
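The Wachter-style objective balances a prediction term, scaled by lambda, against a distance term — here sketched with distance_fn='l1'; the helper and its inputs are illustrative, not alibi's internal loss:

```python
import numpy as np

def wachter_loss(x_cf, x_orig, pred_proba_t, target_proba, lam):
    """lam * (gap between target-class probability and target_proba)^2
    plus the L1 distance of the counterfactual to the original instance."""
    pred_term = lam * (pred_proba_t - target_proba) ** 2
    dist_term = np.abs(x_cf - x_orig).sum()          # distance_fn='l1'
    return pred_term + dist_term

x0 = np.array([0.0, 0.0])
xcf = np.array([0.5, -0.5])
loss = wachter_loss(xcf, x0, pred_proba_t=0.8, target_proba=1.0, lam=0.1)
```

Increasing lam (the outer loop over lam_init / max_lam_steps) shifts weight towards hitting target_proba; decreasing it favours counterfactuals close to the original instance.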


class alibi.explainers.CounterFactualProto(predict, shape, kappa=0.0, beta=0.1, feature_range=(-10000000000.0, 10000000000.0), gamma=0.0, ae_model=None, enc_model=None, theta=0.0, cat_vars=None, ohe=False, use_kdtree=False, learning_rate_init=0.01, max_iterations=1000, c_init=10.0, c_steps=10, eps=(0.001, 0.001), clip=(-1000.0, 1000.0), update_num_grad=1, write_dir=None, sess=None)[source]¶
Bases: alibi.api.interfaces.Explainer, alibi.api.interfaces.FitMixin

__init__(predict, shape, kappa=0.0, beta=0.1, feature_range=(-10000000000.0, 10000000000.0), gamma=0.0, ae_model=None, enc_model=None, theta=0.0, cat_vars=None, ohe=False, use_kdtree=False, learning_rate_init=0.01, max_iterations=1000, c_init=10.0, c_steps=10, eps=(0.001, 0.001), clip=(-1000.0, 1000.0), update_num_grad=1, write_dir=None, sess=None)[source]¶
Initialize prototypical counterfactual method.

Parameters
predict (Union[Callable, tensorflow.keras.Model, keras.Model]) – Keras or TensorFlow model or any other model’s prediction function returning class probabilities.
shape (tuple) – Shape of input data starting with batch size.
kappa (float) – Confidence parameter for the attack loss term.
beta (float) – Regularization constant for the L1 loss term.
feature_range (tuple) – Tuple with min and max ranges to allow for perturbed instances. Min and max ranges can be floats or numpy arrays with dimension (1 x nb of features) for feature-wise ranges.
gamma (float) – Regularization constant for the optional autoencoder loss term.
ae_model (Union[tensorflow.keras.Model, keras.Model]) – Optional autoencoder model used for loss regularization.
enc_model (Union[tensorflow.keras.Model, keras.Model]) – Optional encoder model used to guide instance perturbations towards a class prototype.
theta (float) – Constant for the prototype search loss term.
cat_vars (dict) – Dict with as keys the categorical columns and as values the number of categories per categorical variable.
ohe (bool) – Whether the categorical variables are one-hot encoded (OHE) or not. If not OHE, they are assumed to have ordinal encodings.
use_kdtree (bool) – Whether to use k-d trees for the prototype loss term if no encoder is available.
learning_rate_init (float) – Initial learning rate of the optimizer.
max_iterations (int) – Maximum number of iterations for finding a counterfactual.
c_init (float) – Initial value to scale the attack loss term.
c_steps (int) – Number of iterations to adjust the constant scaling the attack loss term.
eps (tuple) – If numerical gradients are used to compute dL/dx = (dL/dp) * (dp/dx), then eps[0] is used to calculate dL/dp and eps[1] is used for dp/dx. eps[0] and eps[1] can be a combination of float values and numpy arrays. For eps[0], the array dimension should be (1 x nb of prediction categories) and for eps[1] it should be (1 x nb of features).
clip (tuple) – Tuple with min and max clip ranges for both the numerical gradients and the gradients obtained from the TensorFlow graph.
update_num_grad (int) – If numerical gradients are used, they will be updated every update_num_grad iterations.
write_dir (str) – Directory to write tensorboard files to.
sess (tensorflow.compat.v1.Session) – Optional TensorFlow session that will be used if passed instead of creating or inferring one internally.

Return type
None

attack(X, Y, target_class=None, k=None, k_type='mean', threshold=0.0, verbose=False, print_every=100, log_every=100)[source]¶
Find a counterfactual (CF) for instance X using a fast iterative shrinkage-thresholding algorithm (FISTA).

Parameters
X (ndarray) – Instance to attack.
Y (ndarray) – Labels for X as a one-hot encoding.
target_class (Optional[list]) – List with target classes used to find the closest prototype. If None, the nearest prototype except for the predicted class on the instance is used.
k (Optional[int]) – Number of nearest instances used to define the prototype for a class. Defaults to using all instances belonging to the class if an encoder is used, and to 1 for k-d trees.
k_type (str) – Use either the average encoding of the k nearest instances in a class (k_type=’mean’) or the k-nearest encoding in the class (k_type=’point’) to define the prototype of that class. Only relevant if an encoder is used to define the prototypes.
threshold (float) – Threshold level for the ratio between the distance of the counterfactual to the prototype of the predicted class for the original instance over the distance to the prototype of the predicted class for the counterfactual. If the trust score is below the threshold, the proposed counterfactual does not meet the requirements.
verbose (bool) – Print intermediate results of the optimization if True.
print_every (int) – Print frequency if verbose is True.
log_every (int) – Tensorboard log frequency if a write directory is specified.

Returns
Overall best attack and gradients for that attack.

explain(X, Y=None, target_class=None, k=None, k_type='mean', threshold=0.0, verbose=False, print_every=100, log_every=100)[source]¶
Explain instance and return counterfactual with metadata.

Parameters
X (ndarray) – Instances to attack.
Y (Optional[ndarray]) – Labels for X as a one-hot encoding.
target_class (Optional[list]) – List with target classes used to find the closest prototype. If None, the nearest prototype except for the predicted class on the instance is used.
k (Optional[int]) – Number of nearest instances used to define the prototype for a class. Defaults to using all instances belonging to the class if an encoder is used, and to 1 for k-d trees.
k_type (str) – Use either the average encoding of the k nearest instances in a class (k_type=’mean’) or the k-nearest encoding in the class (k_type=’point’) to define the prototype of that class. Only relevant if an encoder is used to define the prototypes.
threshold (float) – Threshold level for the ratio between the distance of the counterfactual to the prototype of the predicted class for the original instance over the distance to the prototype of the predicted class for the counterfactual. If the trust score is below the threshold, the proposed counterfactual does not meet the requirements.
verbose (bool) – Print intermediate results of the optimization if True.
print_every (int) – Print frequency if verbose is True.
log_every (int) – Tensorboard log frequency if a write directory is specified.

Returns
explanation – Dictionary containing the counterfactual with additional metadata.

fit
(train_data, trustscore_kwargs=None, d_type='abdm', w=None, disc_perc=(25, 50, 75), standardize_cat_vars=False, smooth=1.0, center=True, update_feature_range=True)[source]¶ Get prototypes for each class using the encoder or kd trees. The prototypes are used for the encoder loss term or to calculate the optional trust scores.
 Parameters
train_data (
ndarray
) – Representative sample from the training data.trustscore_kwargs (
Optional
[dict
]) – Optional arguments to initialize the trust scores method.d_type (
str
) – Pairwise distance metric used for categorical variables. Currently, ‘abdm’, ‘mvdm’ and ‘abdmmvdm’ are supported. ‘abdm’ infers context from the other variables while ‘mvdm’ uses the model predictions. ‘abdmmvdm’ is a weighted combination of the two metrics.w (
Optional
[float
]) – Weight on ‘abdm’ (between 0. and 1.) distance if d_type equals ‘abdm-mvdm’.disc_perc (
Sequence
[Union
[int
,float
]]) – List with percentiles used in binning of numerical features for the ‘abdm’ and ‘abdm-mvdm’ pairwise distance measures.standardize_cat_vars (
bool
) – Standardize numerical values of categorical variables if True.smooth (
float
) – Smoothing exponent between 0 and 1 for the distances. Lower values smooth out the difference in the distance metric between features.center (
bool
) – Whether to center the scaled distance measures. If False, the min distance for each feature except for the feature with the highest raw max distance will be the lower bound of the feature range, but the upper bound will be below the max feature range.update_feature_range (
bool
) – Update feature range with scaled values.
 Return type
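The disc_perc binning step can be illustrated in numpy: numerical features are discretised at the given percentiles before the ‘abdm’ / ‘abdm-mvdm’ pairwise distances are computed. The discretize helper below is a hypothetical sketch, not alibi's internal implementation.

```python
import numpy as np

def discretize(feature, disc_perc=(25, 50, 75)):
    """Bin a numerical feature at the given percentiles (illustrative)."""
    bins = np.percentile(feature, disc_perc)  # bin edges from the data itself
    return np.digitize(feature, bins)         # bin index for each value

feature = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
codes = discretize(feature)
# each value is mapped to one of len(disc_perc) + 1 ordinal bins
```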

get_gradients
(X, Y, grads_shape, cat_vars_ord=None)[source]¶ Compute numerical gradients of the attack loss term: dL/dx = (dL/dP)*(dP/dx), with L = loss_attack_s, P = predict and x = adv_s.
 Parameters
 Return type
ndarray
 Returns
Array with gradients.
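The chain rule above can be sketched for a black-box predictor with central differences. The predict and loss stand-ins below are toy functions, not alibi internals; this only illustrates the general numerical-gradient technique.

```python
import numpy as np

def num_gradient(loss, predict, x, eps=1e-4):
    """Central-difference estimate of d(loss(predict(x)))/dx (illustrative)."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        # perturb one coordinate at a time and difference the loss
        grad[i] = (loss(predict(x + step)) - loss(predict(x - step))) / (2 * eps)
    return grad

predict = lambda x: x ** 2   # toy "model"
loss = lambda p: p.sum()     # toy attack loss
g = num_gradient(loss, predict, np.array([1.0, 2.0]))
# for this toy setup dL/dx_i = 2 * x_i
```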

loss_fn
(pred_proba, Y)[source]¶ Compute the attack loss.
 Parameters
pred_proba (
ndarray
) – Prediction probabilities of an instance.Y (
ndarray
) – One-hot representation of instance labels.
 Return type
ndarray
 Returns
Loss of the attack.


class
alibi.explainers.
KernelShap
(predictor, link='identity', feature_names=None, categorical_names=None, seed=None)[source]¶ Bases:
alibi.api.interfaces.Explainer
,alibi.api.interfaces.FitMixin

__init__
(predictor, link='identity', feature_names=None, categorical_names=None, seed=None)[source]¶ A wrapper around the shap.KernelExplainer class. It extends the current shap library functionality by allowing the user to specify variable groups in order to deal with one-hot encoded categorical variables. The user can also specify whether to aggregate the shap value estimates for the encoded levels of categorical variables during the explain call.
 Parameters
predictor (
Callable
) – A callable that takes as input a samples x features array and outputs a samples x n_outputs array. The n_outputs should represent the model output in margin space. If the model outputs probabilities, the link should be set to ‘logit’ to ensure correct force plots.link (
str
) – Valid values are ‘identity’ or ‘logit’. A generalized linear model link to connect the feature importance values to the model output. Since the feature importance values, phi, sum up to the model output, it often makes sense to connect them to the output with a link function where link(output) = sum(phi). If the model output is a probability then the LogitLink link function makes the feature importance values have log-odds units. Therefore, for a model which outputs probabilities, link=’logit’ makes the feature effects have log-odds (evidence) units, whereas link=’identity’ means that the feature effects have probability units. Please see https://github.com/slundberg/shap/blob/master/notebooks/kernel_explainer/Squashing%20Effect.ipynb for an in-depth discussion of the semantics of explaining the model in probability versus margin space.feature_names (
Union
[List
,Tuple
,None
]) – List with feature names.categorical_names (
Optional
[Dict
]) – Dictionary where keys are feature columns and values are list of categories for the feature.seed (
Optional
[int
]) – Fixes the random number stream, which influences which subsets are sampled during shap value estimation.
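The link semantics can be illustrated in a few lines: with link=’logit’, the baseline and the feature effects phi live in log-odds space and satisfy link(output) = expected_value + sum(phi). The phi values below are made up for illustration; this is not the shap internals.

```python
import numpy as np

def logit(p):
    """Log-odds link: link(output) = log(p / (1 - p))."""
    return np.log(p / (1 - p))

expected_value = logit(0.5)       # baseline prediction in log-odds (0.0)
phi = np.array([0.7, -0.2])       # hypothetical shap values, log-odds units
# feature effects reconstruct the predicted probability through the link
prob = 1 / (1 + np.exp(-(expected_value + phi.sum())))
```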

build_explanation
(X, shap_values, expected_value)[source]¶ Create an explanation object.
 Parameters
X (
Union
[ndarray
,DataFrame
,spmatrix
]) – Array of instances to be explained.shap_values (
List
[ndarray
]) – Each entry is an n_instances x n_features array, and the length of the list equals the dimensionality of the predictor output. The rows of each array correspond to the shap values for the instances with the corresponding row index in X.expected_value (
List
) – A list containing the expected value of the prediction for each class.
 Return type
 Returns
An explanation containing a meta field with basic classifier metadata.

explain
(X, summarise_result=False, cat_vars_start_idx=None, cat_vars_enc_dim=None, **kwargs)[source]¶ Explains the instances in the array X.
 Parameters
X (
Union
[ndarray
,DataFrame
,spmatrix
]) – Array with instances to be explained.summarise_result (
bool
) – Specifies whether the shap values corresponding to dimensions of encoded categorical variables should be summed so that a single shap value is returned for each categorical variable. Both the start indices of the categorical variables (cat_vars_start_idx) and the encoding dimensions (cat_vars_enc_dim) have to be specified.cat_vars_start_idx (
Optional
[List
[int
]]) – A sequence containing the start indices of the categorical variables. If specified, cat_vars_enc_dim should also be specified.cat_vars_enc_dim (
Optional
[List
[int
]]) – A sequence containing the length of the encoding dimension for each categorical variable.kwargs –
 Keyword arguments specifying explain behaviour. Valid arguments are:
* nsamples – controls the number of predictor calls and therefore runtime.
* l1_reg – controls the explanation sparsity.
For more details, please see https://shap.readthedocs.io/en/latest/.
 Return type
 Returns
explanation – An explanation object containing the algorithm results.
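What summarise_result amounts to can be sketched as follows: the shap values of the one-hot dimensions of each categorical variable are summed into a single value per variable, using the start indices and encoding dimensions described above. The summarise helper below is an illustration, not the library code.

```python
import numpy as np

def summarise(shap_vals, cat_vars_start_idx, cat_vars_enc_dim):
    """Sum shap values over the encoded dimensions of each categorical variable."""
    starts = dict(zip(cat_vars_start_idx, cat_vars_enc_dim))
    out, i = [], 0
    while i < shap_vals.shape[1]:
        if i in starts:
            # collapse the one-hot block into a single shap value
            out.append(shap_vals[:, i:i + starts[i]].sum(axis=1))
            i += starts[i]
        else:
            out.append(shap_vals[:, i])  # numerical feature, kept as-is
            i += 1
    return np.stack(out, axis=1)

vals = np.array([[0.1, 0.2, 0.3, 0.4]])  # one instance, 4 columns
# columns 1-3 encode a single categorical variable with 3 levels
summarised = summarise(vals, cat_vars_start_idx=[1], cat_vars_enc_dim=[3])
# summarised has shape (1, 2): [0.1, 0.2 + 0.3 + 0.4]
```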

fit
(background_data, summarise_background=False, n_background_samples=300, group_names=None, groups=None, weights=None, **kwargs)[source]¶ This takes a background dataset (usually a subsample of the training set) as input, along with several user-specified options, and initialises a KernelShap explainer. The runtime of the algorithm depends on the number of samples in this dataset and on the number of features. To reduce the size of the dataset, use the summarise_background and n_background_samples options. To reduce the feature dimensionality, encoded categorical variables can be treated as one during the feature perturbation process; this decreases the effective feature dimensionality, can reduce the variance of the shap value estimates and slightly reduces the number of calls to the predictor. Further runtime savings can be achieved by changing the nsamples parameter in the call to explain. Runtime reduction comes with an accuracy trade-off, so it is better to experiment with a runtime reduction method and understand result stability before using the system.
 Parameters
background_data (
Union
[ndarray
,spmatrix
,DataFrame
,Data
]) – Data used to estimate feature contributions and baseline values for force plots. The rows of the background data should represent samples and the columns features.summarise_background (
Union
[bool
,str
]) – A large background dataset impacts the runtime and memory footprint of the algorithm. By setting this argument to True, only n_background_samples from the provided data are selected. If the group_names or groups arguments are specified, the algorithm assumes that the data contains categorical variables, so the records are selected uniformly at random. Otherwise, shap.kmeans (a wrapper around the sklearn k-means implementation) is used for selection. If set to ‘auto’, a default of BACKGROUND_WARNING_THRESHOLD samples is selected.n_background_samples (
int
) – The number of samples to keep in the background dataset if summarise_background=True.groups (
Optional
[List
[Union
[Tuple
[int
],List
[int
]]]]) – A list containing sublists specifying the indices of features belonging to the same group.group_names (
Union
[List
,Tuple
,None
]) – If specified, this array is used to treat groups of features as one during feature perturbation. This feature can be useful, for example, to treat encoded categorical variables as one and can result in computational savings (this may require adjusting the nsamples parameter).weights (
Union
[List
[float
],Tuple
[float
],ndarray
,None
]) – A sequence or array of weights. This is used only if grouping is specified and assigns a weight to each point in the dataset.kwargs – Expected keyword arguments include “keep_index” and should be used if a data frame containing an index column is passed to the algorithm.
 Return type
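The uniform subsampling applied when grouping indicates categorical data can be sketched in plain numpy (this illustrates only the random-selection branch, not the shap.kmeans path):

```python
import numpy as np

rng = np.random.default_rng(0)
background = rng.normal(size=(5000, 10))   # stand-in for a training subsample
n_background_samples = 300

# select n_background_samples records uniformly at random, without replacement
idx = rng.choice(background.shape[0], n_background_samples, replace=False)
summary = background[idx]
```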

rank_by_importance
(shap_values)[source]¶ Given the shap values estimated for a multi-output model, this function ranks the features according to their importance. The importance of a feature is its average absolute shap value.
 Parameters
shap_values (
List
[ndarray
]) – Each element corresponds to a samples x features array of shap values corresponding to each model output. Return type
 Returns
importances – A dictionary containing a key for each model output (‘0’, ‘1’, …) and a key for the aggregated model output (‘aggregated’). Each value is a dictionary containing a ‘ranked_effect’ field, populated with an array of values representing the average magnitude of each shap value, ordered from the most important (highest average magnitude) to the least important feature. The ‘names’ field contains the corresponding feature names.
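Under this reading, the ranking for a single model output reduces to sorting features by their mean absolute shap value; a small numpy sketch (rank_features is a hypothetical helper, not the library function):

```python
import numpy as np

def rank_features(shap_values, feature_names):
    """Rank features by mean |shap value| for one model output (illustrative)."""
    importances = np.abs(shap_values).mean(axis=0)  # average magnitude per feature
    order = np.argsort(importances)[::-1]           # most important first
    return importances[order], [feature_names[i] for i in order]

shap_values = np.array([[0.5, -1.0, 0.1],
                        [-0.3, 0.8, 0.2]])          # 2 instances x 3 features
effects, names = rank_features(shap_values, ['f0', 'f1', 'f2'])
# f1 ranks first: mean(|-1.0|, |0.8|) = 0.9
```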

Submodules¶
 alibi.explainers.anchor_base module
 alibi.explainers.anchor_explanation module
 alibi.explainers.anchor_image module
 alibi.explainers.anchor_tabular module
 alibi.explainers.anchor_text module
 alibi.explainers.cem module
 alibi.explainers.cfproto module
 alibi.explainers.counterfactual module
 alibi.explainers.kernel_shap module