# alibi.explainers package

The ‘alibi.explainers’ module includes feature importance, counterfactual and anchor-based explainers.

class alibi.explainers.ALE(predictor, feature_names=None, target_names=None, check_feature_resolution=True, low_resolution_threshold=10, extrapolate_constant=True, extrapolate_constant_perc=10.0, extrapolate_constant_min=0.1)

Bases: Explainer

__init__(predictor, feature_names=None, target_names=None, check_feature_resolution=True, low_resolution_threshold=10, extrapolate_constant=True, extrapolate_constant_perc=10.0, extrapolate_constant_min=0.1)

Accumulated Local Effects for tabular datasets. Current implementation supports first order feature effects of numerical features.

Parameters
• predictor (Callable[[ndarray], ndarray]) – A callable that takes in an N x F array as input and outputs an N x T array (N - number of data points, F - number of features, T - number of outputs/targets (e.g. 1 for single output regression, >=2 for classification)).

• feature_names (Optional[List[str]]) – A list of feature names used for displaying results.

• target_names (Optional[List[str]]) – A list of target/output names used for displaying results.

• check_feature_resolution (bool) – If True, the number of unique values is calculated for each feature, and if it is at most low_resolution_threshold, the feature values are used as grid points instead of quantiles. This may increase the runtime of the algorithm for large datasets. Only used for features without custom grid points specified in alibi.explainers.ale.ALE.explain().

• low_resolution_threshold (int) – If a feature has at most this many unique values, these are used as the grid points instead of quantiles. This is to avoid situations when the quantile algorithm returns quantiles between discrete values which can result in jumps in the ALE plot obscuring the true effect. Only used if check_feature_resolution is True and for features without custom grid-points specified in alibi.explainers.ale.ALE.explain().

• extrapolate_constant (bool) – If a feature is constant, only one quantile exists, at which all the data points lie. In this case the ALE value at that point is zero; however, this may be misleading if the feature does have an effect on the model. If this parameter is set to True, the ALE values are calculated on an interval surrounding the constant value. The interval length is controlled by the extrapolate_constant_perc and extrapolate_constant_min arguments.

• extrapolate_constant_perc (float) – Percentage by which to extrapolate a constant feature value to create an interval for ALE calculation. If q is the constant feature value, the interval [q - q * extrapolate_constant_perc/100, q + q * extrapolate_constant_perc/100] is created, on which ALE is calculated. Only relevant if extrapolate_constant is set to True.

• extrapolate_constant_min (float) – Controls the minimum extrapolation length for constant features. An interval constructed for a constant feature is guaranteed to be at least 2 x extrapolate_constant_min wide, centered on the feature value. This captures model behaviour around constant features with small values, for which extrapolate_constant_perc alone would produce a very narrow interval. Only relevant if extrapolate_constant is set to True.
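A minimal sketch of how the two extrapolation parameters might combine for a constant feature value q, assuming the half-width of the interval is max(|q| * extrapolate_constant_perc / 100, extrapolate_constant_min). The helper name constant_feature_interval is hypothetical, and taking the absolute value of q is an assumption of this sketch so that negative constants are handled symmetrically. With the default of 10, the percentage term equals |q|/10.

```python
from typing import Tuple


def constant_feature_interval(q: float, perc: float = 10.0,
                              min_half_width: float = 0.1) -> Tuple[float, float]:
    """Hypothetical helper: interval around a constant feature value q."""
    # Percentage-based half-width; abs() is an assumption of this sketch.
    half_width = max(abs(q) * perc / 100.0, min_half_width)
    # The interval is centred on q and is at least 2 * min_half_width wide.
    return q - half_width, q + half_width


print(constant_feature_interval(5.0))   # percentage term dominates
print(constant_feature_interval(0.2))   # minimum half-width dominates
```

For small constants such as 0.2 the percentage term would give a half-width of only 0.02, which is why a floor like extrapolate_constant_min is useful.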

explain(X, features=None, min_bin_points=4, grid_points=None)

Calculate the ALE curves for each feature with respect to the dataset X.

Parameters
• X (ndarray) – An N x F tabular dataset used to calculate the ALE curves. This is typically the training dataset or a representative sample.

• features (Optional[List[int]]) – Features for which to calculate ALE. If None, ALE is calculated for all features.

• min_bin_points (int) – Minimum number of points each discretized interval should contain to ensure more precise ALE estimation. Only relevant for adaptive grid points (i.e., features without an entry in the grid_points dictionary).

• grid_points (Optional[Dict[int, ndarray]]) – Custom grid points. Must be a dict where the keys are feature indices and the values are monotonically increasing numpy arrays defining the grid points for each feature. See the Notes section for how edge cases involving custom grid points are handled. If no grid points are specified for a feature (i.e. the feature is missing from the grid_points dictionary), deciles discretization is used instead.

Return type

Explanation

Returns

explanation – An Explanation object containing the data and the metadata of the calculated ALE curves. See usage at ALE examples for details.
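To make the estimator concrete, here is a minimal first-order ALE sketch for a single numerical feature. It is an illustration under stated assumptions, not the library's implementation: the name ale_first_order is hypothetical, predict is assumed to return a 1-D array (single output), grid is assumed to hold sorted edges z_0 < ... < z_K, and centering by the count-weighted mean of bin midpoints is one common convention.

```python
import numpy as np


def ale_first_order(predict, X, feature, grid):
    """Hypothetical first-order ALE for one numerical feature and a 1-D predictor output."""
    grid = np.asarray(grid, dtype=float)
    n_bins = len(grid) - 1
    # Bin k (1-based) holds z_{k-1} < x <= z_k; values equal to z_0 go to the first bin.
    k = np.clip(np.searchsorted(grid, X[:, feature], side="left"), 1, n_bins)
    X_lo, X_hi = X.astype(float), X.astype(float)  # astype returns copies
    X_lo[:, feature] = grid[k - 1]
    X_hi[:, feature] = grid[k]
    # Local effect of moving each point across its bin, other features held fixed.
    diffs = predict(X_hi) - predict(X_lo)
    counts = np.bincount(k - 1, minlength=n_bins)
    sums = np.bincount(k - 1, weights=diffs, minlength=n_bins)
    local = np.divide(sums, counts, out=np.zeros(n_bins), where=counts > 0)
    # Accumulate the average local effects along the grid.
    ale = np.concatenate(([0.0], np.cumsum(local)))
    # Centre so that the count-weighted mean of the bin midpoints is zero.
    midpoints = (ale[:-1] + ale[1:]) / 2
    return ale - np.sum(midpoints * counts) / counts.sum()


X = np.column_stack([np.arange(11.0), np.ones(11)])
ale = ale_first_order(lambda A: 2 * A[:, 0] + A[:, 1], X, feature=0, grid=[0, 5, 10])
```

For a linear model with slope 2 in the chosen feature, consecutive ALE values differ by 2 times the bin width, which is a quick sanity check for any implementation.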

Notes

Consider f to be a feature of interest. We denote possible feature values of f by X (i.e. the values from the dataset column corresponding to feature f), by O a user-specified grid-point value, and by (X|O) an overlap between a grid-point and a feature value. We can encounter the following edge-cases:

• Grid points outside the feature range. Consider the following example: O O O X X O X O X O O, where 3 grid points are smaller than the minimum value in f, and 2 grid points are larger than the maximum value in f. The empty leading and trailing bins are removed. The grid points considered will be: O X X O X O X O.

• Grid points that do not cover the entire feature range. Consider the following example: X X O X X O X O X X X X X. Two auxiliary grid points are added, corresponding to the minimum and maximum values of feature f. The grid points considered will be: (O|X) X O X X O X O X X X X (X|O).

• Grid points that do not contain any values in between. Consider the following example: (O|X) X X O O O X O X O O (X|O). The intervals which do not contain any feature values are removed/merged. The grid-points considered will be: (O|X) X X O X O X O (X|O).
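The three edge cases can be sketched as a single post-processing step on a user-supplied grid. This is a hypothetical helper (clean_grid) written to reproduce the examples above, not the library's code; when several empty intervals are merged, which grid point survives is a choice made by this sketch.

```python
import numpy as np


def clean_grid(values, grid):
    """Hypothetical post-processing of custom grid points for one feature."""
    values = np.asarray(values, dtype=float)
    grid = np.unique(np.asarray(grid, dtype=float))  # sorted, duplicates removed
    vmin, vmax = values.min(), values.max()

    # Cases 1 and 2: keep the closest grid point at or beyond each end of the
    # feature range, or fall back to the min/max feature value if there is none.
    below, above = grid[grid <= vmin], grid[grid >= vmax]
    left = below[-1] if below.size else vmin
    right = above[0] if above.size else vmax
    interior = grid[(grid > left) & (grid < right)]

    # Case 3: drop a grid point if the interval (last kept point, point] is empty.
    kept = [left]
    for point in np.append(interior, right):
        if np.any((values > kept[-1]) & (values <= point)):
            kept.append(point)
    return np.array(kept)


print(clean_grid([3, 4, 6, 8], [0, 1, 2, 5, 7, 9, 10]))
```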

meta: dict

reset_predictor(predictor)

Resets the predictor function.

Parameters

predictor (Callable) – New predictor function.

Return type

None

class alibi.explainers.AnchorImage(predictor, image_shape, dtype=<class 'numpy.float32'>, segmentation_fn='slic', segmentation_kwargs=None, images_background=None, seed=None)

Bases: Explainer

__init__(predictor, image_shape, dtype=<class 'numpy.float32'>, segmentation_fn='slic', segmentation_kwargs=None, images_background=None, seed=None)

Initialize anchor image explainer.

Parameters
• predictor (Callable[[ndarray], ndarray]) – A callable that takes a numpy array of N data points as inputs and returns N outputs.

• image_shape (tuple) – Shape of the image to be explained. The channel axis is expected to be last.

• dtype (Type[generic]) – A numpy scalar type that corresponds to the type of input array expected by predictor. This may be used to construct arrays of the given type to be passed through the predictor. For most use cases this argument should have no effect, but it is exposed for use with predictors that would break when called with an array of unsupported type.

• segmentation_fn (Any) – Any of the built-in segmentation function strings: 'felzenszwalb', 'slic' or 'quickshift', or a custom segmentation function (callable) which returns an image mask with labels for each superpixel. The segmentation function is expected to return a segmentation mask containing all integer values from 0 to K-1, where K is the number of image segments (superpixels). See http://scikit-image.org/docs/dev/api/skimage.segmentation.html for more info.

• segmentation_kwargs (Optional[dict]) – Keyword arguments for the built-in segmentation functions.

• images_background (Optional[ndarray]) – Images to overlay superpixels on.

• seed (Optional[int]) – If set, ensures that different runs with the same input yield the same explanation.
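As an illustration of the custom segmentation_fn contract (a mask containing every label from 0 to K-1), the following hypothetical function splits a channels-last image into a regular grid of rectangular superpixels. It assumes the explainer calls the function with the image as its only argument; check the AnchorImage examples before relying on that.

```python
import numpy as np


def grid_segmentation(image: np.ndarray, n_rows: int = 4, n_cols: int = 4) -> np.ndarray:
    """Hypothetical segmentation_fn: a regular grid of n_rows x n_cols superpixels.

    Assumes the image is at least n_rows pixels high and n_cols pixels wide,
    so that every label 0 .. n_rows * n_cols - 1 occurs in the mask.
    """
    height, width = image.shape[:2]  # channel axis (if any) is last
    # Map every pixel row/column to a grid cell index in [0, n_rows) / [0, n_cols).
    row_ids = np.arange(height) * n_rows // height
    col_ids = np.arange(width) * n_cols // width
    # Combine into one integer label per superpixel.
    return row_ids[:, None] * n_cols + col_ids[None, :]


mask = grid_segmentation(np.zeros((8, 8, 3)))
```

Such a function could then be passed as segmentation_fn=grid_segmentation when constructing the explainer.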

Raises

explain(image, p_sample=0.5, threshold=0.95, delta=0.1, tau=0.15, batch_size=100, coverage_samples=10000, beam_size=1, stop_on_first=False, max_anchor_size=None, min_samples_start=100, n_covered_ex=10, binary_cache_size=10000, cache_margin=1000, verbose=False, verbose_every=1, **kwargs)

Explain instance and return anchor with metadata.

Parameters
• image (ndarray) – Image to be explained.

• p_sample (float) – The probability of simulating the absence of a superpixel. If images_background is not provided, the absent superpixels are replaced by the average value of their constituent pixels. Otherwise, the synthetic instances are created by fixing the present superpixels and superimposing another image from images_background over the remaining, absent superpixels.

• threshold (float) – Minimum anchor precision threshold. The algorithm tries to find an anchor that maximizes the coverage under a precision constraint. The precision constraint is formally defined as $$P(prec(A) \ge t) \ge 1 - \delta$$, where $$A$$ is an anchor, $$t$$ is the threshold parameter, $$\delta$$ is the delta parameter, and $$prec(\cdot)$$ denotes the precision of an anchor. In other words, we are seeking an anchor whose precision is greater than or equal to the given threshold with a confidence of (1 - delta). A higher value yields anchors that are more faithful to the model, but also leads to more computation time. Note that there are cases in which the precision constraint cannot be satisfied. If that is the case, the best (i.e. highest coverage) non-eligible anchor is returned.

• delta (float) – Significance threshold. 1 - delta represents the confidence threshold for the anchor precision (see threshold) and the selection of the best anchor candidate in each iteration (see tau).

• tau (float) – Multi-armed bandit parameter used to select candidate anchors in each iteration. The multi-armed bandit algorithm tries to find within a tolerance tau the most promising (i.e. according to the precision) beam_size candidate anchor(s) from a list of proposed anchors. Formally, when the beam_size=1, the multi-armed bandit algorithm seeks to find an anchor $$A$$ such that $$P(prec(A) \ge prec(A^\star) - \tau) \ge 1 - \delta$$, where $$A^\star$$ is the anchor with the highest true precision (which we don’t know), $$\tau$$ is the tau parameter, $$\delta$$ is the delta parameter, and $$prec(\cdot)$$ denotes the precision of an anchor. In other words, in each iteration, the algorithm returns with a probability of at least 1 - delta an anchor $$A$$ with a precision within an error tolerance of tau from the precision of the highest true precision anchor $$A^\star$$. A bigger value for tau means faster convergence but also looser anchor conditions.

• batch_size (int) – Batch size used for sampling. The Anchor algorithm will query the black-box model in batches of size batch_size. A larger batch_size gives more confidence in the anchor, again at the expense of computation time since it involves more model prediction calls.

• coverage_samples (int) – Number of samples from which coverage is estimated during anchor search.

• beam_size (int) – Number of candidate anchors selected by the multi-armed bandit algorithm in each iteration from a list of proposed anchors. A bigger beam width can lead to a better overall anchor (i.e. it prevents the algorithm from getting stuck in a local maximum) at the expense of more computation time.

• stop_on_first (bool) – If True, the beam search algorithm will return the first anchor that satisfies the precision constraint.

• max_anchor_size (Optional[int]) – Maximum number of features in the anchor.

• min_samples_start (int) – Minimum number of initial samples.

• n_covered_ex (int) – How many examples where anchors apply to store for each anchor sampled during search (both examples where prediction on samples agrees/disagrees with desired_label are stored).

• binary_cache_size (int) – The anchor search pre-allocates binary_cache_size batches for storing the binary arrays returned during sampling.

• cache_margin (int) – When only max(cache_margin, batch_size) positions in the binary cache remain empty, a new cache of the same size is pre-allocated to continue buffering samples.

• verbose (bool) – Display updates during the anchor search iterations.

• verbose_every (int) – Frequency of displayed iterations during anchor search process.
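A minimal sketch of the perturbation described for p_sample when images_background is not provided: each superpixel outside the anchor is independently marked absent with probability p_sample and replaced by the mean of its pixels. The function name, the 0/1 presence matrix returned alongside the samples, and the channels-last (H, W, C) layout are assumptions of this sketch rather than the library's internals.

```python
import numpy as np


def perturb_superpixels(image, segments, anchor, n_samples, p_sample=0.5, seed=None):
    """Hypothetical sampler: keep anchor superpixels, blank others with probability p_sample."""
    rng = np.random.default_rng(seed)
    n_segments = segments.max() + 1
    # "Fudged" image: every superpixel filled with the mean colour of its own pixels.
    fudged = np.zeros_like(image, dtype=float)
    for s in range(n_segments):
        mask = segments == s
        fudged[mask] = image[mask].mean(axis=0)
    # Presence matrix: 1 = superpixel kept, 0 = superpixel replaced.
    present = (rng.random((n_samples, n_segments)) >= p_sample).astype(int)
    present[:, list(anchor)] = 1  # anchor superpixels are never perturbed
    samples = np.empty((n_samples,) + image.shape, dtype=float)
    for i in range(n_samples):
        keep = present[i][segments].astype(bool)  # per-pixel mask, shape (H, W)
        samples[i] = np.where(keep[..., None], image, fudged)
    return samples, present


image = np.arange(4, dtype=float).reshape(2, 2, 1)
segments = np.array([[0, 0], [1, 1]])
samples, present = perturb_superpixels(image, segments, anchor=[0], n_samples=3,
                                       p_sample=1.0, seed=0)
```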

Return type

Explanation

Returns

explanation – Explanation object containing the anchor explaining the instance with additional metadata as attributes. See usage at AnchorImage examples for details.

generate_superpixels(image)

Segments an image into superpixels.

Parameters

image (ndarray) – A grayscale or RGB image.

Return type

ndarray

Returns

A [H, W] array of integers. Each integer is a segment (superpixel) label.

meta: dict

Parameters
• image (ndarray) – Image to be explained.

• segments (ndarray) – Superpixels.

• scale (tuple) – Pixel scale for masked image.

Return type

ndarray

Returns

reset_predictor(predictor)

Resets the predictor function.

Parameters

predictor (Callable) – New predictor function.

Return type

None

class alibi.explainers.AnchorTabular(predictor, feature_names, categorical_names=None, dtype=<class 'numpy.float32'>, ohe=False, seed=None)

Bases: Explainer, FitMixin

__init__(predictor, feature_names, categorical_names=None, dtype=<class 'numpy.float32'>, ohe=False, seed=None)
Parameters
• predictor (Callable[[ndarray], ndarray]) – A callable that takes a numpy array of N data points as inputs and returns N outputs.

• feature_names (List[str]) – List with feature names.

• categorical_names (Optional[Dict[int, List[str]]]) – Dictionary where keys are feature columns and values are the categories for the feature.

• dtype (Type[generic]) – A numpy scalar type that corresponds to the type of input array expected by predictor. This may be used to construct arrays of the given type to be passed through the predictor. For most use cases this argument should have no effect, but it is exposed for use with predictors that would break when called with an array of unsupported type.

• ohe (bool) – Whether the categorical variables are one-hot encoded (OHE) or not. If not OHE, they are assumed to have ordinal encodings.

• seed (Optional[int]) – Used to set the random number generator for repeatability purposes.

Raises

Add feature names to explanation dictionary.

Parameters

Return type

None

explain(X, threshold=0.95, delta=0.1, tau=0.15, batch_size=100, coverage_samples=10000, beam_size=1, stop_on_first=False, max_anchor_size=None, min_samples_start=100, n_covered_ex=10, binary_cache_size=10000, cache_margin=1000, verbose=False, verbose_every=1, **kwargs)

Explain prediction made by classifier on instance X.

Parameters
• X (ndarray) – Instance to be explained.

• threshold (float) – Minimum anchor precision threshold. The algorithm tries to find an anchor that maximizes the coverage under a precision constraint. The precision constraint is formally defined as $$P(prec(A) \ge t) \ge 1 - \delta$$, where $$A$$ is an anchor, $$t$$ is the threshold parameter, $$\delta$$ is the delta parameter, and $$prec(\cdot)$$ denotes the precision of an anchor. In other words, we are seeking an anchor whose precision is greater than or equal to the given threshold with a confidence of (1 - delta). A higher value yields anchors that are more faithful to the model, but also leads to more computation time. Note that there are cases in which the precision constraint cannot be satisfied due to the quantile-based discretisation of the numerical features. If that is the case, the best (i.e. highest coverage) non-eligible anchor is returned.

• delta (float) – Significance threshold. 1 - delta represents the confidence threshold for the anchor precision (see threshold) and the selection of the best anchor candidate in each iteration (see tau).

• tau (float) – Multi-armed bandit parameter used to select candidate anchors in each iteration. The multi-armed bandit algorithm tries to find within a tolerance tau the most promising (i.e. according to the precision) beam_size candidate anchor(s) from a list of proposed anchors. Formally, when the beam_size=1, the multi-armed bandit algorithm seeks to find an anchor $$A$$ such that $$P(prec(A) \ge prec(A^\star) - \tau) \ge 1 - \delta$$, where $$A^\star$$ is the anchor with the highest true precision (which we don’t know), $$\tau$$ is the tau parameter, $$\delta$$ is the delta parameter, and $$prec(\cdot)$$ denotes the precision of an anchor. In other words, in each iteration, the algorithm returns with a probability of at least 1 - delta an anchor $$A$$ with a precision within an error tolerance of tau from the precision of the highest true precision anchor $$A^\star$$. A bigger value for tau means faster convergence but also looser anchor conditions.

• batch_size (int) – Batch size used for sampling. The Anchor algorithm will query the black-box model in batches of size batch_size. A larger batch_size gives more confidence in the anchor, again at the expense of computation time since it involves more model prediction calls.

• coverage_samples (int) – Number of samples from which coverage is estimated during anchor search.

• beam_size (int) – Number of candidate anchors selected by the multi-armed bandit algorithm in each iteration from a list of proposed anchors. A bigger beam width can lead to a better overall anchor (i.e. it prevents the algorithm from getting stuck in a local maximum) at the expense of more computation time.

• stop_on_first (bool) – If True, the beam search algorithm will return the first anchor that satisfies the precision constraint.

• max_anchor_size (Optional[int]) – Maximum number of features in the anchor.

• min_samples_start (int) – Minimum number of initial samples.

• n_covered_ex (int) – How many examples where anchors apply to store for each anchor sampled during search (both examples where prediction on samples agrees/disagrees with desired_label are stored).

• binary_cache_size (int) – The anchor search pre-allocates binary_cache_size batches for storing the binary arrays returned during sampling.

• cache_margin (int) – When only max(cache_margin, batch_size) positions in the binary cache remain empty, a new cache of the same size is pre-allocated to continue buffering samples.

• verbose (bool) – Display updates during the anchor search iterations.

• verbose_every (int) – Frequency of displayed iterations during anchor search process.
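The precision constraint controlled by threshold and delta can be illustrated with a one-sided Hoeffding lower confidence bound on the empirical precision: an anchor is accepted only if the bound clears threshold. This is a simplified stand-in chosen for readability; the bounds used inside the anchor search may differ, and both function names are hypothetical.

```python
import math


def precision_lower_bound(n_agree: int, n_samples: int, delta: float = 0.1) -> float:
    """One-sided Hoeffding lower bound on precision, valid with probability >= 1 - delta."""
    p_hat = n_agree / n_samples  # empirical precision of the candidate anchor
    margin = math.sqrt(math.log(1.0 / delta) / (2.0 * n_samples))
    return max(0.0, p_hat - margin)


def meets_threshold(n_agree: int, n_samples: int,
                    threshold: float = 0.95, delta: float = 0.1) -> bool:
    """Accept the anchor only if the lower bound clears the precision threshold."""
    return precision_lower_bound(n_agree, n_samples, delta) >= threshold


print(meets_threshold(980, 1000), meets_threshold(9800, 10000))
```

With threshold=0.95 and delta=0.1, an empirical precision of 0.98 fails the check on 1,000 samples but passes on 10,000, which mirrors the remark that more samples give more confidence at the cost of more predictor calls.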

Return type

Explanation

Returns

explanation – Explanation object containing the anchor explaining the instance with additional metadata as attributes. See usage at AnchorTabular examples for details.

Raises

alibi.exceptions.NotFittedError – If fit has not been called prior to calling explain.

fit(train_data, disc_perc=(25, 50, 75), **kwargs)

Fit the discretizer to the training data to bin numerical features into ordered bins and compute statistics for numerical features. Create a mapping between the bin numbers of each discretised numerical feature and the row ids in the training set where they occur.

Parameters
• train_data (ndarray) – Representative sample from the training data.

• disc_perc (Tuple[Union[int, float], ...]) – Tuple of percentiles used for discretization.

Return type

AnchorTabular
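A minimal sketch of the discretization step described for fit: numpy.percentile computes the bin edges from disc_perc, numpy.digitize assigns ordered bin numbers, and a dictionary maps each bin number to the training rows in that bin. The helper names are hypothetical, and the library's discretizer may differ in details such as how values on a bin boundary are assigned.

```python
import numpy as np


def fit_bin_edges(column, disc_perc=(25, 50, 75)):
    """Hypothetical helper: percentile-based bin edges for one numerical feature."""
    return np.percentile(column, disc_perc)


def discretize(column, edges):
    """Ordered bin numbers: bin i holds values with edges[i-1] <= x < edges[i]."""
    return np.digitize(column, edges)


def bin_to_rows(bins):
    """Map every bin number to the row ids of the training points in that bin."""
    return {int(b): np.flatnonzero(bins == b) for b in np.unique(bins)}


column = np.arange(1, 9)
edges = fit_bin_edges(column)
bins = discretize(column, edges)
rows = bin_to_rows(bins)
```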

instance_label: int

The label of the instance to be explained.

property predictor: Optional[Callable]

Return type

Optional[Callable]

reset_predictor(predictor)

Resets the predictor function.

Parameters

predictor (Callable) – New predictor function.

Return type

None

class alibi.explainers.AnchorText(predictor, sampling_strategy='unknown', nlp=None, language_model=None, seed=0, **kwargs)

Bases: Explainer

CLASS_SAMPLER = {'language_model': <class 'alibi.explainers.anchors.language_model_text_sampler.LanguageModelSampler'>, 'similarity': <class 'alibi.explainers.anchors.text_samplers.SimilaritySampler'>, 'unknown': <class 'alibi.explainers.anchors.text_samplers.UnknownSampler'>}
DEFAULTS: Dict[str, Dict] = {'language_model': {'batch_size_lm': 32, 'filling': 'parallel', 'frac_mask_templates': 0.1, 'punctuation': '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~', 'sample_proba': 0.5, 'sample_punctuation': False, 'stopwords': [], 'temperature': 1.0, 'top_n': 100, 'use_proba': False}, 'similarity': {'sample_proba': 0.5, 'temperature': 1.0, 'top_n': 100, 'use_proba': False}, 'unknown': {'sample_proba': 0.5}}
SAMPLING_LANGUAGE_MODEL = 'language_model'

Language model sampling strategy.

SAMPLING_SIMILARITY = 'similarity'

Similarity sampling strategy.

SAMPLING_UNKNOWN = 'unknown'

Unknown sampling strategy.

__init__(predictor, sampling_strategy='unknown', nlp=None, language_model=None, seed=0, **kwargs)

Initialize anchor text explainer.

Parameters
• predictor (Callable[[List[str]], ndarray]) – A callable that takes a list of text strings representing N data points as inputs and returns N outputs.

• sampling_strategy (str) –

Perturbation distribution method:

• 'unknown' - replaces words with UNKs.

• 'similarity' - samples according to a similarity score with the corpus embeddings.

• 'language_model' - samples according to the language model’s output distributions.

• nlp (Optional[Language]) – spaCy object when sampling method is 'unknown' or 'similarity'.

• language_model (Optional[LanguageModel]) – Transformers masked language model. This is a model that adheres to the LanguageModel interface defined in alibi.utils.lang_model.LanguageModel.

• seed (int) – If set, ensures identical random streams.

• kwargs (Any) –

Sampling arguments can be passed as kwargs depending on the sampling_strategy. Check default arguments defined in:

• alibi.explainers.anchor_text.DEFAULT_SAMPLING_UNKNOWN

• alibi.explainers.anchor_text.DEFAULT_SAMPLING_SIMILARITY

• alibi.explainers.anchor_text.DEFAULT_SAMPLING_LANGUAGE_MODEL

Raises

compare_labels(samples)

Compute the agreement between a classifier prediction on an instance to be explained and the prediction on a set of samples which have a subset of features fixed to a given value (aka compute the precision of anchors).

Parameters

samples (ndarray) – Samples whose labels are to be compared with the instance label.

Return type

ndarray

Returns

A numpy boolean array indicating whether the prediction was the same as the instance label.

explain(text, threshold=0.95, delta=0.1, tau=0.15, batch_size=100, coverage_samples=10000, beam_size=1, stop_on_first=True, max_anchor_size=None, min_samples_start=100, n_covered_ex=10, binary_cache_size=10000, cache_margin=1000, verbose=False, verbose_every=1, **kwargs)

Explain instance and return anchor with metadata.

Parameters
• text (str) – Text instance to be explained.

• threshold (float) – Minimum anchor precision threshold. The algorithm tries to find an anchor that maximizes the coverage under a precision constraint. The precision constraint is formally defined as $$P(prec(A) \ge t) \ge 1 - \delta$$, where $$A$$ is an anchor, $$t$$ is the threshold parameter, $$\delta$$ is the delta parameter, and $$prec(\cdot)$$ denotes the precision of an anchor. In other words, we are seeking an anchor whose precision is greater than or equal to the given threshold with a confidence of (1 - delta). A higher value yields anchors that are more faithful to the model, but also leads to more computation time. Note that there are cases in which the precision constraint cannot be satisfied. If that is the case, the best (i.e. highest coverage) non-eligible anchor is returned.

• delta (float) – Significance threshold. 1 - delta represents the confidence threshold for the anchor precision (see threshold) and the selection of the best anchor candidate in each iteration (see tau).

• tau (float) – Multi-armed bandit parameter used to select candidate anchors in each iteration. The multi-armed bandit algorithm tries to find within a tolerance tau the most promising (i.e. according to the precision) beam_size candidate anchor(s) from a list of proposed anchors. Formally, when the beam_size=1, the multi-armed bandit algorithm seeks to find an anchor $$A$$ such that $$P(prec(A) \ge prec(A^\star) - \tau) \ge 1 - \delta$$, where $$A^\star$$ is the anchor with the highest true precision (which we don’t know), $$\tau$$ is the tau parameter, $$\delta$$ is the delta parameter, and $$prec(\cdot)$$ denotes the precision of an anchor. In other words, in each iteration, the algorithm returns with a probability of at least 1 - delta an anchor $$A$$ with a precision within an error tolerance of tau from the precision of the highest true precision anchor $$A^\star$$. A bigger value for tau means faster convergence but also looser anchor conditions.

• batch_size (int) – Batch size used for sampling. The Anchor algorithm will query the black-box model in batches of size batch_size. A larger batch_size gives more confidence in the anchor, again at the expense of computation time since it involves more model prediction calls.

• coverage_samples (int) – Number of samples from which coverage is estimated during anchor search.

• beam_size (int) – Number of candidate anchors selected by the multi-armed bandit algorithm in each iteration from a list of proposed anchors. A bigger beam width can lead to a better overall anchor (i.e. it prevents the algorithm from getting stuck in a local maximum) at the expense of more computation time.

• stop_on_first (bool) – If True, the beam search algorithm will return the first anchor that satisfies the precision constraint.

• max_anchor_size (Optional[int]) – Maximum number of features to include in an anchor.

• min_samples_start (int) – Number of samples used for anchor search initialisation.

• n_covered_ex (int) – How many examples where anchors apply to store for each anchor sampled during search (both examples where prediction on samples agrees/disagrees with predicted label are stored).

• binary_cache_size (int) – The anchor search pre-allocates binary_cache_size batches for storing the boolean arrays returned during sampling.

• cache_margin (int) – When only max(cache_margin, batch_size) positions in the binary cache remain empty, a new cache of the same size is pre-allocated to continue buffering samples.

• verbose (bool) – Display updates during the anchor search iterations.

• verbose_every (int) – Frequency of displayed iterations during anchor search process.

• **kwargs (Any) – Other keyword arguments passed to the anchor beam search and the text sampling and perturbation functions.

Return type

Explanation

Returns

Explanation object containing the anchor explaining the instance with additional metadata as attributes. Contains the following data-related attributes –

• anchor : List[str] - a list of words in the proposed anchor.

• precision : float - the fraction of sampled instances where the anchor holds that yield the same prediction as the original instance. For a valid anchor, the precision is always at least threshold.

• coverage : float - the fraction of sampled instances the anchor applies to.
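The precision and coverage definitions above can be written directly as averages over boolean arrays. A minimal sketch, assuming both quantities are estimated from a single set of perturbed samples; during the actual search they may be estimated from different sample sets (see coverage_samples), and the helper name is hypothetical.

```python
import numpy as np


def precision_and_coverage(labels_match, anchor_holds):
    """Hypothetical helper: definitional estimates from one set of perturbed samples.

    labels_match: True where the prediction equals the instance prediction.
    anchor_holds: True where every anchor word is present in the sample.
    """
    labels_match = np.asarray(labels_match, dtype=bool)
    anchor_holds = np.asarray(anchor_holds, dtype=bool)
    coverage = anchor_holds.mean()  # fraction of samples the anchor applies to
    # Precision: agreement rate among the samples where the anchor holds.
    precision = labels_match[anchor_holds].mean() if anchor_holds.any() else 0.0
    return float(precision), float(coverage)


p, c = precision_and_coverage([True, True, False, True, False],
                              [True, True, True, False, False])
```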

model: Union[spacy.language.Language, LanguageModel]

Language model to be used.

perturbation: Any

Perturbation method.

reset_predictor(predictor)

Resets the predictor function.

Parameters

predictor (Callable) – New predictor function.

Return type

None

sampler(anchor, num_samples, compute_labels=True)

Generate perturbed samples while keeping the features at the positions specified in anchor unchanged.

Parameters
• anchor (Tuple[int, tuple]) –

• int - the position of the anchor in the input batch.

• tuple - the anchor itself, a list of words to be kept unchanged.

• num_samples (int) – Number of generated perturbed samples.

• compute_labels (bool) – If True, an array of comparisons between predictions on perturbed samples and instance to be explained is returned.

Return type

Union[List[Union[ndarray, float, int]], List[ndarray]]

Returns

• If compute_labels=True, a list containing the following is returned –

• covered_true - perturbed examples where the anchor applies and the model prediction on perturbation is the same as the instance prediction.

• covered_false - perturbed examples where the anchor applies and the model prediction is NOT the same as the instance prediction.

• labels - num_samples ints indicating whether the prediction on the perturbed sample matches (1) the label of the instance to be explained or not (0).

• data - Matrix with 1s and 0s indicating whether a word in the text has been perturbed for each sample.

• -1.0 - indicates exact coverage is not computed for this algorithm.

• anchor[0] - position of anchor in the batch request.

• Otherwise, a list containing the data matrix only is returned.
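For the 'unknown' sampling strategy, the sampler's behaviour can be sketched as follows: words outside the anchor are independently replaced by an UNK token with probability sample_proba (the parameter name follows the 'unknown' entry in DEFAULTS), and a 0/1 data matrix records which words were kept. The function name, the UNK token string and the convention that 1 marks a kept word are assumptions of this sketch, not the library's code.

```python
import numpy as np


def sample_unknown(words, anchor_positions, num_samples, sample_proba=0.5,
                   unk="UNK", seed=None):
    """Hypothetical 'unknown'-strategy sampler for a tokenized sentence."""
    rng = np.random.default_rng(seed)
    # data[i, j] == 1: word j kept in sample i; 0: word j replaced by UNK.
    data = (rng.random((num_samples, len(words))) >= sample_proba).astype(int)
    data[:, list(anchor_positions)] = 1  # anchor words are never perturbed
    samples = [" ".join(w if keep else unk for w, keep in zip(words, row))
               for row in data]
    return samples, data


words = "the movie was great".split()
samples, data = sample_unknown(words, [3], num_samples=2, sample_proba=1.0, seed=0)
```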

class alibi.explainers.CEM(predict, mode, shape, kappa=0.0, beta=0.1, feature_range=(-10000000000.0, 10000000000.0), gamma=0.0, ae_model=None, learning_rate_init=0.01, max_iterations=1000, c_init=10.0, c_steps=10, eps=(0.001, 0.001), clip=(-100.0, 100.0), update_num_grad=1, no_info_val=None, write_dir=None, sess=None)

Bases: Explainer, FitMixin

__init__(predict, mode, shape, kappa=0.0, beta=0.1, feature_range=(-10000000000.0, 10000000000.0), gamma=0.0, ae_model=None, learning_rate_init=0.01, max_iterations=1000, c_init=10.0, c_steps=10, eps=(0.001, 0.001), clip=(-100.0, 100.0), update_num_grad=1, no_info_val=None, write_dir=None, sess=None)

Initialize contrastive explanation method. Paper: https://arxiv.org/abs/1802.07623

Parameters
• predict (Union[Callable[[ndarray], ndarray], Model]) – TensorFlow model or any other model’s prediction function returning class probabilities.

• mode (str) – Find pertinent negatives ('PN') or pertinent positives ('PP').

• shape (tuple) – Shape of input data starting with batch size.

• kappa (float) – Confidence parameter for the attack loss term.

• beta (float) – Regularization constant for L1 loss term.

• feature_range (tuple) – Tuple with min and max ranges to allow for perturbed instances. Min and max ranges can be floats or numpy arrays with dimension (1 x number of features) for feature-wise ranges.

• gamma (float) – Regularization constant for optional auto-encoder loss term.

• ae_model (Optional[Model]) – Optional auto-encoder model used for loss regularization.

• learning_rate_init (float) – Initial learning rate of optimizer.

• max_iterations (int) – Maximum number of iterations for finding a PN or PP.

• c_init (float) – Initial value to scale the attack loss term.

• c_steps (int) – Number of iterations to adjust the constant scaling the attack loss term.

• eps (tuple) – If numerical gradients are used to compute dL/dx = (dL/dp) * (dp/dx), then eps[0] is used to calculate dL/dp and eps[1] is used for dp/dx. eps[0] and eps[1] can be a combination of float values and numpy arrays. For eps[0], the array dimension should be (1 x number of prediction categories) and for eps[1] it should be (1 x number of features).

• clip (tuple) – Tuple with min and max clip ranges for both the numerical gradients and the gradients obtained from the TensorFlow graph.

• no_info_val (Union[float, ndarray, None]) – Global or feature-wise value considered as containing no information.

• write_dir (Optional[str]) – Directory to write TensorBoard files to.

• sess (Optional[Session]) – Optional TensorFlow session that will be used if passed instead of creating or inferring one internally.

attack(X, Y, verbose=False)

Find pertinent negative or pertinent positive for instance X using a fast iterative shrinkage-thresholding algorithm (FISTA).

Parameters
• X (ndarray) – Instance to attack.

• Y (ndarray) – Labels for X.

• verbose (bool) – Print intermediate results of optimization if True.

Return type

Tuple[ndarray, Tuple[ndarray, ndarray]]

Returns

Overall best attack and gradients for that attack.
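The step that gives FISTA its name is an element-wise shrinkage-thresholding operator: each feature is pulled back toward the original instance x0 by beta (the L1 regularization constant), deviations smaller than beta are set back to x0, and the result is projected onto feature_range. A minimal sketch following the CEM paper's formulation, with a hypothetical function name; FISTA's momentum step and any mode-specific projection used for PP are omitted.

```python
import numpy as np


def shrink(z, x0, beta, feature_range=(-1e10, 1e10)):
    """Hypothetical element-wise shrinkage-thresholding around the original instance x0."""
    diff = z - x0
    out = np.where(diff > beta, z - beta,         # large positive deviation: shrink by beta
                   np.where(diff < -beta, z + beta,  # large negative deviation: shrink by beta
                            x0))                     # small deviation: reset to x0
    return np.clip(out, *feature_range)  # project onto the allowed feature range


x0 = np.zeros(3)
z = np.array([0.5, 0.05, -0.3])
print(shrink(z, x0, beta=0.1))
```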

explain(X, Y=None, verbose=False)

Explain instance and return PP or PN with metadata.

Parameters
• X (ndarray) – Instances to attack.

• Y (Optional[ndarray]) – Labels for X.

• verbose (bool) – Print intermediate results of optimization if True.

Return type

Explanation

Returns

explanation – Explanation object containing the PP or PN with additional metadata as attributes. See usage at CEM examples for details.

fit(train_data, no_info_type='median')[source]

Get ‘no information’ values from the training data.

Parameters
• train_data (ndarray) – Representative sample from the training data.

• no_info_type (str) – Type of feature-wise 'no information' value to compute. 'median' and 'mean' are supported.

Return type

CEM
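The sketch below shows one way to compute feature-wise 'no information' values from a training sample. It assumes the values are simple column-wise medians or means. The helper no_info_values is hypothetical and is not the library's implementation.

```python
import numpy as np


def no_info_values(train_data, no_info_type="median"):
    """Hypothetical helper: one 'no information' value per feature (column)."""
    if no_info_type == "median":
        return np.median(train_data, axis=0)
    if no_info_type == "mean":
        return np.mean(train_data, axis=0)
    raise ValueError("no_info_type must be 'median' or 'mean'")


X_train = np.array([[0.0, 10.0], [1.0, 20.0], [5.0, 60.0]])
print(no_info_values(X_train))          # column-wise medians
print(no_info_values(X_train, "mean"))  # column-wise means
```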

Compute numerical gradients of the attack loss term: dL/dx = (dL/dP)*(dP/dx) with L = loss_attack_s; P = predict; x = adv_s

Parameters
• X (ndarray) – Instance around which gradient is evaluated.

• Y (ndarray) – One-hot representation of instance labels.

Return type

ndarray

Returns

loss_fn(pred_proba, Y)[source]

Compute the attack loss.

Parameters
• pred_proba (ndarray) – Prediction probabilities of an instance.

• Y (ndarray) – One-hot representation of instance labels.

Return type

ndarray

Returns

Loss of the attack.
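For intuition, the sketch below shows a hinge-style attack loss of the kind used for pertinent negatives (PN) and pertinent positives (PP). It assumes a confidence margin kappa and one-hot labels Y for the originally predicted class. For a PN, the loss is positive while the original class still beats every other class (plus kappa). For a PP, the roles are reversed. The function name and the mode argument are assumptions made for illustration.

```python
import numpy as np


def attack_loss(pred_proba, Y, mode="PN", kappa=0.0):
    """Hinge-style attack loss per instance (sketch).

    pred_proba: N x C predicted probabilities.
    Y: N x C one-hot representation of the original predicted class.
    """
    target_proba = np.sum(pred_proba * Y, axis=1)
    # probabilities are non-negative, so masking with (1 - Y) gives the best other class
    nontarget_max = np.max((1 - Y) * pred_proba, axis=1)
    if mode == "PN":
        # PN: original class should drop below the best other class by kappa
        return np.maximum(0.0, target_proba - nontarget_max + kappa)
    if mode == "PP":
        # PP: original class should stay above every other class by kappa
        return np.maximum(0.0, nontarget_max - target_proba + kappa)
    raise ValueError("mode must be 'PN' or 'PP'")
```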

meta: dict

perturb(X, eps, proba=False)[source]

Apply perturbation to instance or prediction probabilities. Used for numerical calculation of gradients.

Parameters
• X (ndarray) – Array to be perturbed.

• eps (Union[float, ndarray]) – Size of perturbation.

• proba (bool) – If True, the net effect of the perturbation needs to be 0 to keep the sum of the probabilities equal to 1.

Return type

Tuple[ndarray, ndarray]

Returns

Instances where a positive and negative perturbation is applied.
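Numerical gradients of the form dP/dx can be estimated by applying a positive and a negative perturbation of size eps to each feature and taking central differences. The sketch below is an illustration, not the library's code. The helper numerical_jacobian is hypothetical, and it assumes central differences with one perturbation per feature. As with eps[1] above, eps can be a float or a feature-wise array. The proba=True case, in which perturbations of probabilities must sum to zero, is not covered.

```python
import numpy as np


def numerical_jacobian(predict, x, eps=1e-3):
    """Central-difference estimate of dP/dx for a single instance x of shape (F,).

    predict maps an N x F array to an N x C array; the result has shape C x F.
    eps can be a float or an array of shape (F,) with feature-wise step sizes.
    """
    n_features = x.shape[0]
    eps = np.broadcast_to(np.asarray(eps, dtype=float), (n_features,))
    steps = np.diag(eps)                 # row j perturbs only feature j
    x_pos = x[None, :] + steps           # F x F positive perturbations
    x_neg = x[None, :] - steps           # F x F negative perturbations
    p_pos, p_neg = predict(x_pos), predict(x_neg)  # each F x C
    return ((p_pos - p_neg) / (2 * eps[:, None])).T


# A linear model P = X @ W has Jacobian W.T, so the estimate should match it.
W = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
J = numerical_jacobian(lambda X: X @ W, np.array([0.5, -1.0, 2.0]), eps=np.array([1e-3, 1e-2, 1e-1]))
```

The chain rule dL/dx = (dL/dP) * (dP/dx) then combines this Jacobian with a similarly estimated dL/dP.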

reset_predictor(predictor)[source]

Resets the predictor function/model.

Parameters

predictor (Union[Callable, Model]) – New predictor function/model.

Return type

None

class alibi.explainers.Counterfactual(predict_fn, shape, distance_fn='l1', target_proba=1.0, target_class='other', max_iter=1000, early_stop=50, lam_init=0.1, max_lam_steps=10, tol=0.05, learning_rate_init=0.1, feature_range=(-10000000000.0, 10000000000.0), eps=0.01, init='identity', decay=True, write_dir=None, debug=False, sess=None)[source]

Bases: Explainer

__init__(predict_fn, shape, distance_fn='l1', target_proba=1.0, target_class='other', max_iter=1000, early_stop=50, lam_init=0.1, max_lam_steps=10, tol=0.05, learning_rate_init=0.1, feature_range=(-10000000000.0, 10000000000.0), eps=0.01, init='identity', decay=True, write_dir=None, debug=False, sess=None)[source]

Initialize counterfactual explanation method based on Wachter et al. (2017).

Parameters
• predict_fn (Union[Callable[[ndarray], ndarray], Model]) – tensorflow model or any other model’s prediction function returning class probabilities.

• shape (Tuple[int, ...]) – Shape of input data starting with batch size.

• distance_fn (str) – Distance function to use in the loss term.

• target_proba (float) – Target probability for the counterfactual to reach.

• target_class (Union[str, int]) – Target class for the counterfactual to reach, one of 'other', 'same' or an integer denoting desired class membership for the counterfactual instance.

• max_iter (int) – Maximum number of iterations to run the gradient descent for (inner loop).

• early_stop (int) – Number of steps after which to terminate gradient descent if all or none of the found instances are solutions.

• lam_init (float) – Initial regularization constant for the prediction part of the Wachter loss.

• max_lam_steps (int) – Maximum number of times to adjust the regularization constant (outer loop) before terminating the search.

• tol (float) – Tolerance for the counterfactual target probability.

• learning_rate_init (float) – Initial learning rate for each outer loop over lambda.

• feature_range (Union[Tuple, str]) – Tuple with min and max ranges to allow for perturbed instances. Min and max ranges can be float or numpy arrays with dimension (1 x nb of features) for feature-wise ranges.

• eps (Union[float, ndarray]) – Gradient step sizes used in calculating numerical gradients, defaults to a single value for all features, but can be passed an array for feature-wise step sizes.

• init (str) – Initialization method for the search of counterfactuals, currently must be 'identity'.

• decay (bool) – Flag to decay learning rate to zero for each outer loop over lambda.

• write_dir (Optional[str]) – Directory to write tensorboard files to.

• debug (bool) – Flag to write tensorboard summaries for debugging.

• sess (Optional[Session]) – Optional tensorflow session that will be used if passed instead of creating or inferring one internally.
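The parameters above describe a Wachter-style loss. lam weights a prediction term that pulls the target-class probability towards target_proba, and this is added to a distance term. tol decides when a candidate counts as a solution. The sketch below assumes an unscaled L1 distance (matching distance_fn='l1') and a single instance. The helper names are hypothetical, and the library may scale or combine the terms differently.

```python
import numpy as np


def wachter_loss(x_cf, x, pred_proba_cf, target_class, target_proba=1.0, lam=0.1):
    """lam * (p_target(x_cf) - target_proba)^2 + ||x_cf - x||_1 (sketch)."""
    pred_term = (pred_proba_cf[target_class] - target_proba) ** 2
    dist_term = np.sum(np.abs(x_cf - x))
    return lam * pred_term + dist_term


def is_solution(pred_proba_cf, target_class, target_proba=1.0, tol=0.05):
    """Accept x_cf if its target-class probability is within tol of target_proba."""
    return abs(pred_proba_cf[target_class] - target_proba) <= tol
```

Raising lam across outer loops shifts the trade-off towards reaching the target probability at the cost of a larger distance from x.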

explain(X)[source]

Explain an instance and return the counterfactual with metadata.

Parameters

X (ndarray) – Instance to be explained.

Return type

Explanation

Returns

explanation – Explanation object containing the counterfactual with additional metadata as attributes. See usage at Counterfactual examples for details.

fit(X, y)[source]

Fit method - currently unused as the counterfactual search is fully unsupervised.

Parameters
• X (ndarray) – Not used. Included for consistency.

• y (Optional[ndarray]) – Not used. Included for consistency.

Return type

Counterfactual

Returns

self – Explainer itself.

meta: dict

reset_predictor(predictor)[source]

Resets the predictor function/model.

Parameters

predictor (Union[Callable, Model]) – New predictor function/model.

Return type

None

class alibi.explainers.CounterfactualProto(predict, shape, kappa=0.0, beta=0.1, feature_range=(-10000000000.0, 10000000000.0), gamma=0.0, ae_model=None, enc_model=None, theta=0.0, cat_vars=None, ohe=False, use_kdtree=False, learning_rate_init=0.01, max_iterations=1000, c_init=10.0, c_steps=10, eps=(0.001, 0.001), clip=(-1000.0, 1000.0), update_num_grad=1, write_dir=None, sess=None)[source]

Bases: Explainer, FitMixin

__init__(predict, shape, kappa=0.0, beta=0.1, feature_range=(-10000000000.0, 10000000000.0), gamma=0.0, ae_model=None, enc_model=None, theta=0.0, cat_vars=None, ohe=False, use_kdtree=False, learning_rate_init=0.01, max_iterations=1000, c_init=10.0, c_steps=10, eps=(0.001, 0.001), clip=(-1000.0, 1000.0), update_num_grad=1, write_dir=None, sess=None)[source]

Initialize prototypical counterfactual method.

Parameters
• predict (Union[Callable[[ndarray], ndarray], Model]) – tensorflow model or any other model’s prediction function returning class probabilities.

• shape (tuple) – Shape of input data starting with batch size.

• kappa (float) – Confidence parameter for the attack loss term.

• beta (float) – Regularization constant for L1 loss term.

• feature_range (Tuple[Union[float, ndarray], Union[float, ndarray]]) – Tuple with min and max ranges to allow for perturbed instances. Min and max ranges can be float or numpy arrays with dimension (1x nb of features) for feature-wise ranges.

• gamma (float) – Regularization constant for optional auto-encoder loss term.

• ae_model (Optional[Model]) – Optional auto-encoder model used for loss regularization.

• enc_model (Optional[Model]) – Optional encoder model used to guide instance perturbations towards a class prototype.

• theta (float) – Constant for the prototype search loss term.

• cat_vars (Optional[Dict[int, int]]) – Dict whose keys are the categorical column indices and whose values are the number of categories per categorical variable.

• ohe (bool) – Whether the categorical variables are one-hot encoded (OHE) or not. If not OHE, they are assumed to have ordinal encodings.

• use_kdtree (bool) – Whether to use k-d trees for the prototype loss term if no encoder is available.

• learning_rate_init (float) – Initial learning rate of optimizer.

• max_iterations (int) – Maximum number of iterations for finding a counterfactual.

• c_init (float) – Initial value to scale the attack loss term.

• c_steps (int) – Number of iterations to adjust the constant scaling the attack loss term.

• eps (tuple) – If numerical gradients are used to compute dL/dx = (dL/dp) * (dp/dx), then eps[0] is used to calculate dL/dp and eps[1] is used for dp/dx. eps[0] and eps[1] can be a combination of float values and numpy arrays. For eps[0], the array dimension should be (1x nb of prediction categories) and for eps[1] it should be (1x nb of features).

• clip (tuple) – Tuple with min and max clip ranges for both the numerical gradients and the gradients obtained from the tensorflow graph.

• write_dir (Optional[str]) – Directory to write tensorboard files to.

• sess (Optional[Session]) – Optional tensorflow session that will be used if passed instead of creating or inferring one internally.
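For ordinally encoded data, cat_vars can be assembled from the data itself. The sketch below uses a hypothetical helper, build_cat_vars, and assumes that every category of each categorical column appears at least once in X. One-hot encoded inputs (ohe=True) are not handled.

```python
import numpy as np


def build_cat_vars(X, categorical_cols):
    """Hypothetical helper: {column index: number of categories} for ordinal data."""
    return {col: int(np.unique(X[:, col]).size) for col in categorical_cols}


# columns 0 and 2 are ordinal categoricals, column 1 is numerical
X = np.array([[0, 1.5, 2], [1, 0.3, 0], [2, 2.2, 1], [0, 1.0, 2]])
cat_vars = build_cat_vars(X, [0, 2])
```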

attack(X, Y, target_class=None, k=None, k_type='mean', threshold=0.0, verbose=False, print_every=100, log_every=100)[source]

Find a counterfactual (CF) for instance X using a fast iterative shrinkage-thresholding algorithm (FISTA).

Parameters
• X (ndarray) – Instance to attack.

• Y (ndarray) – Labels for X as one-hot-encoding.

• target_class (Optional[list]) – List with target classes used to find the closest prototype. If None, the nearest prototype except for the predicted class of the instance is used.

• k (Optional[int]) – Number of nearest instances used to define the prototype for a class. Defaults to using all instances belonging to the class if an encoder is used and to 1 for k-d trees.

• k_type (str) – Use either the average encoding of the k nearest instances in a class (k_type='mean') or the encoding of the k-th nearest instance in the class (k_type='point') to define the prototype of that class. Only relevant if an encoder is used to define the prototypes.

• threshold (float) – Threshold level for the ratio between the distance of the counterfactual to the prototype of the predicted class for the original instance over the distance to the prototype of the predicted class for the counterfactual. If the trust score is below the threshold, the proposed counterfactual does not meet the requirements.

• verbose (bool) – Print intermediate results of optimization if True.

• print_every (int) – Print frequency if verbose is True.

• log_every (int) – tensorboard log frequency if a write directory is specified.

Return type

Tuple[ndarray, Tuple[ndarray, ndarray]]

Returns

Overall best attack and gradients for that attack.
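The k and k_type arguments determine how a class prototype is built from encoded training instances. The sketch below makes several assumptions. It uses Euclidean distance in the latent space. 'mean' averages the encodings of the k class members nearest to the encoded instance, and 'point' takes the encoding of the k-th nearest member. As stated above, k=None means all class members. The function class_prototype is hypothetical.

```python
import numpy as np


def class_prototype(enc_x, enc_class, k=None, k_type="mean"):
    """Prototype of one class relative to the encoded instance enc_x (shape D,).

    enc_class: M x D encodings of training instances belonging to the class.
    """
    dist = np.linalg.norm(enc_class - enc_x, axis=1)
    order = np.argsort(dist)            # nearest class members first
    if k is None:
        k = len(enc_class)              # default: use all instances of the class
    if k_type == "mean":
        return enc_class[order[:k]].mean(axis=0)
    if k_type == "point":
        return enc_class[order[k - 1]]  # k-th nearest encoding
    raise ValueError("k_type must be 'mean' or 'point'")


enc_x = np.array([0.0, 0.0])
enc_class = np.array([[3.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
```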

explain(X, Y=None, target_class=None, k=None, k_type='mean', threshold=0.0, verbose=False, print_every=100, log_every=100)[source]

Explain instance and return counterfactual with metadata.

Parameters
• X (ndarray) – Instances to attack.

• Y (Optional[ndarray]) – Labels for X as one-hot-encoding.

• target_class (Optional[list]) – List with target classes used to find the closest prototype. If None, the nearest prototype except for the predicted class of the instance is used.

• k (Optional[int]) – Number of nearest instances used to define the prototype for a class. Defaults to using all instances belonging to the class if an encoder is used and to 1 for k-d trees.

• k_type (str) – Use either the average encoding of the k nearest instances in a class (k_type='mean') or the encoding of the k-th nearest instance in the class (k_type='point') to define the prototype of that class. Only relevant if an encoder is used to define the prototypes.

• threshold (float) – Threshold level for the ratio between the distance of the counterfactual to the prototype of the predicted class for the original instance over the distance to the prototype of the predicted class for the counterfactual. If the trust score is below the threshold, the proposed counterfactual does not meet the requirements.

• verbose (bool) – Print intermediate results of optimization if True.

• print_every (int) – Print frequency if verbose is True.

• log_every (int) – tensorboard log frequency if a write directory is specified.

Return type

Explanation

Returns

explanation – Explanation object containing the counterfactual with additional metadata as attributes. See usage at CFProto examples for details.

fit(train_data, trustscore_kwargs=None, d_type='abdm', w=None, disc_perc=(25, 50, 75), standardize_cat_vars=False, smooth=1.0, center=True, update_feature_range=True)[source]

Get prototypes for each class using the encoder or k-d trees. The prototypes are used for the encoder loss term or to calculate the optional trust scores.

Parameters
• train_data (ndarray) – Representative sample from the training data.

• trustscore_kwargs (Optional[dict]) – Optional arguments to initialize the trust scores method.

• d_type (str) – Pairwise distance metric used for categorical variables. Currently, 'abdm', 'mvdm' and 'abdm-mvdm' are supported. 'abdm' infers context from the other variables while 'mvdm' uses the model predictions. 'abdm-mvdm' is a weighted combination of the two metrics.

• w (Optional[float]) – Weight on 'abdm' (between 0. and 1.) distance if d_type equals 'abdm-mvdm'.

• disc_perc (Sequence[Union[int, float]]) – List with percentiles used in binning of numerical features used for the 'abdm' and 'abdm-mvdm' pairwise distance measures.

• standardize_cat_vars (bool) – Standardize numerical values of categorical variables if True.

• smooth (float) – Smoothing exponent between 0 and 1 for the distances. Lower values will smooth the difference in distance metric between different features.

• center (bool) – Whether to center the scaled distance measures. If False, the min distance for each feature except for the feature with the highest raw max distance will be the lower bound of the feature range, but the upper bound will be below the max feature range.

• update_feature_range (bool) – Update feature range with scaled values.
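disc_perc lists the percentiles used to bin numerical features before the 'abdm' and 'abdm-mvdm' distances are computed. The sketch below illustrates percentile binning with numpy. It assumes bin edges from np.percentile and bin indices from np.digitize, which may differ from the library's exact discretizer. The helper discretize is hypothetical.

```python
import numpy as np


def discretize(X, numerical_cols, disc_perc=(25, 50, 75)):
    """Replace each numerical column by its percentile-bin index 0..len(disc_perc)."""
    X_disc = X.copy()
    for col in numerical_cols:
        edges = np.percentile(X[:, col], disc_perc)
        X_disc[:, col] = np.digitize(X[:, col], edges)
    return X_disc


# column 0 is numerical, column 1 is left untouched
X = np.column_stack([np.arange(1, 9, dtype=float), np.zeros(8)])
X_disc = discretize(X, numerical_cols=[0])
```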

Return type

CounterfactualProto

Compute numerical gradients of the attack loss term: dL/dx = (dL/dP)*(dP/dx) with L = loss_attack_s; P = predict; x = adv_s.

Parameters
• X (ndarray) – Instance around which gradient is evaluated.

• Y (ndarray) – One-hot representation of instance labels.

• cat_vars_ord (dict) – Dict whose keys are the categorical column indices and whose values are the number of categories per categorical variable.

Return type

ndarray

Returns

loss_fn(pred_proba, Y)[source]

Compute the attack loss.

Parameters
• pred_proba (ndarray) – Prediction probabilities of an instance.

• Y (ndarray) – One-hot representation of instance labels.

Return type

ndarray

Returns

Loss of the attack.

meta: dict

reset_predictor(predictor)[source]

Resets the predictor function/model.

Parameters

predictor (Union[Callable, Model]) – New predictor function/model.

Return type

None

Parameters
• X (ndarray) – Instance to encode and calculate distance metrics for.

• adv_class (int) – Predicted class on the perturbed instance.

• orig_class (int) – Predicted class on the original instance.

• eps (float) – Small number to avoid dividing by 0.

Return type

float

Returns

Ratio between the distance to the prototype of the predicted class for the original instance and the prototype of the predicted class for the perturbed instance.
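The ratio described above can be sketched as follows. It assumes Euclidean distance in the encoder's latent space and class prototypes stored in a dict keyed by class. The function name proto_score is hypothetical. Following the threshold parameter of explain, a counterfactual whose score falls below the threshold would be rejected.

```python
import numpy as np


def proto_score(enc_x, class_proto, adv_class, orig_class, eps=1e-10):
    """Distance to the original-class prototype divided by the distance to the
    counterfactual-class prototype (sketch; Euclidean distance assumed)."""
    d_orig = np.linalg.norm(enc_x - class_proto[orig_class])
    d_adv = np.linalg.norm(enc_x - class_proto[adv_class])
    return d_orig / (d_adv + eps)  # eps avoids division by zero


class_proto = {0: np.array([0.0, 0.0]), 1: np.array([4.0, 0.0])}
score = proto_score(np.array([3.0, 0.0]), class_proto, adv_class=1, orig_class=0)
accepted = score >= 1.0  # e.g. threshold=1.0
```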

class alibi.explainers.CounterfactualRL(predictor, encoder, decoder, coeff_sparsity, coeff_consistency, latent_dim=None, backend='tensorflow', seed=0, **kwargs)[source]

Bases: Explainer, FitMixin

Counterfactual Reinforcement Learning.

__init__(predictor, encoder, decoder, coeff_sparsity, coeff_consistency, latent_dim=None, backend='tensorflow', seed=0, **kwargs)[source]

Constructor.

Parameters
• predictor (Callable[[ndarray], ndarray]) – A callable that takes a numpy array of N data points as inputs and returns N outputs. For classification tasks, the second dimension of the output should match the number of classes. The output can therefore be either a soft label distribution or a hard label distribution (i.e. one-hot encoding) without affecting performance, since argmax is applied to the predictor’s output.

• encoder (Union[Model, Module]) – Pretrained encoder network.

• decoder (Union[Model, Module]) – Pretrained decoder network.

• coeff_sparsity (float) – Sparsity loss coefficient.

• coeff_consistency (float) – Consistency loss coefficient.

• latent_dim (Optional[int]) – Auto-encoder latent dimension. Can be omitted if the actor network is user specified.

• backend (str) – Deep learning backend: 'tensorflow' | 'pytorch'. Default 'tensorflow'.

• seed (int) – Seed for reproducibility. The results are not reproducible for the 'tensorflow' backend.

• **kwargs – Used to replace any default parameter from alibi.explainers.cfrl_base.DEFAULT_BASE_PARAMS.

explain(X, Y_t, C=None, batch_size=100)[source]

Explains an input instance.

Parameters
• X (ndarray) – Instances to be explained.

• Y_t (ndarray) – Counterfactual targets.

• C (Optional[ndarray]) – Conditional vectors. If None, it means that no conditioning was used during training (i.e. the conditional_func returns None).

• batch_size (int) – Batch size to be used when generating counterfactuals.

Return type

Explanation

Returns

explanation – Explanation object containing the counterfactual with additional metadata as attributes. See usage at CFRL examples for details.

fit(X)[source]

Fit the model agnostic counterfactual generator.

Parameters

X (ndarray) – Training data array.

Return type

Explainer

Returns

self – The explainer itself.

Parameters
• path (Union[str, PathLike]) – Path to a directory containing the saved explainer.

• predictor (Any) – Model or prediction function used to originally initialize the explainer.

Return type

Explainer

Returns

An explainer instance.

meta: dict

reset_predictor(predictor)[source]

Resets the predictor.

Parameters

predictor (Any) – New predictor.

Return type

None

save(path)[source]

Save an explainer to disk. Uses the dill module.

Parameters

path (Union[str, PathLike]) – Path to a directory. A new directory will be created if one does not exist.

Return type

None

class alibi.explainers.CounterfactualRLTabular(predictor, encoder, decoder, encoder_preprocessor, decoder_inv_preprocessor, coeff_sparsity, coeff_consistency, feature_names, category_map, immutable_features=None, ranges=None, weight_num=1.0, weight_cat=1.0, latent_dim=None, backend='tensorflow', seed=0, **kwargs)[source]

Bases: CounterfactualRL

Counterfactual Reinforcement Learning Tabular.

__init__(predictor, encoder, decoder, encoder_preprocessor, decoder_inv_preprocessor, coeff_sparsity, coeff_consistency, feature_names, category_map, immutable_features=None, ranges=None, weight_num=1.0, weight_cat=1.0, latent_dim=None, backend='tensorflow', seed=0, **kwargs)[source]

Constructor.

Parameters
• predictor (Callable[[ndarray], ndarray]) – A callable that takes a numpy array of N data points as inputs and returns N outputs. For classification tasks, the second dimension of the output should match the number of classes. The output can therefore be either a soft label distribution or a hard label distribution (i.e. one-hot encoding) without affecting performance, since argmax is applied to the predictor’s output.

• encoder (Union[Model, Module]) – Pretrained heterogeneous encoder network.

• decoder (Union[Model, Module]) – Pretrained heterogeneous decoder network. The output of the decoder must be a list of tensors.

• encoder_preprocessor (Callable) – Auto-encoder data pre-processor. Depending on the input format, the pre-processor can normalize numerical attributes, transform label encoding to one-hot encoding etc.

• decoder_inv_preprocessor (Callable) – Auto-encoder data inverse pre-processor. This is the inverse function of the pre-processor. It can denormalize numerical attributes, transform one-hot encoding to label encoding, feature type casting etc.

• coeff_sparsity (float) – Sparsity loss coefficient.

• coeff_consistency (float) – Consistency loss coefficient.

• feature_names (List[str]) – List of feature names. This should be provided by the dataset.

• category_map (Dict[int, List[str]]) – Dictionary of category mapping. The keys are column indexes and the values are lists containing the possible values for a feature. This should be provided by the dataset.

• immutable_features (Optional[List[str]]) – List of immutable features.

• ranges (Optional[Dict[str, Tuple[int, int]]]) – Numerical feature ranges. Note that some numerical features, such as 'Age', are allowed to increase only. We denote those by 'inc_feat'. Similarly, some features are allowed to decrease only. We denote them by 'dec_feat'. Finally, there are some free features, which we denote by 'free_feat'. With this notation, we can define ranges = {'inc_feat': [0, 1], 'dec_feat': [-1, 0], 'free_feat': [-1, 1]}. 'free_feat' can be omitted, as any unspecified feature is considered free. Given the range of a feature {'feat': [a_low, a_high]}, when sampling is performed the numerical value will be clipped between [a_low * (max_val - min_val), a_high * (max_val - min_val)], where min_val and max_val are the minimum and maximum values of the feature 'feat'. This implies that a_low and a_high are not restricted to {-1, 0} and {0, 1}, but can be any float in [-1, 0] and [0, 1], respectively.

• weight_num (float) – Numerical loss weight.

• weight_cat (float) – Categorical loss weight.

• latent_dim (Optional[int]) – Auto-encoder latent dimension. Can be omitted if the actor network is user specified.

• backend (str) – Deep learning backend: 'tensorflow' | 'pytorch'. Default 'tensorflow'.

• seed (int) – Seed for reproducibility. The results are not reproducible for the 'tensorflow' backend.

• **kwargs – Used to replace any default parameter from alibi.explainers.cfrl_base.DEFAULT_BASE_PARAMS.
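The ranges description can be illustrated with a small clipping helper. This sketch assumes the clip applies to the change relative to the original value, which is what the increase-only and decrease-only semantics suggest. Features missing from ranges are treated as free, i.e. [-1, 1]. The helper clip_numerical_change and its arguments are hypothetical.

```python
import numpy as np


def clip_numerical_change(x_orig, x_cf, feature_names, ranges, min_vals, max_vals):
    """Hypothetical helper: clip x_cf - x_orig per feature as described by `ranges`."""
    x_out = np.array(x_cf, dtype=float)
    for j, name in enumerate(feature_names):
        a_low, a_high = ranges.get(name, (-1.0, 1.0))  # unspecified -> free feature
        span = max_vals[j] - min_vals[j]
        delta = np.clip(x_cf[j] - x_orig[j], a_low * span, a_high * span)
        x_out[j] = x_orig[j] + delta
    return x_out


# 'Age' may only increase, so a proposed decrease is clipped to no change.
x_new = clip_numerical_change(
    np.array([40.0, 40.0]), np.array([30.0, 60.0]),
    ["Age", "Hours"], {"Age": [0, 1]}, [18.0, 0.0], [90.0, 100.0],
)
```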

explain(X, Y_t, C=None, batch_size=100, diversity=False, num_samples=1, patience=1000, tolerance=0.001)[source]

Computes counterfactuals for the given instances conditioned on the target and the conditional vector.

Parameters
• X (ndarray) – Input instances to generate counterfactuals for.

• Y_t (ndarray) – Target labels.

• C (Optional[List[Dict[str, List[Union[float, str]]]]]) – List of conditional dictionaries. If None, it means that no conditioning was used during training (i.e. the conditional_func returns None). If conditioning was used during training but no conditioning is desired for the current input, an empty list is expected.

• diversity (bool) – Whether to generate a diverse counterfactual set for the given instance. Only supported for a single input instance.

• num_samples (int) – Number of diversity samples to be generated. Considered only if diversity=True.

• batch_size (int) – Batch size to use when generating counterfactuals.

• patience (int) – Maximum number of iterations to perform before the diversity search stops. If -1, the search stops only when the desired number of samples has been found.

• tolerance (float) – Tolerance to distinguish two counterfactual instances.
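num_samples, patience and tolerance together suggest a loop that keeps drawing candidates until enough distinct counterfactuals are found. The sketch below shows only that control flow and is not the library's search. The candidate generator is a hypothetical zero-argument callable. Two candidates count as distinct if their maximum absolute difference exceeds tolerance, which is an assumed criterion.

```python
import numpy as np


def collect_diverse(generate_candidate, num_samples=1, patience=1000, tolerance=1e-3):
    """Keep candidates differing from all kept ones by more than tolerance.

    generate_candidate: hypothetical callable returning a 1-D array.
    patience=-1 means the loop runs until num_samples candidates are kept.
    """
    kept, it = [], 0
    while len(kept) < num_samples and (patience == -1 or it < patience):
        cand = generate_candidate()
        it += 1
        if all(np.max(np.abs(cand - k)) > tolerance for k in kept):
            kept.append(cand)
    return np.array(kept)


stream = iter([np.array([0.0]), np.array([0.0005]), np.array([1.0]), np.array([2.0])])
samples = collect_diverse(lambda: next(stream), num_samples=3, patience=10)
```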

Return type

Explanation

Returns

explanation – Explanation object containing the counterfactual with additional metadata as attributes. See usage at CFRL examples for details.

fit(X)[source]

Fit the model agnostic counterfactual generator.

Parameters

X (ndarray) – Training data array.

Return type

Explainer

Returns

self – The explainer itself.

meta: dict

class alibi.explainers.DistributedAnchorTabular(predictor, feature_names, categorical_names=None, dtype=<class 'numpy.float32'>, ohe=False, seed=None)[source]

Bases: AnchorTabular

explain(X, threshold=0.95, delta=0.1, tau=0.15, batch_size=100, coverage_samples=10000, beam_size=1, stop_on_first=False, max_anchor_size=None, min_samples_start=1, n_covered_ex=10, binary_cache_size=10000, cache_margin=1000, verbose=False, verbose_every=1, **kwargs)[source]

Explains the prediction made by a classifier on instance X. Sampling is done in parallel over a number of cores specified in kwargs[‘ncpu’].

Parameters
Return type

Explanation

Returns
fit(train_data, disc_perc=(25, 50, 75), **kwargs)[source]

Creates a list of handles to parallel processes that are used for submitting sampling tasks.

Parameters
Return type

AnchorTabular

instance_label: int

The label of the instance to be explained.

meta: dict

reset_predictor(predictor)[source]

Resets the predictor function.

Parameters

predictor (Callable) – New model prediction function.

Return type

None

samplers: list

Bases: BaseSimilarityExplainer

The gradient similarity explainer is used to find examples in the training data that the predictor considers similar to test instances the user wants to explain. It uses the gradients of the loss between the model output and the training data labels. These are compared using the similarity function specified by sim_fn. The GradientSimilarity explainer can be applied to models trained for both classification and regression tasks.

Parameters
• predictor (Union[Model, Module]) – Model to explain.

• loss_fn (Union[Callable[[Tensor, Tensor], Tensor], Callable[[Tensor, Tensor], Tensor]]) – Loss function used. The gradient of the loss function is used to compute the similarity between the test instances and the training set.

• task (Literal[‘classification’, ‘regression’]) – Type of task performed by the model. If the task is 'classification', the target value of the test instance passed to the explain method can either be specified directly or left as None; if left as None, the model’s maximum prediction is used. If the task is 'regression', the target value of the test instance must be specified directly.

• precompute_grads (bool) – Whether to precompute the gradients. If False, gradients are computed on the fly; otherwise they are precomputed, which can make computing explanations faster. Note that this option may be memory intensive if the model is large.

• backend (Literal[‘tensorflow’, ‘pytorch’]) – Backend to use.

• device (Union[int, str, device, None]) – Device to use. If None, the default device for the backend is used. If using pytorch backend see pytorch device docs for correct options. Note that in the pytorch backend case this parameter can be a torch.device. If using tensorflow backend see tensorflow docs for correct options.

• verbose (bool) – Whether to print the progress of the explainer.

Raises
• ValueError – If the task is not 'classification' or 'regression'.

• ValueError – If the backend is not 'tensorflow' or 'pytorch'.

• TypeError – If the device is not an int, str, torch.device or None for the torch backend option or if the device is not str or None for the tensorflow backend option.

explain(X, Y=None)[source]

Explain the predictor’s predictions for a given input.

Computes the similarity score between the inputs and the training set. Returns an explanation object containing the scores, the indices of the training set instances sorted by descending similarity, and the most similar and least similar instances of the data set for the input. Note that the input may be a single instance or a batch of instances.

Parameters
• X (Union[ndarray, Tensor, Tensor]) – X can be a numpy array, tensorflow tensor, or pytorch tensor of the same shape as the training data with or without a leading batch dimension. If the batch dimension is missing, it is added.

• Y (Union[ndarray, Tensor, Tensor, None]) – Y can be a numpy array, tensorflow tensor or a pytorch tensor. In the case of a regression task, the Y argument must be present. If the task is classification then Y defaults to the model prediction.

Return type

Explanation

Returns

Explanation object containing the ordered similarity scores for the test instance(s) with additional metadata as attributes. Contains the following data-related attributes –

• scores: np.ndarray - similarity scores for each pair of instances in the training and test set sorted in descending order.

• ordered_indices: np.ndarray - indices of the paired training and test set instances sorted by the similarity score in descending order.

• most_similar: np.ndarray - 5 most similar instances in the training set for each test instance. The first element is the most similar instance.

• least_similar: np.ndarray - 5 least similar instances in the training set for each test instance. The first element is the least similar instance.
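The sketch below illustrates the gradient-similarity idea and the returned fields on a model simple enough to differentiate by hand. It uses a linear model with squared loss 0.5 * (w.x - y)^2, whose per-instance gradient with respect to w is (w.x - y) * x. Similarity is taken to be the dot product of loss gradients, which is one possible choice of similarity function. The function names, the toy model and top_k are assumptions, and none of this is the library's backend code.

```python
import numpy as np


def loss_grads(w, X, y):
    """Per-instance gradient of 0.5 * (w.x - y)^2 w.r.t. w for a linear model."""
    residual = X @ w - y            # shape (N,)
    return residual[:, None] * X    # shape (N, F)


def gradient_similarity(w, X_train, y_train, x_test, y_test, top_k=5):
    g_train = loss_grads(w, X_train, y_train)  # could be precomputed once in fit
    g_test = loss_grads(w, x_test[None, :], np.array([y_test]))[0]
    scores = g_train @ g_test                  # dot-product similarity
    ordered = np.argsort(-scores)              # descending similarity
    return {
        "scores": scores[ordered],
        "ordered_indices": ordered,
        "most_similar": X_train[ordered[:top_k]],          # most similar first
        "least_similar": X_train[ordered[::-1][:top_k]],   # least similar first
    }


w = np.array([1.0, 0.0])
X_train = np.array([[1.0, 0.0], [2.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])
y_train = np.array([0.0, 0.0, 0.5, 0.0])
res = gradient_similarity(w, X_train, y_train, np.array([1.0, 0.0]), 0.0, top_k=2)
```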

Raises
• ValueError – If Y is None and the task is 'regression'.

• ValueError – If the shape of X or Y does not match the shape of the training or target data.

• ValueError – If the fit method has not been called prior to calling this method.

fit(X_train, Y_train)[source]

Fit the explainer.

The GradientSimilarity explainer requires the model gradients over the training data. In the explain method it compares them to the model gradients for the test instance(s). If precompute_grads=True on initialization then the gradients are precomputed here and stored. This will speed up the explain method call but storing the gradients may not be feasible for large models.

Parameters
• X_train (ndarray) – Training data.

• Y_train (ndarray) – Training labels.

Return type

Explainer

Returns

self – Returns self.

meta: dict

class alibi.explainers.IntegratedGradients(model, layer=None, target_fn=None, method='gausslegendre', n_steps=50, internal_batch_size=100)[source]

Bases: Explainer

__init__(model, layer=None, target_fn=None, method='gausslegendre', n_steps=50, internal_batch_size=100)[source]

An implementation of the integrated gradients method for tensorflow models.

For details of the method see the original paper: https://arxiv.org/abs/1703.01365 .

Parameters
• model (Model) – tensorflow model.

• layer (Optional[Layer]) – Layer with respect to which the gradients are calculated. If not provided, the gradients are calculated with respect to the input.

• target_fn (Optional[Callable]) – A scalar function that is applied to the predictions of the model. This can be used to specify which scalar output the attributions should be calculated for. This can be particularly useful if the desired output is not known before calling the model (e.g. explaining the argmax output for a probabilistic classifier; in this case one could pass target_fn=partial(np.argmax, axis=1)).

• method (str) – Method for the integral approximation. Methods available: "riemann_left", "riemann_right", "riemann_middle", "riemann_trapezoid", "gausslegendre".

• n_steps (int) – Number of steps in the path integral approximation from the baseline to the input instance.

• internal_batch_size (int) – Batch size for the internal batching.
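The integral behind integrated gradients can be approximated along a straight path from the baseline to the input. The sketch below shows two of the listed approximation schemes, using numpy and a model with a hand-written gradient. gausslegendre uses Gauss-Legendre nodes mapped to [0, 1], and riemann_middle uses midpoints. The helper names and the quadratic toy model are assumptions for illustration and do not reflect the library's tensorflow implementation.

```python
import numpy as np


def path_weights(n_steps, method="gausslegendre"):
    """Interpolation points alpha in [0, 1] and the matching quadrature weights."""
    if method == "gausslegendre":
        nodes, weights = np.polynomial.legendre.leggauss(n_steps)
        return (nodes + 1) / 2, weights / 2  # map from [-1, 1] to [0, 1]
    if method == "riemann_middle":
        alphas = (np.arange(n_steps) + 0.5) / n_steps
        return alphas, np.full(n_steps, 1.0 / n_steps)
    raise ValueError(f"unsupported method: {method}")


def integrated_gradients(grad_fn, x, baseline, n_steps=50, method="gausslegendre"):
    """IG_i = (x_i - b_i) * integral over alpha of dF/dx_i(b + alpha * (x - b))."""
    alphas, weights = path_weights(n_steps, method)
    path = baseline[None, :] + alphas[:, None] * (x - baseline)[None, :]  # n_steps x F
    grads = grad_fn(path)                                                 # n_steps x F
    return (x - baseline) * np.sum(weights[:, None] * grads, axis=0)


# Toy model F(x) = sum(c * x**2) with gradient 2 * c * x.
c = np.array([1.0, 2.0, 3.0])
x = np.array([1.0, -1.0, 2.0])
baseline = np.zeros(3)
ig = integrated_gradients(lambda X: 2 * c * X, x, baseline)
```

The attributions should sum to F(x) - F(baseline) (the completeness property), which gives a quick sanity check on n_steps and the chosen method.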

explain(X, forward_kwargs=None, baselines=None, target=None, attribute_to_layer_inputs=False)[source]

Calculates the attributions for each input feature or element of layer and returns an Explanation object.

Parameters
• X (Union[ndarray, List[ndarray]]) – Instance for which integrated gradients attributions are computed.

• forward_kwargs (Optional[dict]) – Input keyword args. If it’s not None, it must be a dict with numpy arrays as values. The first dimension of the arrays must correspond to the number of examples. It will be repeated for each of n_steps along the integrated path. The attributions are not computed with respect to these arguments.

• baselines (Union[int, float, ndarray, List[int], List[float], List[ndarray], None]) – Baselines (starting point of the path integral) for each instance. If the passed value is an np.ndarray, it must have the same shape as X. If not provided, all feature values for the baselines are set to 0.

• target (Union[int, list, ndarray, None]) – Defines which element of the model output is considered to compute the gradients. Target can be a numpy array, a list or a numeric value. Numeric values are only valid if the model’s output is a rank-n tensor with n <= 2 (regression and classification models). If a numeric value is passed, the gradients are calculated for the same element of the output for all data points. For regression models whose output is a scalar, target should not be provided. For classification models target can be either the true classes or the classes predicted by the model. It must be provided for classification models and regression models whose output is a vector. If the model’s output is a rank-n tensor with n > 2, the target must be a rank-2 numpy array or a list of lists (a matrix) with dimensions nb_samples X (n-1) .

• attribute_to_layer_inputs (bool) – In case of layers gradients, controls whether the gradients are computed for the layer’s inputs or outputs. If True, gradients are computed for the layer’s inputs, if False for the layer’s outputs.

Return type

Explanation

Returns

explanation – Explanation object including meta and data attributes with integrated gradients attributions for each feature. See usage at IG examples for details.
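The path integral behind explain can be sketched with NumPy for a toy model whose gradient is known analytically. This is a minimal illustration of integrated gradients (a midpoint Riemann sum over n_steps), not alibi's implementation; the model `toy_model`, its gradient `toy_model_grad`, the weights `W` and the helper `integrated_gradients` are all hypothetical. As described above, target selects which output column is differentiated for each instance, and the baselines default to zeros.

```python
import numpy as np

# Hypothetical toy model with T=2 outputs: f_t(x) = sum_j W[t, j] * x_j**2
W = np.array([[1.0, 2.0, 0.5],
              [0.0, -1.0, 3.0]])


def toy_model(X: np.ndarray) -> np.ndarray:
    """Return an N x T array of model outputs."""
    return (X ** 2) @ W.T


def toy_model_grad(X: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Analytic gradient of output[i, target[i]] with respect to X[i] (N x F)."""
    return 2.0 * W[target] * X


def integrated_gradients(X, target, baselines=None, n_steps=50):
    """Midpoint Riemann-sum approximation of integrated gradients."""
    if baselines is None:
        baselines = np.zeros_like(X)  # default baseline: all features set to 0
    alphas = (np.arange(n_steps) + 0.5) / n_steps  # midpoints in (0, 1)
    total_grad = np.zeros_like(X)
    for alpha in alphas:
        # point on the straight-line path from the baseline to the input
        path_point = baselines + alpha * (X - baselines)
        total_grad += toy_model_grad(path_point, target)
    return (X - baselines) * (total_grad / n_steps)


X = np.array([[1.0, 2.0, 3.0],
              [0.5, -1.0, 2.0]])
target = np.array([0, 1])  # one target output per instance
attrs = integrated_gradients(X, target)

# Completeness: the attributions sum to f(x) - f(baseline) for the chosen target
rows = np.arange(len(X))
delta = toy_model(X)[rows, target] - toy_model(np.zeros_like(X))[rows, target]
print(np.allclose(attrs.sum(axis=1), delta))  # True
```

Because the toy integrand is linear along the path, the midpoint sum is exact here. For real networks, the approximation error shrinks as n_steps grows.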

meta: dict

reset_predictor(predictor)[source]

Resets the predictor model.

Parameters

predictor (Model) – New prediction model.

Return type

None

Bases: Explainer, FitMixin

A wrapper around the shap.KernelExplainer class. It extends the current shap library functionality by allowing the user to specify variable groups in order to treat one-hot encoded categorical variables as a single variable during sampling. If grouping arguments are not passed to fit, the user can also specify whether to aggregate the shap value estimates for the encoded levels of categorical variables, via an optional argument to explain.

Parameters
• predictor (Callable[[ndarray], ndarray]) – A callable that takes as input a samples x features array and outputs a samples x n_outputs array of model outputs. The n_outputs should represent the model output in margin space. If the model outputs probabilities, then link should be set to 'logit' to ensure correct force plots.

• link (str) – Valid values are 'identity' or 'logit'. A generalized linear model link that connects the feature importance values to the model output. Since the feature importance values, $$\phi$$, sum up to the model output, it often makes sense to connect them to the output with a link function such that $$\mathrm{link}(\text{output}) - \text{expected\_value} = \sum_i \phi_i$$, where expected_value is the link-transformed average model output on the background data. Therefore, for a model which outputs probabilities, link='logit' makes the feature effects have log-odds (evidence) units, while link='identity' means that the feature effects have probability units. Please see this example for an in-depth discussion about the semantics of explaining the model in the probability or margin space.

• feature_names (Union[List[str], Tuple[str], None]) – Used to infer group names when categorical data is treated by grouping and the group_names argument of fit is not specified, assuming it has the same length as the groups argument of the fit method. It is also used to compute the names field, which appears as a key in each of the values of explanation.data[‘raw’][‘importances’].

• categorical_names (Optional[Dict[int, List[str]]]) – Keys are feature column indices in the background_data matrix (see fit). Each value contains strings with the names of the categories for the feature. Used to select the method for background data summarisation (if specified, subsampling is performed as opposed to k-means clustering). In the future it may be used for visualisation.

• task (str) – Can have values 'classification' and 'regression'. It is only used to set the contents of explanation.data[‘raw’][‘prediction’].

• seed (Optional[int]) – Fixes the random number stream, which influences which subsets are sampled during shap value estimation.

• distributed_opts (Optional[Dict]) – A dictionary that controls the algorithm distributed execution. See alibi.explainers.shap_wrappers.DISTRIBUTED_OPTS documentation for details.
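The effect of link can be illustrated with plain arithmetic. The sketch below assumes, as in the shap library, that the expected value is the link-transformed mean of the predictions on the background data and that the shap values sum to link(output) minus that expected value. The probabilities are made up for illustration and the helper functions are hypothetical.

```python
import numpy as np


def logit(p):
    return np.log(p / (1.0 - p))


def identity(p):
    return p


background_probs = np.array([0.2, 0.5, 0.8])  # hypothetical outputs on the background data
instance_prob = 0.9                            # hypothetical output for the explained instance

totals = {}
for name, link in [("identity", identity), ("logit", logit)]:
    expected_value = link(background_probs.mean())
    # total that the shap values must add up to, expressed in the units of the link
    totals[name] = float(link(instance_prob) - expected_value)
print(totals)
```

With link='identity' the contributions add up to a probability difference (0.4 here). With link='logit' they add up to a log-odds difference (log 9, about 2.197), which is not bounded by the [0, 1] probability range.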

explain(X, summarise_result=False, cat_vars_start_idx=None, cat_vars_enc_dim=None, **kwargs)[source]

Explains the instances in the array X.

Parameters
• X (Union[ndarray, DataFrame, spmatrix]) – Instances to be explained.

• summarise_result (bool) – Specifies whether the shap values corresponding to dimensions of encoded categorical variables should be summed so that a single shap value is returned for each categorical variable. Both the start indices of the categorical variables (cat_vars_start_idx) and the encoding dimensions (cat_vars_enc_dim) have to be specified.

• cat_vars_start_idx (Optional[Sequence[int]]) – The start indices of the categorical variables. If specified, cat_vars_enc_dim should also be specified.

• cat_vars_enc_dim (Optional[Sequence[int]]) – The length of the encoding dimension for each categorical variable. If specified, cat_vars_start_idx should also be specified.

• **kwargs

Keyword arguments specifying explain behaviour. Valid arguments are:

• nsamples - controls the number of predictor calls and therefore runtime.

• l1_reg - the algorithm is exponential in the feature dimension. If set to auto, the algorithm will first run a feature selection algorithm to select the top features, provided the fraction of sampled sets of missing features is less than 0.2 of the total number of subsets. The Akaike Information Criterion is used in this case. See our examples for more details about available settings for this parameter. Note that, because a feature selection step is run first, the Shapley values of the remaining features will differ from those estimated from the entire set.

For more details, please see the shap library documentation.
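The summation requested with summarise_result=True can be sketched as follows. The helper `sum_encoded_shap_values` is hypothetical and not part of alibi's public API. It assumes that the shap values for one output are stored in an N x F array whose columns follow the encoded feature layout, and that cat_vars_start_idx and cat_vars_enc_dim describe the one-hot blocks.

```python
from typing import Sequence

import numpy as np


def sum_encoded_shap_values(values: np.ndarray,
                            cat_vars_start_idx: Sequence[int],
                            cat_vars_enc_dim: Sequence[int]) -> np.ndarray:
    """Collapse each one-hot block of columns into a single summed column."""
    if len(cat_vars_start_idx) != len(cat_vars_enc_dim):
        raise ValueError("Both arguments must have the same length.")
    blocks = dict(zip(cat_vars_start_idx, cat_vars_enc_dim))
    columns = []
    col = 0
    while col < values.shape[1]:
        width = blocks.get(col, 1)  # numerical features span a single column
        columns.append(values[:, col:col + width].sum(axis=1))
        col += width
    return np.stack(columns, axis=1)


# 2 instances, encoded layout: [age, colour_red, colour_green, colour_blue, income]
shap_values = np.array([[0.1, 0.2, -0.05, 0.0, 0.3],
                        [-0.2, 0.0, 0.1, 0.4, 0.05]])
summed = sum_encoded_shap_values(shap_values, cat_vars_start_idx=[1], cat_vars_enc_dim=[3])
print(summed.shape)  # (2, 3)
```

Summing preserves the additivity of the explanation, because the total of each row is unchanged.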

Return type

Explanation

Returns

explanation – An explanation object containing the shap values and prediction in the data field, along with a meta field containing additional data. See usage at KernelSHAP examples for details.

fit(background_data, summarise_background=False, n_background_samples=300, group_names=None, groups=None, weights=None, **kwargs)[source]

This takes a background dataset (usually a subsample of the training set) as input, along with several user-specified options, and initialises a KernelShap explainer. The runtime of the algorithm depends on the number of samples and on the number of features in this dataset. To reduce the size of the dataset, use the summarise_background option together with n_background_samples. To reduce the feature dimensionality, encoded categorical variables can be treated as one during the feature perturbation process. This decreases the effective feature dimensionality, can reduce the variance of the shap values estimation and slightly reduces the number of calls to the predictor. Further runtime savings can be achieved by changing the nsamples parameter in the call to explain. Runtime reduction comes with an accuracy trade-off, so it is advisable to experiment with a runtime reduction method and understand the stability of the results before using the system.

Parameters
• background_data (Union[ndarray, spmatrix, DataFrame, Data]) – Data used to estimate feature contributions and baseline values for force plots. The rows of the background data should represent samples and the columns features.

• summarise_background (Union[bool, str]) – A large background dataset impacts the runtime and memory footprint of the algorithm. By setting this argument to True, only n_background_samples from the provided data are selected. If group_names or groups arguments are specified, the algorithm assumes that the data contains categorical variables so the records are selected uniformly at random. Otherwise, shap.kmeans (a wrapper around sklearn k-means implementation) is used for selection. If set to 'auto', a default of KERNEL_SHAP_BACKGROUND_THRESHOLD samples is selected.

• n_background_samples (int) – The number of samples to keep in the background dataset if summarise_background=True.

• groups (Optional[List[Union[Tuple[int], List[int]]]]) – A list containing sub-lists specifying the indices of features belonging to the same group.

• group_names (Union[List[str], Tuple[str], None]) – If specified, this array is used to treat groups of features as one during feature perturbation. This feature can be useful, for example, to treat encoded categorical variables as one and can result in computational savings (this may require adjusting the nsamples parameter).

• weights (Union[List[float], Tuple[float], ndarray, None]) – A sequence or array of weights. This is used only if grouping is specified and assigns a weight to each point in the dataset.

• **kwargs – Expected keyword arguments include keep_index (bool), which should be used if a data frame containing an index column is passed to the algorithm.
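For one-hot encoded data, the groups and group_names arguments of fit can be derived from a description of where each encoded block starts and how wide it is. The helper `make_groups` below is a hypothetical sketch, and the column layout and names are illustrative. Its output would then be passed as fit(background_data, groups=groups, group_names=group_names).

```python
from typing import List, Sequence, Tuple


def make_groups(n_columns: int,
                cat_vars_start_idx: Sequence[int],
                cat_vars_enc_dim: Sequence[int],
                feature_names: Sequence[str]) -> Tuple[List[List[int]], List[str]]:
    """Return one group of column indices per original (pre-encoding) feature."""
    blocks = dict(zip(cat_vars_start_idx, cat_vars_enc_dim))
    groups: List[List[int]] = []
    col = 0
    while col < n_columns:
        width = blocks.get(col, 1)  # a numerical feature forms a group of one column
        groups.append(list(range(col, col + width)))
        col += width
    if len(groups) != len(feature_names):
        raise ValueError("Expected exactly one name per group.")
    return groups, list(feature_names)


# encoded layout: [age, colour_red, colour_green, colour_blue, income]
groups, group_names = make_groups(5, [1], [3], ["age", "colour", "income"])
print(groups)  # [[0], [1, 2, 3], [4]]
```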

Return type

KernelShap

reset_predictor(predictor)[source]

Resets the prediction function.

Parameters

predictor (Callable) – New prediction function.

Return type

None

class alibi.explainers.PartialDependence(predictor, feature_names=None, categorical_names=None, target_names=None, verbose=False)[source]

Bases: PartialDependenceBase

Black-box implementation of partial dependence for tabular datasets. Supports multiple feature interactions.

__init__(predictor, feature_names=None, categorical_names=None, target_names=None, verbose=False)[source]

Initialize black-box model implementation of partial dependence.

Parameters
• predictor (Callable[[ndarray], ndarray]) – A prediction function which receives as input a numpy array of size N x F and outputs a numpy array of size N (i.e. (N, )) or N x T, where N is the number of input instances, F is the number of features and T is the number of targets.

• feature_names (Optional[List[str]]) – A list of feature names used for displaying results.

• categorical_names (Optional[Dict[int, List[str]]]) –

Dictionary where keys are feature columns and values are the categories for the feature. Necessary to identify the categorical features in the dataset. An example for categorical_names would be:

category_map = {0: ["married", "divorced"], 3: ["high school diploma", "master's degree"]}

• target_names (Optional[List[str]]) – A list of target/output names used for displaying results.

• verbose (bool) – Whether to print the progress of the explainer.

Notes

The length of the target_names should match the number of columns returned by a call to the predictor. For example, in the case of a binary classifier, if the predictor outputs a decision score (i.e. uses the decision_function method) which returns one column, then the length of the target_names should be one. On the other hand, if the predictor outputs a prediction probability (i.e. uses the predict_proba method) which returns two columns (one for the negative class and one for the positive class), then the length of the target_names should be two.

explain(X, features=None, kind='average', percentiles=(0.0, 1.0), grid_resolution=100, grid_points=None)[source]

Calculates the partial dependence for each feature and/or tuple of features with respect to all targets and the reference dataset X.

Parameters
• X (ndarray) – A N x F tabular dataset used to calculate partial dependence curves. This is typically the training dataset or a representative sample.

• features (Optional[List[Union[int, Tuple[int, int]]]]) – An optional list of features or tuples of features for which to calculate the partial dependence. If not provided, the partial dependence will be computed for every single feature in the dataset. Some examples of features would be: [0, 2], [0, 2, (0, 2)], [(0, 2)], where 0 and 2 correspond to columns 0 and 2 in X, respectively.

• kind (Literal[‘average’, ‘individual’, ‘both’]) – If set to 'average', then only the partial dependence (PD) averaged across all samples from the dataset is returned. If set to 'individual', then only the individual conditional expectation (ICE) is returned for each data point from the dataset. Otherwise, if set to 'both', then both the PD and the ICE are returned.

• percentiles (Tuple[float, float]) – Lower and upper percentiles used to limit the feature values to potentially remove outliers from low-density regions. Note that for features with few data points at large/low values, the PD estimates are less reliable in those extreme regions. The values must be in [0, 1]. Only used with grid_resolution.

• grid_resolution (int) – Number of equidistant points used to split the range of each target feature. Only applies if the number of unique values of a target feature in the reference dataset X is greater than the grid_resolution value. For example, consider a case where a feature can take the following values: [0.1, 0.3, 0.35, 0.351, 0.4, 0.41, 0.44, ..., 0.5, 0.54, 0.56, 0.6, 0.65, 0.7, 0.9], and we are not interested in evaluating the marginal effect at every single point, as this can become computationally costly (assume hundreds/thousands of points) without providing any additional information for nearby points (e.g., 0.35 and 0.351). By setting grid_resolution=5, the marginal effect is computed for the values [0.1, 0.3, 0.5, 0.7, 0.9] instead, which is less computationally demanding and can provide similar insights regarding the model’s behaviour. Note that the extreme values of the grid can be controlled using the percentiles argument.

• grid_points (Optional[Dict[int, Union[List, ndarray]]]) – Custom grid points. Must be a dict where the keys are the target feature indices and the values are monotonically increasing arrays defining the grid points for a numerical feature, or a subset of categorical feature values for a categorical feature. If grid_points is not specified, the grid will be constructed based on the unique target feature values available in the dataset X, or based on grid_resolution and percentiles (check grid_resolution to see when it applies). For categorical features, the corresponding value in grid_points can be specified either as an array of strings or as an array of integers corresponding to the label encodings. Note that the label encoding must match the ordering of the values provided in categorical_names.
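The black-box computation behind explain can be sketched for a single numerical feature. For each grid value, every instance in X has that feature overwritten with the value and the predictor is called. The predictions are then either kept per instance (ICE, kind='individual') or averaged (PD, kind='average'). This is a minimal NumPy illustration with a hypothetical predictor and a hypothetical helper `pd_and_ice`, not alibi's code.

```python
import numpy as np


def pd_and_ice(predictor, X, feature, grid):
    """Return (pd, ice) for one feature; ice has shape (N, len(grid))."""
    ice = np.empty((X.shape[0], len(grid)))
    for k, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value  # force the feature to the grid value for every instance
        ice[:, k] = predictor(X_mod)
    pd = ice.mean(axis=0)  # partial dependence = average of the ICE curves
    return pd, ice


def predictor(X):
    # hypothetical model with an interaction between features 0 and 1
    return 2.0 * X[:, 0] + X[:, 0] * X[:, 1]


X = np.array([[0.0, 1.0], [1.0, -1.0], [2.0, 3.0]])
grid = np.array([0.0, 1.0, 2.0])
pd, ice = pd_and_ice(predictor, X, feature=0, grid=grid)
print(pd)  # [0. 3. 6.]
```

The ICE curves have different slopes (3, 1 and 5) because of the interaction with feature 1. The averaged PD curve hides this, which is why kind='both' can be informative.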

Return type

Explanation

Returns

explanation – An Explanation object containing the data and the metadata of the calculated partial dependence curves. See usage at Partial dependence examples for details.

meta: dict

class alibi.explainers.PartialDependenceVariance(predictor, feature_names=None, categorical_names=None, target_names=None, verbose=False)[source]

Bases: Explainer

Implementation of the partial dependence (PD) variance feature importance and feature interaction for tabular datasets. The method measures feature importance as the variance within the PD function. Similarly, the potential feature interaction is measured by computing the variance within the two-way PD function, holding one variable constant and letting the other vary. Supports black-box models and the following sklearn tree-based models: GradientBoostingClassifier, GradientBoostingRegressor, HistGradientBoostingClassifier, HistGradientBoostingRegressor, DecisionTreeRegressor, RandomForestRegressor.

For details of the method see the original paper: https://arxiv.org/abs/1805.04755 .
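The two statistics can be sketched following the cited paper. The importance of a numerical feature is the spread of its PD curve, measured here by the sample standard deviation as in the paper, which uses (max - min) / 4 for categorical features. The interaction for a pair (i, j) is computed in three steps: take the importance of feature i along its PD slice for every fixed value of feature j, take the spread of those importances, and average the result with the symmetric quantity. The helper names are hypothetical, and the exact statistic alibi uses internally is an assumption here.

```python
import numpy as np


def pd_importance(pd_values, categorical=False):
    """Spread of a 1-D PD curve."""
    pd_values = np.asarray(pd_values, dtype=float)
    if categorical:
        return (pd_values.max() - pd_values.min()) / 4.0
    return pd_values.std(ddof=1)


def pd_interaction(pd_2d):
    """Interaction score from a 2-D PD grid of shape (len(grid_i), len(grid_j))."""
    pd_2d = np.asarray(pd_2d, dtype=float)
    # importance of feature i for each fixed value of feature j, then its spread
    i_given_j = np.std([pd_importance(pd_2d[:, b]) for b in range(pd_2d.shape[1])], ddof=1)
    # importance of feature j for each fixed value of feature i, then its spread
    j_given_i = np.std([pd_importance(pd_2d[a, :]) for a in range(pd_2d.shape[0])], ddof=1)
    return (i_given_j + j_given_i) / 2.0


grid_i = np.linspace(0, 1, 5)
grid_j = np.linspace(0, 1, 4)
additive = grid_i[:, None] + 2 * grid_j[None, :]   # no interaction between i and j
multiplicative = grid_i[:, None] * grid_j[None, :]  # interaction between i and j
print(pd_interaction(additive), pd_interaction(multiplicative))
```

For an additive PD surface, every slice has the same shape, so the interaction score is (numerically) zero. The multiplicative surface yields a positive score.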

__init__(predictor, feature_names=None, categorical_names=None, target_names=None, verbose=False)[source]

Initialize black-box/tree-based model implementation for the partial dependence variance feature importance.

Parameters
• predictor (Union[BaseEstimator, Callable[[ndarray], ndarray]]) – A sklearn estimator or a prediction function which receives as input a numpy array of size N x F and outputs a numpy array of size N (i.e. (N, )) or N x T, where N is the number of input instances, F is the number of features and T is the number of targets.

• feature_names (Optional[List[str]]) – A list of feature names used for displaying results.

• categorical_names (Optional[Dict[int, List[str]]]) –

Dictionary where keys are feature columns and values are the categories for the feature. Necessary to identify the categorical features in the dataset. An example for categorical_names would be:

category_map = {0: ["married", "divorced"], 3: ["high school diploma", "master's degree"]}

• target_names (Optional[List[str]]) – A list of target/output names used for displaying results.

• verbose (bool) – Whether to print the progress of the explainer.

Notes

The length of the target_names should match the number of columns returned by a call to the predictor. For example, in the case of a binary classifier, if the predictor outputs a decision score (i.e. uses the decision_function method) which returns one column, then the length of the target_names should be one. On the other hand, if the predictor outputs a prediction probability (i.e. uses the predict_proba method) which returns two columns (one for the negative class and one for the positive class), then the length of the target_names should be two.

explain(X, features=None, method='importance', percentiles=(0.0, 1.0), grid_resolution=100, grid_points=None)[source]

Calculates the partial dependence variance feature importance for each feature with respect to all targets and the reference dataset X.

Parameters
• X (ndarray) – A N x F tabular dataset used to calculate partial dependence curves. This is typically the training dataset or a representative sample.

• features (Union[List[int], List[Tuple[int, int]], None]) – A list of features for which to compute the feature importance or a list of feature pairs for which to compute the feature interaction. Some examples of features would be: [0, 1, 3], [(0, 1), (0, 3), (1, 3)], where 0, 1, and 3 correspond to columns 0, 1, and 3 in X. If not provided, the feature importance or the feature interaction will be computed for every feature or for every combination of feature pairs, depending on the parameter method.

• method (Literal[‘importance’, ‘interaction’]) – Flag to specify whether to compute the feature importance or the feature interaction of the elements provided in features. Supported values: 'importance' | 'interaction'.

• percentiles (Tuple[float, float]) – Lower and upper percentiles used to limit the feature values to potentially remove outliers from low-density regions. Note that for features with few data points at large/low values, the PD estimates are less reliable in those extreme regions. The values must be in [0, 1]. Only used with grid_resolution.

• grid_resolution (int) – Number of equidistant points used to split the range of each target feature. Only applies if the number of unique values of a target feature in the reference dataset X is greater than the grid_resolution value. For example, consider a case where a feature can take the following values: [0.1, 0.3, 0.35, 0.351, 0.4, 0.41, 0.44, ..., 0.5, 0.54, 0.56, 0.6, 0.65, 0.7, 0.9], and we are not interested in evaluating the marginal effect at every single point, as this can become computationally costly (assume hundreds/thousands of points) without providing any additional information for nearby points (e.g., 0.35 and 0.351). By setting grid_resolution=5, the marginal effect is computed for the values [0.1, 0.3, 0.5, 0.7, 0.9] instead, which is less computationally demanding and can provide similar insights regarding the model’s behaviour. Note that the extreme values of the grid can be controlled using the percentiles argument.

• grid_points (Optional[Dict[int, Union[List, ndarray]]]) – Custom grid points. Must be a dict where the keys are the target feature indices and the values are monotonically increasing arrays defining the grid points for a numerical feature, or a subset of categorical feature values for a categorical feature. If grid_points is not specified, the grid will be constructed based on the unique target feature values available in the dataset X, or based on grid_resolution and percentiles (check grid_resolution to see when it applies). For categorical features, the corresponding value in grid_points can be specified either as an array of strings or as an array of integers corresponding to the label encodings. Note that the label encoding must match the ordering of the values provided in categorical_names.
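The rule described under grid_resolution and percentiles can be sketched for a single numerical feature. The unique values are used when there are at most grid_resolution of them. Otherwise, the grid consists of equidistant points between the lower and upper percentiles. `build_grid` is a hypothetical helper illustrating the documented behaviour. Alibi's internal implementation may differ in details such as how the percentiles are computed.

```python
import numpy as np


def build_grid(values, percentiles=(0.0, 1.0), grid_resolution=100):
    values = np.asarray(values, dtype=float)
    unique = np.unique(values)  # sorted unique values
    if len(unique) <= grid_resolution:
        return unique
    lower, upper = np.quantile(values, percentiles)
    return np.linspace(lower, upper, grid_resolution)


values = [0.1, 0.3, 0.35, 0.351, 0.4, 0.41, 0.44, 0.5, 0.54, 0.56, 0.6, 0.65, 0.7, 0.9]
print(build_grid(values, grid_resolution=5))  # [0.1 0.3 0.5 0.7 0.9]
```

Narrowing percentiles, for example to (0.05, 0.95), trims the grid ends to values where data is denser.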

Return type

Explanation

Returns

explanation – An Explanation object containing the data and the metadata of the calculated partial dependence curves and feature importance/interaction. See usage at Partial dependence variance examples for details.

meta: dict

class alibi.explainers.PermutationImportance(predictor, loss_fns=None, score_fns=None, feature_names=None, verbose=False)[source]

Bases: Explainer

Implementation of the permutation feature importance for tabular datasets. The method measures the importance of a feature as the relative increase/decrease in the loss/score function when the feature values are permuted. Supports black-box models.

For details of the method see the papers:

__init__(predictor, loss_fns=None, score_fns=None, feature_names=None, verbose=False)[source]

Initialize the permutation feature importance.

Parameters
• predictor (Callable[[ndarray], ndarray]) – A prediction function which receives as input a numpy array of size N x F, and outputs a numpy array of size N (i.e. (N, )) or N x T, where N is the number of input instances, F is the number of features, and T is the number of targets. Note that the output shape must be compatible with the loss and score functions provided in loss_fns and score_fns.

• loss_fns (Union[Literal[‘mean_absolute_error’, ‘mean_squared_error’, ‘mean_squared_log_error’, ‘mean_absolute_percentage_error’, ‘log_loss’], List[Literal[‘mean_absolute_error’, ‘mean_squared_error’, ‘mean_squared_log_error’, ‘mean_absolute_percentage_error’, ‘log_loss’]], Callable[[ndarray, ndarray, Optional[ndarray]], float], Dict[str, Callable[[ndarray, ndarray, Optional[ndarray]], float]], None]) –

A literal, or a list of literals, or a loss function, or a dictionary of loss functions having as keys the names of the loss functions and as values the loss functions (i.e., lower values are better). The available literal values are described in alibi.explainers.permutation_importance.LOSS_FNS. Note that the predictor output must be compatible with every loss function. Every loss function is expected to receive the following arguments:

• y_true : np.ndarray - a numpy array of ground-truth labels.

• y_pred | y_score : np.ndarray - a numpy array of model predictions. This corresponds to the output of the model.

• sample_weight: Optional[np.ndarray] - a numpy array of sample weights.

• score_fns (Union[Literal[‘accuracy’, ‘precision’, ‘recall’, ‘f1’, ‘roc_auc’, ‘r2’], List[Literal[‘accuracy’, ‘precision’, ‘recall’, ‘f1’, ‘roc_auc’, ‘r2’]], Callable[[ndarray, ndarray, Optional[ndarray]], float], Dict[str, Callable[[ndarray, ndarray, Optional[ndarray]], float]], None]) – A literal, or a list of literals, or a score function, or a dictionary of score functions having as keys the names of the score functions and as values the score functions (i.e., higher values are better). The available literal values are described in alibi.explainers.permutation_importance.SCORE_FNS. As with the loss_fns, the predictor output must be compatible with every score function, and each score function must have the same signature presented in the loss_fns parameter description.

• feature_names (Optional[List[str]]) – A list of feature names used for displaying results.

• verbose (bool) – Whether to print the progress of the explainer.
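A custom loss passed in loss_fns must accept the three arguments listed above. The function below is a hypothetical example of a sample-weighted mean absolute error that follows that signature. Passing it inside a dictionary, as sketched with loss_fns below, would give it a display name.

```python
from typing import Optional

import numpy as np


def weighted_mae(y_true: np.ndarray,
                 y_pred: np.ndarray,
                 sample_weight: Optional[np.ndarray] = None) -> float:
    """Mean absolute error; lower values are better, as expected for loss_fns."""
    errors = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    # np.average falls back to a plain mean when weights is None
    return float(np.average(errors, weights=sample_weight))


loss_fns = {"weighted_mae": weighted_mae}  # hypothetical name used for display
print(weighted_mae(np.array([1.0, 2.0]), np.array([2.0, 2.0])))  # 0.5
print(weighted_mae(np.array([1.0, 2.0]), np.array([2.0, 2.0]), np.array([3.0, 1.0])))  # 0.75
```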

explain(X, y, features=None, method='estimate', kind='ratio', n_repeats=50, sample_weight=None)[source]

Computes the permutation feature importance for each feature with respect to the given loss or score functions and the dataset (X, y).

Parameters
• X (ndarray) – A N x F input feature dataset used to calculate the permutation feature importance. This is typically the test dataset.

• y (ndarray) – Ground-truth labels array of size N (i.e. (N, )) corresponding to the input features X.

• features (Optional[List[Union[int, Tuple[int, ...]]]]) – An optional list of features or tuples of features for which to compute the permutation feature importance. If not provided, the permutation feature importance will be computed for every single feature in the dataset. Some examples of features would be: [0, 2], [0, 2, (0, 2)], [(0, 2)], where 0 and 2 correspond to columns 0 and 2 in X, respectively.

• method (Literal[‘estimate’, ‘exact’]) – The method used to compute the feature importance. If set to 'exact', a “switch” operation is performed across all observed pairs, excluding the pairings that are actually observed in the original dataset. This operation is quadratic in the number of samples (N x (N - 1) samples) and thus can be computationally intensive. If set to 'estimate', the dataset is divided in half. The first half, containing the ground-truth labels and the rest of the features (i.e. the features that are left intact), is matched with the permuted feature values from the second half, and the other way around. This method is computationally lighter and provides estimated error bars given by the standard deviation. Note that for some specific loss and score functions, the estimate does not converge to the exact metric value.

• kind (Literal[‘ratio’, ‘difference’]) – Whether to report the importance as the loss/score ratio or the loss/score difference. Available values are: 'ratio' | 'difference'.

• n_repeats (int) – Number of times to permute the feature values. Considered only when method='estimate'.

• sample_weight (Optional[ndarray]) – Optional weight for each sample instance.
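The two method options can be sketched for a single feature and a loss function (lower is better). The helper `permutation_importance`, the mean squared error and the toy data are hypothetical. The 'estimate' branch follows the half-split matching described above, repeated n_repeats times on random splits. The 'exact' branch builds all N x (N - 1) switched pairs. This illustrates the described procedure rather than reproducing alibi's code.

```python
import numpy as np


def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))


def permutation_importance(predict, X, y, feature, method="estimate",
                           kind="ratio", n_repeats=50, seed=0):
    base = mse(y, predict(X))

    def combine(new_loss):
        return new_loss / base if kind == "ratio" else new_loss - base

    if method == "exact":
        n = len(X)
        rows, donors = np.nonzero(~np.eye(n, dtype=bool))  # all pairs i != j
        X_sw = X[rows].copy()
        X_sw[:, feature] = X[donors, feature]  # feature value taken from another row
        return combine(mse(y[rows], predict(X_sw))), 0.0

    rng = np.random.default_rng(seed)
    half = len(X) // 2
    scores = []
    for _ in range(n_repeats):
        idx = rng.permutation(len(X))
        a, b = idx[:half], idx[half:2 * half]
        keep = np.concatenate([a, b])  # rows keep their labels and remaining features
        X_sw = X[keep].copy()
        X_sw[:half, feature] = X[b, feature]  # first half receives second-half values
        X_sw[half:, feature] = X[a, feature]  # and the other way around
        scores.append(combine(mse(y[keep], predict(X_sw))))
    return float(np.mean(scores)), float(np.std(scores))


rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)


def predict(X):
    return 3.0 * X[:, 0]  # the model ignores feature 1


print(permutation_importance(predict, X, y, feature=0))  # large ratio: feature 0 matters
print(permutation_importance(predict, X, y, feature=1))  # ratio of about 1.0
```

With kind='difference', an unimportant feature scores about 0 instead of 1.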

Return type

Explanation

Returns

explanation – An Explanation object containing the data and the metadata of the permutation feature importance. See usage at Permutation feature importance examples for details.

meta: dict

reset_predictor(predictor)[source]

Resets the predictor function.

Parameters

predictor (Callable) – New predictor function.

Return type

None

class alibi.explainers.TreePartialDependence(predictor, feature_names=None, categorical_names=None, target_names=None, verbose=False)[source]

Bases: PartialDependenceBase

Tree-based model sklearn implementation of the partial dependence for tabular datasets. Supports multiple feature interactions. This method is faster than the general black-box implementation but is only supported by some tree-based estimators. The computation is based on a weighted tree traversal. For more details on the computation, check the sklearn documentation page. The supported sklearn models are: GradientBoostingClassifier, GradientBoostingRegressor, HistGradientBoostingClassifier, HistGradientBoostingRegressor, DecisionTreeRegressor, RandomForestRegressor.

__init__(predictor, feature_names=None, categorical_names=None, target_names=None, verbose=False)[source]

Initialize tree-based model sklearn implementation of partial dependence.

Parameters
• predictor (BaseEstimator) – A tree-based sklearn estimator.

• feature_names (Optional[List[str]]) – A list of feature names used for displaying results.

• categorical_names (Optional[Dict[int, List[str]]]) –

Dictionary where keys are feature columns and values are the categories for the feature. Necessary to identify the categorical features in the dataset. An example for categorical_names would be:

category_map = {0: ["married", "divorced"], 3: ["high school diploma", "master's degree"]}

• target_names (Optional[List[str]]) – A list of target/output names used for displaying results.

• verbose (bool) – Whether to print the progress of the explainer.

Notes

The length of the target_names should match the number of columns returned by a call to the predictor.decision_function. In the case of a binary classifier, the decision score consists of a single column. Thus, the length of the target_names should be one.

explain(X, features=None, percentiles=(0.0, 1.0), grid_resolution=100, grid_points=None)[source]

Calculates the partial dependence for each feature and/or tuple of features with respect to all targets and the reference dataset X.

Parameters
• X (ndarray) – A N x F tabular dataset used to calculate partial dependence curves. This is typically the training dataset or a representative sample.

• features (Optional[List[Union[int, Tuple[int, int]]]]) – An optional list of features or tuples of features for which to calculate the partial dependence. If not provided, the partial dependence will be computed for every single feature in the dataset. Some examples of features would be: [0, 2], [0, 2, (0, 2)], [(0, 2)], where 0 and 2 correspond to columns 0 and 2 in X, respectively.

• percentiles (Tuple[float, float]) – Lower and upper percentiles used to limit the feature values to potentially remove outliers from low-density regions. Note that for features with few data points at large/low values, the PD estimates are less reliable in those extreme regions. The values must be in [0, 1]. Only used with grid_resolution.

• grid_resolution (int) – Number of equidistant points used to split the range of each target feature. Only applies if the number of unique values of a target feature in the reference dataset X is greater than the grid_resolution value. For example, consider a case where a feature can take the following values: [0.1, 0.3, 0.35, 0.351, 0.4, 0.41, 0.44, ..., 0.5, 0.54, 0.56, 0.6, 0.65, 0.7, 0.9], and we are not interested in evaluating the marginal effect at every single point, as this can become computationally costly (assume hundreds/thousands of points) without providing any additional information for nearby points (e.g., 0.35 and 0.351). By setting grid_resolution=5, the marginal effect is computed for the values [0.1, 0.3, 0.5, 0.7, 0.9] instead, which is less computationally demanding and can provide similar insights regarding the model’s behaviour. Note that the extreme values of the grid can be controlled using the percentiles argument.

• grid_points (Optional[Dict[int, Union[List, ndarray]]]) – Custom grid points. Must be a dict where the keys are the target feature indices and the values are monotonically increasing arrays defining the grid points for a numerical feature, or a subset of categorical feature values for a categorical feature. If grid_points is not specified, the grid will be constructed based on the unique target feature values available in the dataset X, or based on grid_resolution and percentiles (check grid_resolution to see when it applies). For categorical features, the corresponding value in grid_points can be specified either as an array of strings or as an array of integers corresponding to the label encodings. Note that the label encoding must match the ordering of the values provided in categorical_names.
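For categorical features, grid_points may be given as strings or as label encodings whose order follows categorical_names. The hypothetical helper `to_label_encoding` below converts string categories into the corresponding integer codes, using the same category_map shown for categorical_names. This makes the required ordering explicit.

```python
from typing import Dict, List, Sequence, Union

category_map: Dict[int, List[str]] = {
    0: ["married", "divorced"],
    3: ["high school diploma", "master's degree"],
}


def to_label_encoding(feature: int,
                      values: Sequence[Union[str, int]],
                      categorical_names: Dict[int, List[str]]) -> List[int]:
    """Map category strings to their index in categorical_names[feature]."""
    categories = categorical_names[feature]
    codes = []
    for value in values:
        if isinstance(value, str):
            codes.append(categories.index(value))  # raises ValueError for unknown categories
        else:
            codes.append(int(value))  # already a label encoding
    return codes


grid_points = {3: ["master's degree"], 0: ["divorced", "married"]}
encoded = {f: to_label_encoding(f, v, category_map) for f, v in grid_points.items()}
print(encoded)  # {3: [1], 0: [1, 0]}
```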

Return type

Explanation

meta: dict

class alibi.explainers.TreeShap(predictor, model_output='raw', feature_names=None, categorical_names=None, task='classification', seed=None)[source]

Bases: Explainer, FitMixin

__init__(predictor, model_output='raw', feature_names=None, categorical_names=None, task='classification', seed=None)[source]

A wrapper around the shap.TreeExplainer class. It adds the following functionality:

1. Input summarisation options to allow control over background dataset size and hence runtime

2. Output summarisation for sklearn models with one-hot encoded categorical variables.

Users are strongly encouraged to familiarise themselves with the algorithm by reading the method overview in the documentation.

Parameters
• predictor (Any) – A fitted model to be explained. XGBoost, LightGBM, CatBoost and most tree-based scikit-learn models are supported. In the future, Pyspark could also be supported. Please open an issue if this is a use case for you.

• model_output (str) –

Supported values are: 'raw', 'probability', 'probability_doubled', 'log_loss':

• 'raw' - the raw output of the model, which varies by task, is explained. This option should always be used if fit is called without arguments. It should also be set to compute shap interaction values. For regression models it is the standard output; for binary classification in XGBoost it is the log odds ratio.

• 'probability' - the probability output is explained. This option should only be used if fit was called with the background_data argument set. The effect of specifying this parameter is that the shap library will use this information to transform the shap values computed in margin space (i.e. using the raw output) into shap values that, together with the model's expected output probability, sum to the probability output by the model. This requires knowledge of the type of output for predictor. The shap library infers it either from the model type or from the mapping implemented in the shap.TreeEnsemble constructor (e.g., most sklearn models output logits, with the exception of sklearn.tree.DecisionTreeClassifier, sklearn.ensemble.RandomForestClassifier and sklearn.ensemble.ExtraTreesClassifier). Only trees that output log odds and probabilities are currently supported.

• 'probability_doubled' - used for binary classification problems in situations where the model outputs the logits/probabilities for the positive class but shap values for both outcomes are desired. This option should be used only if fit was called with the background_data argument set. In this case, the expected value for the negative class is 1 - expected_value for the positive class, and the shap values for the negative class are the negatives of the positive class shap values. As before, the explanation happens in margin space, and the shap values are subsequently adjusted to convert the model output to probabilities. The same considerations as for 'probability' apply to this output type too.

• 'log_loss' - the logarithmic loss is explained. This option should be used only if fit was called with the background_data argument set, and it requires specifying the labels, y, when calling explain. If the objective is squared error, then the transformation $$(\mathrm{output} - y)^2$$ is applied. For the binary cross-entropy objective, the transformation $$\log(1 + \exp(\mathrm{output})) - y \cdot \mathrm{output}$$ with $$y \in \{0, 1\}$$ is applied. Currently, only binary cross-entropy and squared error losses can be explained.

• feature_names (Union[List[str], Tuple[str], None]) – Used to compute the names field, which appears as a key in each of the values of the importances sub-field of the response raw field.

• categorical_names (Optional[Dict[int, List[str]]]) – Keys are feature column indices. Each value contains strings with the names of the categories for the feature. Used to select the method for background data summarisation (if specified, subsampling is performed as opposed to kmeans clustering). In the future it may be used for visualisation.

• task (str) – Can have values 'classification' and 'regression'. It is only used to set the contents of the prediction field in the data['raw'] response field.
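The loss transformations and the 'probability_doubled' relation described for model_output can be written out directly. The following is a minimal numpy sketch that operates on raw (margin-space) outputs; the function names are hypothetical illustrations and are not part of the alibi or shap APIs.

```python
import numpy as np


def squared_error_transform(output: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Squared-error transformation of the raw model output: (output - y)^2."""
    return (output - y) ** 2


def binary_cross_entropy_transform(output: np.ndarray, y: np.ndarray) -> np.ndarray:
    """log(1 + exp(output)) - y * output, with output the raw log-odds and y in {0, 1}.

    This equals -[y * log(p) + (1 - y) * log(1 - p)] for p = sigmoid(output);
    np.logaddexp(0, output) evaluates log(1 + exp(output)) without overflow.
    """
    return np.logaddexp(0.0, output) - y * output


def negative_class_explanation(expected_pos: float, shap_pos: np.ndarray):
    """Hypothetical 'probability_doubled'-style derivation of negative-class
    quantities: 1 - positive-class expected value and negated shap values."""
    return 1.0 - expected_pos, -shap_pos


raw = np.array([0.0, 2.0, -1.0])     # illustrative log-odds outputs
labels = np.array([1.0, 0.0, 1.0])   # illustrative binary labels
losses = binary_cross_entropy_transform(raw, labels)
```

Writing the cross-entropy in terms of the raw output (rather than the probability) keeps the quantity defined on the whole real line, which is consistent with the explanation being computed in margin space.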

Notes

Tree SHAP is an additive attribution method so it is best suited to explaining output in margin space (the entire real line). For discussion related to explaining models in output vs probability space, please consult this resource.
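The additivity property behind this note can be checked numerically. The sketch below uses made-up attributions for a single instance of a binary classifier: in margin (log-odds) space the base value plus the attributions reproduces the output exactly, whereas after the logistic link the probability gain from the same feature depends on what the other features have already contributed. check_additivity here is a hypothetical helper, not the shap implementation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def check_additivity(expected_value, shap_values, raw_output, atol=1e-8):
    """True if expected_value + sum of attributions equals the raw output for every row."""
    return bool(np.allclose(expected_value + shap_values.sum(axis=-1), raw_output, atol=atol))


# Illustrative margin-space explanation of one instance (values are made up).
expected_value = -2.0                 # base log-odds
shap_values = np.array([[0.5, 1.5]])  # log-odds contributions of features 0 and 1
raw_output = np.array([0.0])          # model log-odds for the instance

additive = check_additivity(expected_value, shap_values, raw_output)

# In margin space feature 0 always adds 0.5; in probability space its effect
# depends on whether feature 1's contribution has already been added.
gain_first = sigmoid(expected_value + 0.5) - sigmoid(expected_value)
gain_second = sigmoid(expected_value + 1.5 + 0.5) - sigmoid(expected_value + 1.5)
```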

explain(X, y=None, interactions=False, approximate=False, check_additivity=True, tree_limit=None, summarise_result=False, cat_vars_start_idx=None, cat_vars_enc_dim=None, **kwargs)[source]

Explains the instances in X. y should be passed if the model loss function is to be explained, which can help in understanding how various features affect model performance over time. This is only possible if the explainer has been fitted with a background dataset and requires setting model_output='log_loss'.

Parameters
• X (Union[ndarray, DataFrame, Pool]) – Instances to be explained.

• y (Optional[ndarray]) – Labels corresponding to rows of X. Should be passed only if a background dataset was passed to the fit method.

• interactions (bool) – If True, the shap value for every feature of every instance in X is decomposed into X.shape[1] - 1 shap value interactions and one main effect. This is only supported if fit is called with background_data=None.

• approximate (bool) –

If True, an approximation to the shap values that does not account for feature order is computed. This was proposed by Ando Saabas here. Check this resource for more details. This option is currently only supported for XGBoost and sklearn models.

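• check_additivity (bool) – If True, the correctness of the output is checked, i.e., that the shap values and the expected value add up to the model output. The check is only performed if model_output='raw' has been passed to the constructor.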

• tree_limit (Optional[int]) – Limits the explanation to the output of the first tree_limit trees in an ensemble model.

• summarise_result (bool) – This should be set to True only when some of the columns in X represent encoded dimensions of a categorical variable and a single shap value per categorical variable is desired. Both cat_vars_start_idx and cat_vars_enc_dim should be specified as detailed below to allow this.

• cat_vars_start_idx (Optional[Sequence[int]]) – The start indices of the categorical variables.

• cat_vars_enc_dim (Optional[Sequence[int]]) – The length of the encoding dimension for each categorical variable.

Return type

Explanation

Returns

explanation – An Explanation object containing the shap values and prediction in the data field, along with a meta field containing additional data. See usage at TreeSHAP examples for details.
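One way to obtain a single shap value per categorical variable from one-hot encoded columns is to sum the values within each encoded block, which keeps the explanation additive. The sketch below assumes that aggregation rule and a single 2-D array of shap values (one model output); sum_encoded_columns is a hypothetical helper, and its arguments are interpreted as cat_vars_start_idx and cat_vars_enc_dim are described above.

```python
import numpy as np
from typing import Sequence


def sum_encoded_columns(shap_values: np.ndarray,
                        cat_vars_start_idx: Sequence[int],
                        cat_vars_enc_dim: Sequence[int]) -> np.ndarray:
    """Collapse each one-hot encoded block of columns into a single column.

    shap_values: (n_instances, n_encoded_columns) array for one model output.
    Numerical columns outside the encoded blocks are copied unchanged.
    """
    if len(cat_vars_start_idx) != len(cat_vars_enc_dim):
        raise ValueError("cat_vars_start_idx and cat_vars_enc_dim must have equal length")
    blocks = []
    col = 0
    for start, dim in sorted(zip(cat_vars_start_idx, cat_vars_enc_dim)):
        if start < col:
            raise ValueError("categorical encodings overlap")
        # Numerical columns between the previous block and this one.
        blocks.append(shap_values[:, col:start])
        # One summed value per categorical variable.
        blocks.append(shap_values[:, start:start + dim].sum(axis=1, keepdims=True))
        col = start + dim
    blocks.append(shap_values[:, col:])  # trailing numerical columns
    return np.concatenate(blocks, axis=1)


# Column 0 is numerical; columns 1-3 and 4-5 encode two categorical variables.
values = np.array([[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]])
summarised = sum_encoded_columns(values, cat_vars_start_idx=[1, 4], cat_vars_enc_dim=[3, 2])
```

Because the encoded columns are summed rather than averaged, the row totals (and hence the additivity with the expected value) are unchanged.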

fit(background_data=None, summarise_background=False, n_background_samples=1000, **kwargs)[source]

This function instantiates an explainer which can then be used to explain instances using the explain method. If no background dataset is passed, the explainer uses the path-dependent feature perturbation algorithm to explain the values. As such, only the model raw output can be explained and this should be reflected by passing model_output='raw' when instantiating the explainer. If a background dataset is passed, the interventional feature perturbation algorithm is used. Using this algorithm, probability outputs can also be explained. Additionally, if the model_output='log_loss' option is passed to the explainer constructor, then the model loss function can be explained by passing the labels as the y argument to the explain method. A limited number of loss functions are supported, as detailed in the constructor documentation.

Parameters
• background_data (Union[ndarray, DataFrame, None]) – Data used to estimate feature contributions and baseline values for force plots. The rows of the background data should represent samples and the columns features.

• summarise_background (Union[bool, str]) – A large background dataset may impact the runtime and memory footprint of the algorithm. By setting this argument to True, only n_background_samples from the provided data are selected. If the categorical_names argument has been passed to the constructor, subsampling of the data is used. Otherwise, shap.kmeans (a wrapper around the scikit-learn k-means implementation) is used for selection. If set to 'auto', a default of TREE_SHAP_BACKGROUND_WARNING_THRESHOLD samples is selected.

• n_background_samples (int) – The number of samples to keep in the background dataset if summarise_background=True.

Return type

TreeShap
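The two summarisation strategies described for summarise_background can be sketched with numpy alone. This is a stand-in under stated assumptions, not the alibi code path: the subsampling branch mirrors the categorical case, and a plain Lloyd's k-means replaces shap.kmeans, which additionally weights each centroid by the number of points it represents. summarise_background_sketch is a hypothetical name.

```python
import numpy as np


def summarise_background_sketch(data: np.ndarray, n_background_samples: int,
                                has_categorical: bool, seed: int = 0,
                                n_iter: int = 20) -> np.ndarray:
    """Reduce a background dataset to n_background_samples rows (hypothetical helper)."""
    if n_background_samples >= data.shape[0]:
        return data  # nothing to summarise
    rng = np.random.default_rng(seed)
    idx = rng.choice(data.shape[0], size=n_background_samples, replace=False)
    if has_categorical:
        # Subsampling keeps real rows, so encoded categories stay valid.
        return data[idx]
    # Plain Lloyd's k-means initialised from the random subsample.
    centroids = data[idx].astype(float)
    for _ in range(n_iter):
        # Squared distance from every row to every centroid: (n_rows, k).
        dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for k in range(n_background_samples):
            members = data[labels == k]
            if len(members):  # leave empty clusters where they are
                centroids[k] = members.mean(axis=0)
    return centroids


# Two well-separated groups of rows.
background = np.vstack([np.zeros((10, 2)), np.full((10, 2), 10.0)])
centres = summarise_background_sketch(background, 2, has_categorical=False)
subsample = summarise_background_sketch(background, 5, has_categorical=True)
```

Cluster centroids generally do not coincide with observed rows, which is why subsampling is the safer choice when categorical features are present.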

meta: dict

reset_predictor(predictor)[source]

Resets the predictor.

Parameters

predictor (Any) – New predictor.

Return type

None

alibi.explainers.plot_ale(exp, features='all', targets='all', n_cols=3, sharey='all', constant=False, ax=None, line_kw=None, fig_kw=None)[source]

Plot ALE curves on matplotlib axes.

Parameters
• exp – An Explanation object produced by a call to the alibi.explainers.ale.ALE.explain() method.

• features – A list of features for which to plot the ALE curves or 'all' for all features. Can be a mix of integers denoting feature index or strings denoting entries in exp.feature_names. Defaults to 'all'.

• targets – A list of targets for which to plot the ALE curves or 'all' for all targets. Can be a mix of integers denoting target index or strings denoting entries in exp.target_names. Defaults to 'all'.

• n_cols – Number of columns to organize the resulting plot into.

• sharey – A parameter specifying whether the y-axis of the ALE curves should be on the same scale for several features. Possible values are: 'all' | 'row' | None.

• constant – A parameter specifying whether the constant zeroth order effects should be added to the ALE first order effects.

• ax – A matplotlib axes object or a numpy array of matplotlib axes to plot on.

• line_kw – Keyword arguments passed to the plt.plot function.

• fig_kw – Keyword arguments passed to the fig.set function.

Returns

An array of matplotlib axes with the resulting ALE plots.
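The features and targets arguments accept a mix of integer indices and names. A minimal sketch of how such a mixed list could be resolved to indices against exp.feature_names or exp.target_names follows; resolve_indices is a hypothetical helper, not alibi's internal implementation.

```python
from typing import List, Sequence, Union


def resolve_indices(selection: Union[str, Sequence[Union[int, str]]],
                    names: Sequence[str]) -> List[int]:
    """Map 'all' or a mix of integer indices and names to a list of indices."""
    if isinstance(selection, str):
        if selection != 'all':
            raise ValueError("a string selection must be 'all'")
        return list(range(len(names)))
    indices = []
    for item in selection:
        # Names are looked up in names; list.index raises ValueError if absent.
        idx = list(names).index(item) if isinstance(item, str) else int(item)
        if not 0 <= idx < len(names):
            raise IndexError(f"index {idx} out of range for {len(names)} entries")
        indices.append(idx)
    return indices


feature_names = ['age', 'income', 'hours']  # illustrative names
selected = resolve_indices(['age', 2], feature_names)
```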

alibi.explainers.plot_pd(exp, features='all', target=0, n_cols=3, n_ice=100, center=False, pd_limits=None, levels=8, ax=None, sharey='all', pd_num_kw=None, ice_num_kw=None, pd_cat_kw=None, ice_cat_kw=None, pd_num_num_kw=None, pd_num_cat_kw=None, pd_cat_cat_kw=None, fig_kw=None)[source]

Plot partial dependence curves on matplotlib axes.

Parameters
• exp – An Explanation object produced by a call to the alibi.explainers.partial_dependence.PartialDependence.explain() method.

• features – A list of indices of entries in exp.data['feature_names'] for which to plot the partial dependence curves, or 'all' to plot all the explained features or tuples of features. For example, if exp.data['feature_names'] = ['temp', 'hum', ('temp', 'windspeed')] and we want to plot the partial dependence only for 'temp' and ('temp', 'windspeed'), then we would set features=[0, 2]. Defaults to 'all'.

• target – The target name or index for which to plot the partial dependence (PD) curves. Can be an integer denoting the target index or a string denoting an entry in exp.meta['params']['target_names']. Defaults to 0.

• n_cols – Number of columns to organize the resulting plot into.

• n_ice

Number of ICE plots to be displayed. Can be

• a string taking the value 'all' to display the ICE curves for every instance in the reference dataset.

• an integer for which n_ice instances from the reference dataset will be sampled uniformly at random to display their ICE curves.

• a list of integers, where each integer represents an index of an instance in the reference dataset to display their ICE curves.

• center

Boolean flag to center the individual conditional expectation (ICE) curves. As mentioned in Goldstein et al. (2014), the heterogeneity in the model can be difficult to discern when the intercepts of the ICE curves cover a wide range. Centering the ICE curves removes the level effects and helps to visualise the heterogeneous effect.

• pd_limits – Minimum and maximum y-limits for all the one-way PD plots. If None, the limits are inferred automatically.

• levels – Number of levels in the contour plot.

• ax – A matplotlib axes object or a numpy array of matplotlib axes to plot on.

• sharey – A parameter specifying whether the y-axis of the PD and ICE curves should be on the same scale for several features. Possible values are: 'all' | 'row' | None.

• pd_num_kw – Keyword arguments passed to the matplotlib.pyplot.plot function when plotting the PD for a numerical feature.

• ice_num_kw – Keyword arguments passed to the matplotlib.pyplot.plot function when plotting the ICE for a numerical feature.

• pd_cat_kw – Keyword arguments passed to the matplotlib.pyplot.plot function when plotting the PD for a categorical feature.

• ice_cat_kw – Keyword arguments passed to the matplotlib.pyplot.plot function when plotting the ICE for a categorical feature.

• pd_num_num_kw – Keyword arguments passed to the matplotlib.pyplot.contourf function when plotting the PD for two numerical features.

• pd_num_cat_kw – Keyword arguments passed to the matplotlib.pyplot.plot function when plotting the PD for a numerical and a categorical feature.

• pd_cat_cat_kw – Keyword arguments passed to the alibi.utils.visualization.heatmap() function when plotting the PD for two categorical features.

• fig_kw

Keyword arguments passed to the matplotlib.figure.set function.

Returns

An array of plt.Axes with the resulting partial dependence plots.
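The n_ice and center options can be illustrated on a plain array of ICE curves with one row per instance in the reference dataset and one column per grid point. In this sketch, select_ice and center_ice are hypothetical helpers. Capping an integer n_ice at the dataset size, and anchoring the centred curves at the first (smallest) grid point, are assumptions; the latter is one common choice of anchor for centred ICE plots.

```python
import numpy as np


def select_ice(ice: np.ndarray, n_ice, seed=None) -> np.ndarray:
    """Pick which ICE curves to draw.

    ice: (n_instances, n_grid_points) array for one feature and target.
    n_ice: 'all', an int (uniform sampling without replacement) or a list of row indices.
    """
    n_instances = ice.shape[0]
    if isinstance(n_ice, str):
        if n_ice != 'all':
            raise ValueError("string value of n_ice must be 'all'")
        return ice
    if isinstance(n_ice, (int, np.integer)):
        rng = np.random.default_rng(seed)
        idx = rng.choice(n_instances, size=min(n_ice, n_instances), replace=False)
        return ice[idx]
    return ice[np.asarray(n_ice, dtype=int)]


def center_ice(ice: np.ndarray) -> np.ndarray:
    """Remove level effects: every curve starts at 0 at the first grid point."""
    return ice - ice[:, :1]


ice_curves = np.arange(12, dtype=float).reshape(4, 3)  # 4 instances, 3 grid points
centred = center_ice(select_ice(ice_curves, [0, 3]))
```

After centring, curves that differ only by a constant offset collapse onto each other, so any remaining spread reflects heterogeneous effects rather than different intercepts.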

alibi.explainers.plot_pd_variance(exp, features='all', targets='all', summarise=True, n_cols=3, sort=True, top_k=None, plot_limits=None, ax=None, sharey='all', bar_kw=None, line_kw=None, fig_kw=None)[source]

Plot feature importance and feature interaction based on partial dependence curves on matplotlib axes.

Parameters
• exp (Explanation) – An Explanation object produced by a call to the alibi.explainers.pd_variance.PartialDependenceVariance.explain() method.

• features (Union[List[int], Literal['all']]) – A list of indices of the feature entries provided in the feature_names argument to the alibi.explainers.pd_variance.PartialDependenceVariance.explain() method, or 'all' to plot all the explained features. For example, if feature_names = ['temp', 'hum', 'windspeed'] and we want to plot the values only for 'temp' and 'windspeed', then we would set features=[0, 2]. Defaults to 'all'.

• targets (Union[List[Union[int, str]], Literal['all']]) – A target name/index, or a list of target names/indices, for which to plot the feature importance/interaction, or 'all'. Can be a mix of integers denoting target index or strings denoting entries in exp.meta['params']['target_names']. Defaults to 'all', which plots the feature importance or feature interaction for all targets.

• summarise (bool) – Whether to plot only the summary of the feature importance/interaction as a bar plot, or to plot a comprehensive exposition including partial dependence plots and conditional importance plots.

• n_cols (int) – Number of columns to organize the resulting plot into.

• sort (bool) – Boolean flag indicating whether to sort the values in descending order.

• top_k (Optional[int]) – Number of top k values to be displayed if sort=True. If not provided, then all values will be displayed.

• plot_limits (Optional[Tuple[float, float]]) – Minimum and maximum y-limits for all the line plots. If None, the limits are inferred automatically.

• ax (Union[Axes, ndarray, None]) – A matplotlib axes object or a numpy array of matplotlib axes to plot on.

• sharey (Optional[Literal['all', 'row']]) – A parameter specifying whether the y-axis of the PD and ICE curves should be on the same scale for several features. Possible values are: 'all' | 'row' | None.

• bar_kw (Optional[dict]) – Keyword arguments passed to the matplotlib.pyplot.barh function.

• line_kw (Optional[dict]) – Keyword arguments passed to the matplotlib.pyplot.plot function.

• fig_kw (Optional[dict]) –

Keyword arguments passed to the matplotlib.figure.set function.

Returns

plt.Axes with the summary/detailed exposition plot of the feature importance or feature interaction.
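The interplay of features, sort and top_k can be summarised as: restrict to the requested features, optionally sort in descending order, and, only when sorting, keep the first top_k entries. A sketch under those assumptions follows; select_bars is a hypothetical helper and the importance values are made up.

```python
import numpy as np


def select_bars(values, names, features='all', sort=True, top_k=None):
    """Return the (names, values) pairs that would be drawn as horizontal bars."""
    values = np.asarray(values, dtype=float)
    idx = np.arange(len(values)) if isinstance(features, str) else np.asarray(features, dtype=int)
    if sort:
        # Stable descending sort of the selected values.
        idx = idx[np.argsort(-values[idx], kind='stable')]
        if top_k is not None:
            idx = idx[:top_k]  # top_k only applies when sorting
    return [names[i] for i in idx], values[idx]


feature_names = ['temp', 'hum', 'windspeed']
importance = [0.2, 0.5, 0.1]  # illustrative values
```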

alibi.explainers.plot_permutation_importance(exp, features='all', metric_names='all', n_cols=3, sort=True, top_k=None, ax=None, bar_kw=None, fig_kw=None)[source]

Plot permutation feature importance on matplotlib axes.

Parameters
• exp – An Explanation object produced by a call to the alibi.explainers.permutation_importance.PermutationImportance.explain() method.

• features – A list of feature entries provided in the feature_names argument to the alibi.explainers.permutation_importance.PermutationImportance.explain() method, or 'all' to plot all the explained features. For example, consider that feature_names = ['temp', 'hum', 'windspeed', 'season']. If we set features=None in the explain method, meaning that all the features were explained, and we want to plot only the values for 'temp' and 'windspeed', then we would set features=[0, 2]. Otherwise, if we set features=[1, 2, 3] in the explain method, meaning that we explained ['hum', 'windspeed', 'season'], and we want to plot the values only for ['windspeed', 'season'], then we would set features=[1, 2] (i.e., their indices in the features list passed to the explain method). Defaults to 'all'.

• metric_names – A list of metric entries in exp.data['metrics'] for which to plot the permutation feature importance, or 'all' to plot the permutation feature importance for all metrics (i.e., loss and score functions). The ordering is given by the concatenation of the loss metrics followed by the score metrics.

• n_cols – Number of columns to organize the resulting plot into.

• sort – Boolean flag indicating whether to sort the values in descending order.

• top_k – Number of top k values to be displayed if sort=True. If not provided, then all values will be displayed.

• ax – A matplotlib axes object or a numpy array of matplotlib axes to plot on.

• bar_kw – Keyword arguments passed to the matplotlib.pyplot.barh function.

• fig_kw

Keyword arguments passed to the matplotlib.figure.set function.

Returns

plt.Axes with the feature importance plot.
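Because the features argument of the plotting function indexes into the list of features that were explained, not into feature_names, the two cases in the example for features can be reproduced with a small helper. plotted_feature_names and metric_order are hypothetical helpers written for illustration, and the metric names used with them are placeholders.

```python
from typing import List, Optional, Sequence, Union


def plotted_feature_names(feature_names: Sequence[str],
                          explained: Optional[Sequence[int]],
                          features: Union[str, Sequence[int]] = 'all') -> List[str]:
    """Names shown in the plot.

    explained: the features argument given to explain (None means all features).
    features: the features argument given to the plotting function; its
              indices refer to positions in the explained list.
    """
    if explained is None:
        explained_names = list(feature_names)
    else:
        explained_names = [feature_names[i] for i in explained]
    if isinstance(features, str):  # 'all'
        return explained_names
    return [explained_names[i] for i in features]


def metric_order(loss_names: Sequence[str], score_names: Sequence[str]) -> List[str]:
    """Metric ordering used for selection: losses first, then scores."""
    return list(loss_names) + list(score_names)


names = ['temp', 'hum', 'windspeed', 'season']
```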