alibi.explainers.anchor_tabular module

class alibi.explainers.anchor_tabular.AnchorTabular(predictor, feature_names, categorical_names=None, seed=None)

Bases: alibi.api.interfaces.Explainer, alibi.api.interfaces.FitMixin

__init__(predictor, feature_names, categorical_names=None, seed=None)

Parameters
  • predictor (Callable) – A callable that takes a tensor of N data points as inputs and returns N outputs.

  • feature_names (list) – List with feature names.

  • categorical_names (Optional[dict]) – Dictionary where keys are feature columns and values are the categories for the feature.

  • seed (Optional[int]) – Used to set the random number generator for repeatability purposes.

Return type

None

add_names_to_exp(explanation)

Add feature names to explanation dictionary.

Parameters

explanation (dict) – Dict with anchors and additional metadata.

Return type

None

build_explanation(X, result, predicted_label, params)

Preprocess the search output and return an explanation object containing metadata.

Parameters
  • X (ndarray) – Instance to be explained.

  • result (dict) – Dictionary with explanation search output and metadata.

  • predicted_label (int) – Label of the instance to be explained (inferred if not given).

  • params (dict) – Parameters passed to explain.

Return type

Explanation

Returns

Explanation object containing the human-readable explanation, metadata, and precision/coverage info.

explain(X, threshold=0.95, delta=0.1, tau=0.15, batch_size=100, coverage_samples=10000, beam_size=1, stop_on_first=False, max_anchor_size=None, min_samples_start=100, n_covered_ex=10, binary_cache_size=10000, cache_margin=1000, verbose=False, verbose_every=1, **kwargs)

Explain the prediction made by the classifier on instance X.

Parameters
  • X (ndarray) – Instance to be explained.

  • threshold (float) – Minimum precision threshold.

  • delta (float) – Used to compute beta, which sets the width of the confidence bounds on an anchor's precision during the search.

  • tau (float) – Margin between lower confidence bound and minimum precision or upper bound.

  • batch_size (int) – Batch size used for sampling.

  • coverage_samples (int) – Number of samples used to estimate coverage during the anchor search.

  • beam_size (int) – The number of anchors extended at each step of the anchor construction.

  • stop_on_first (bool) – If True, the beam search algorithm will return the first anchor that satisfies the probability constraint.

  • max_anchor_size (Optional[int]) – Maximum number of features in the anchor.

  • min_samples_start (int) – Minimum number of initial samples.

  • n_covered_ex (int) – How many examples where the anchor applies to store for each anchor sampled during the search (examples where the prediction on the samples agrees with desired_label and examples where it disagrees are both stored).

  • binary_cache_size (int) – The anchor search pre-allocates binary_cache_size batches for storing the binary arrays returned during sampling.

  • cache_margin (int) – When only max(cache_margin, batch_size) positions in the binary cache remain empty, a new cache of the same size is pre-allocated to continue buffering samples.

  • verbose (bool) – Display updates during the anchor search iterations.

  • verbose_every (int) – Frequency of displayed iterations during anchor search process.

Return type

Explanation

Returns

explanation – Explanation object containing the anchor that explains the instance, with additional metadata.
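The threshold and coverage_samples parameters refer to two sampled estimates. The first is an anchor's precision, the fraction of perturbed samples satisfying the anchor on which the prediction matches the instance's. The second is its coverage, the fraction of samples on which the anchor holds. The following sketch computes both on plain NumPy arrays. The helper names and the representation of an anchor as a list of discretized columns fixed to the instance's values are illustrative assumptions, not alibi internals. alibi compares confidence bounds (controlled by delta and tau) against threshold rather than the raw point estimate, so the final comparison here is a simplification.

```python
import numpy as np


def estimate_precision(labels):
    """Fraction of perturbed samples (all drawn so that the anchor holds) whose
    prediction agrees with the instance prediction; labels are 1 (agree) or 0."""
    return float(np.mean(labels))


def estimate_coverage(coverage_data, anchor_features, instance_disc):
    """Fraction of (discretized) coverage samples on which every anchor
    predicate holds, i.e. the anchored columns equal the instance's values."""
    mask = np.all(coverage_data[:, anchor_features] == instance_disc[anchor_features], axis=1)
    return float(mask.mean())


# Hypothetical discretized data: 3 features, entries are bin or category indices.
coverage_data = np.array([[0, 1, 2], [0, 1, 0], [1, 1, 2], [0, 2, 2]])
instance = np.array([0, 1, 2])
labels = np.array([1, 1, 1, 0, 1])  # agreement on 5 perturbed samples

precision = estimate_precision(labels)                         # 0.8
coverage = estimate_coverage(coverage_data, [0, 1], instance)  # rows 0 and 1 -> 0.5
threshold = 0.95
print(precision, coverage, precision >= threshold)
```

An anchor with higher precision is more trustworthy. An anchor with higher coverage applies to more of the data. The beam search looks for anchors that clear the precision threshold and then prefers the one with the highest coverage.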

fit(train_data, disc_perc=(25, 50, 75), **kwargs)

Fit the discretizer to the training data to bin numerical features into ordered bins, and compute statistics for the numerical features. Create a mapping between the bin numbers of each discretized numerical feature and the row ids in the training set where they occur.

Parameters
  • train_data (ndarray) – Representative sample from the training data.

  • disc_perc (Tuple[Union[int, float], …]) – Percentiles used for discretization.

Return type

AnchorTabular
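The fit step has three parts: percentile-based binning of numerical features, discretizing the data, and building an index from each bin to the training rows where it occurs. The sketch below illustrates them with NumPy. The function names are hypothetical, and the edge convention (np.searchsorted with its default side='left') is an assumption that may differ from alibi's discretizer.

```python
import numpy as np


def fit_discretizer(train_data, numerical_features, disc_perc=(25, 50, 75)):
    """Bin edges for each numerical column, placed at the given percentiles."""
    return {f: np.percentile(train_data[:, f], disc_perc) for f in numerical_features}


def discretize(data, edges):
    """Replace each numerical column by its ordinal bin index, 0 .. len(disc_perc).

    With the default side='left', a value equal to an edge falls into the lower bin.
    """
    d_data = data.astype(float)  # astype returns a copy, so `data` is not modified
    for f, e in edges.items():
        d_data[:, f] = np.searchsorted(e, data[:, f])
    return d_data


def bin_to_rows(d_data, numerical_features):
    """Map (feature column, bin) to the training-set row ids falling in that bin."""
    index = {}
    for f in numerical_features:
        for b in np.unique(d_data[:, f]):
            index[(f, int(b))] = np.flatnonzero(d_data[:, f] == b)
    return index


# Hypothetical training set: column 0 is numerical, column 1 is a categorical code.
train = np.array([[1.0, 0], [2.0, 1], [3.0, 0], [4.0, 1], [5.0, 0]])
edges = fit_discretizer(train, [0])   # {0: array([2., 3., 4.])}
d_train = discretize(train, edges)    # column 0 becomes [0, 0, 1, 2, 3]
index = bin_to_rows(d_train, [0])
print(index[(0, 0)], index[(0, 3)])   # [0 1] [4]
```

Three percentiles give four ordered bins per numerical feature. The bin-to-rows index lets the sampler draw training records whose value lies in a required bin without rescanning the data.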

class alibi.explainers.anchor_tabular.DistributedAnchorTabular(predictor, feature_names, categorical_names=None, seed=None)

Bases: alibi.explainers.anchor_tabular.AnchorTabular

explain(X, threshold=0.95, delta=0.1, tau=0.15, batch_size=100, coverage_samples=10000, beam_size=1, stop_on_first=False, max_anchor_size=None, min_samples_start=1, n_covered_ex=10, binary_cache_size=10000, cache_margin=1000, verbose=False, verbose_every=1, **kwargs)

Explains the prediction made by a classifier on instance X. Sampling is done in parallel over a number of cores specified in kwargs['ncpu'].

Parameters

See superclass implementation.

Return type

Explanation

Returns

See superclass implementation.

fit(train_data, disc_perc=(25, 50, 75), **kwargs)

Creates a list of handles to parallel processes that are used for submitting sampling tasks.

Parameters

See superclass implementation.

Return type

AnchorTabular

class alibi.explainers.anchor_tabular.RemoteSampler(*args)

Bases: object

A wrapper that facilitates the use of TabularSampler for distributed sampling.

__call__(anchors_batch, num_samples, compute_labels=True)

Wrapper around TabularSampler.__call__. It allows sampling a batch of anchors in the same process, which can improve performance.

Parameters
  • anchors_batch (Union[Tuple[int, tuple], List[Tuple[int, tuple]]]) – A list of anchor tuples. See TabularSampler.__call__ for details.

  • num_samples (int) – See TabularSampler.__call__.

  • compute_labels (bool) – See TabularSampler.__call__.

Return type

List
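Batching lets one call accept either a single anchor tuple or a list of them and return one result per anchor. In a distributed setting, this means one round trip to the worker instead of one per anchor. The sketch below shows the pattern. BatchingSampler and toy_sampler are hypothetical stand-ins for RemoteSampler and TabularSampler.__call__, and no Ray machinery is involved.

```python
from typing import Callable, List, Tuple, Union

Anchor = Tuple[int, tuple]  # (position in the request batch, encoded feature ids)


class BatchingSampler:
    """Hypothetical wrapper: sample one anchor or a batch of anchors in one call."""

    def __init__(self, sampler: Callable[[Anchor, int, bool], list]):
        self.sampler = sampler

    def __call__(self, anchors_batch: Union[Anchor, List[Anchor]], num_samples: int,
                 compute_labels: bool = True) -> List[list]:
        if isinstance(anchors_batch, tuple):  # a single anchor was passed
            anchors_batch = [anchors_batch]
        return [self.sampler(anchor, num_samples, compute_labels) for anchor in anchors_batch]


def toy_sampler(anchor: Anchor, num_samples: int, compute_labels: bool) -> list:
    # Illustration only: [position in batch, number of predicates, num_samples].
    return [anchor[0], len(anchor[1]), num_samples]


wrapper = BatchingSampler(toy_sampler)
print(wrapper([(0, (1,)), (1, (1, 4))], num_samples=50))  # [[0, 1, 50], [1, 2, 50]]
print(wrapper((2, (3,)), num_samples=10))                 # [[2, 1, 10]]
```

Because each result carries the anchor's batch position (anchor[0]), the caller can reassemble results even if batches are processed out of order.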

build_lookups(X)

Wrapper around TabularSampler.build_lookups.

Parameters

X – See TabularSampler.build_lookups.

Returns

See TabularSampler.build_lookups.

set_instance_label(X)

Sets the remote sampler instance label.

Parameters

X (ndarray) – The instance to be explained.

Return type

int

Returns

label – The label of the instance to be explained.

set_n_covered(n_covered)

Sets the remote sampler number of examples to save for inspection.

Parameters

n_covered (int) – Number of examples where the anchor (and partial anchors) apply.

Return type

None

class alibi.explainers.anchor_tabular.TabularSampler(predictor, disc_perc, numerical_features, categorical_features, feature_names, feature_values, n_covered_ex=10, seed=None)

Bases: object

A sampler that uses an underlying training set to draw records that have a subset of features with values specified in an instance to be explained, X.

__call__(anchor, num_samples, compute_labels=True)

Obtain perturbed records by drawing samples from training data that contain the categorical labels and discretized numerical features and replacing the remainder of the record with arbitrary values.

Parameters
  • anchor (Tuple[int, tuple]) – The integer represents the order of the anchor in a request array. The tuple contains encoded feature indices.

  • num_samples (int) – Number of samples used when sampling from training set.

  • compute_labels (bool) – If True, an array of comparisons between predictions on perturbed samples and the instance to be explained is returned.

Return type

Union[List[Union[ndarray, float, int]], List[ndarray]]

Returns

  • If compute_labels=True, a list containing the following is returned:

    • covered_true: perturbed examples where the anchor applies and the model prediction on perturbation is the same as the instance prediction

    • covered_false: perturbed examples where the anchor applies and the model prediction is NOT the same as the instance prediction

    • labels: num_samples ints indicating whether the prediction on the perturbed sample matches (1) the label of the instance to be explained or not (0)

    • data: Sampled data where ordinal features are binned (1 if in bin, 0 otherwise)

    • coverage: the coverage of the anchor

    • anchor[0]: position of anchor in the batch request

  • Otherwise, a list containing the data matrix only is returned.
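The covered_true and covered_false entries are small sets of example samples kept for inspection. This sketch shows how such sets can be separated from the agreement labels. The helper split_covered is hypothetical, and it keeps the first n_covered_ex matches in each group. The library may select them differently, for example at random.

```python
import numpy as np


def split_covered(samples, labels, n_covered_ex=10):
    """Up to n_covered_ex samples where the prediction agrees with the instance
    prediction (covered_true) and up to n_covered_ex where it does not
    (covered_false). All samples are assumed to satisfy the anchor."""
    covered_true = samples[labels == 1][:n_covered_ex]
    covered_false = samples[labels == 0][:n_covered_ex]
    return covered_true, covered_false


samples = np.arange(12).reshape(6, 2)  # 6 hypothetical perturbed samples, 2 features
labels = np.array([1, 0, 1, 1, 0, 1])  # 1 = same prediction as the instance
covered_true, covered_false = split_covered(samples, labels, n_covered_ex=2)
print(covered_true)   # rows 0 and 2: [[0 1] [4 5]]
print(covered_false)  # rows 1 and 4: [[2 3] [8 9]]
```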

__init__(predictor, disc_perc, numerical_features, categorical_features, feature_names, feature_values, n_covered_ex=10, seed=None)

Parameters
  • predictor (Callable) – A callable that takes a tensor of N data points as inputs and returns N outputs.

  • disc_perc (Tuple[Union[int, float], …]) – Percentiles used for numerical feature discretisation.

  • numerical_features (List[int]) – Column IDs of the numerical features.

  • categorical_features (List[int]) – Column IDs of the categorical features.

  • feature_names (list) – Feature names.

  • feature_values (dict) – Key: categorical feature column ID, value: values for the feature.

  • n_covered_ex (int) – For each anchor, this many samples where the prediction agrees with the prediction on the instance to be explained are stored, and the same number of samples where it disagrees.

  • seed (Optional[int]) – If set, fixes the random number sequence.

Return type

None

build_lookups(X)

An encoding of the feature IDs is created by assigning each bin of a discretized numerical variable and each categorical variable a unique index. For a dataset containing, e.g., a numerical variable with 5 bins and 3 categorical variables, indices 0 - 4 represent bins of the numerical variable whereas indices 5, 6, 7 represent the encoded indices of the categorical variables (but see the note for caveats). The encoding is necessary so that the different ranges of the numerical variable can be sampled during anchor construction. Note that the encoded indices represent the predicates used during the anchor construction process (i.e., an anchor is a collection of encoded indices).

Note: Each continuous variable has n_bins - 1 corresponding entries in ord_lookup.

Parameters

X (ndarray) – Instance to be explained.

Return type

List[Dict]

Returns

A list containing three dictionaries whose keys are encoded feature IDs:

  • cat_lookup: maps categorical variables to their value in X

  • ord_lookup: maps discretized numerical variables to the bins they can be sampled from given X

  • enc2feat_idx: maps the encoded IDs to the original (training set) feature column IDs
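The encoding described above can be sketched in a simplified form. The sketch follows the worked example in the text (one encoded index per bin, numerical features first, then one per categorical feature) and ignores the n_bins - 1 caveat in the note. It also omits ord_lookup, because the exact bin sets it stores depend on the library's predicate definitions. encode_features is a hypothetical name.

```python
from typing import Dict, List, Tuple


def encode_features(x: List[int], numerical_features: List[int], categorical_features: List[int],
                    n_bins: Dict[int, int]) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Simplified encoding: returns (enc2feat_idx, cat_lookup)."""
    enc2feat_idx: Dict[int, int] = {}
    cat_lookup: Dict[int, int] = {}
    enc = 0
    for f in numerical_features:
        for _ in range(n_bins[f]):  # one encoded index per bin of this feature
            enc2feat_idx[enc] = f
            enc += 1
    for f in categorical_features:  # one encoded index per categorical feature
        enc2feat_idx[enc] = f
        cat_lookup[enc] = x[f]      # value the categorical predicate fixes
        enc += 1
    return enc2feat_idx, cat_lookup


# The example from the text: column 0 is numerical with 5 bins, columns 1-3 are categorical.
x = [2, 7, 0, 1]  # hypothetical instance: bin of column 0, category codes of columns 1-3
enc2feat_idx, cat_lookup = encode_features(x, [0], [1, 2, 3], {0: 5})
print(enc2feat_idx)  # {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 2, 7: 3}
print(cat_lookup)    # {5: 7, 6: 0, 7: 1}
```

Several encoded indices can map back to the same original column, which is why enc2feat_idx is needed when translating an anchor into row lookups.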

compare_labels(samples)

Compute the agreement between a classifier prediction on an instance to be explained and the prediction on a set of samples which have a subset of features fixed to specific values.

Parameters

samples (ndarray) – Samples whose labels are to be compared with the instance label.

Return type

ndarray

Returns

An array of integers indicating whether the prediction was the same as the instance label.
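The comparison amounts to an element-wise equality between predicted classes and the instance label, cast to integers. This sketch assumes the predictor returns class labels rather than probabilities. The classifier and the free-function form of compare_labels are illustrative, not the method's actual code.

```python
import numpy as np


def compare_labels(predictor, samples, instance_label):
    """1 where the predicted class of a sample equals the instance label, 0 otherwise."""
    return (predictor(samples) == instance_label).astype(int)


def predictor(X):
    # Hypothetical classifier: class 1 if the first feature is positive.
    return (X[:, 0] > 0).astype(int)


samples = np.array([[1.0, 0.0], [-2.0, 3.0], [0.5, 1.0]])
print(compare_labels(predictor, samples, instance_label=1))  # [1 0 1]
```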

deferred_init(train_data, d_train_data)

Initialise the TabularSampler object with the data, discretizer and feature statistics, and build an index from feature values and bins to training set rows for each feature.

Parameters
  • train_data (Union[ndarray, Any]) – Data from which samples are drawn. Can be a numpy array or a ray future.

  • d_train_data (Union[ndarray, Any]) – Discretized version of the training data. Can be a numpy array or a ray future.

Return type

Any

Returns

An initialised sampler.

get_features_index(anchor)

Given an anchor, this function finds the row indices in the training set where each feature has the same value as the corresponding feature in the instance to be explained (for ordinal variables, the row indices are those of rows which contain records with feature values in the same bin). The algorithm uses both the encoded feature ids in the anchor and the feature ids in the input data set; the two are mapped by self.enc2feat_idx.

Parameters

anchor (tuple) – The anchor for which the training set row indices are to be retrieved. The ints represent encoded feature ids.

Return type

Tuple[Dict[int, Set[int]], Dict[int, Any], List[Tuple[int, str, Union[Any, int]]]]

Returns

  • allowed_bins – Maps original feature ids to the bins that the feature should be sampled from given the input anchor.

  • allowed_rows – Maps original feature ids to the training set rows where these features have the same value as the anchor.

  • unk_feat_values – When a categorical variable with the specified value, or a discretized variable in the specified bin, is not found in the training set, a tuple is added to unk_feat_values to indicate the original feature id, its type (‘c’ = categorical, ‘o’ = discretized continuous) and the value/bin it should be sampled from.
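A training row satisfies an anchor only if it satisfies every predicate. The rows where the whole anchor applies are therefore the intersection of the per-feature row sets, such as those in allowed_rows, and dividing their count by the number of training rows gives the anchor's coverage. This sketch uses Python sets. The helper name and the example row sets are hypothetical.

```python
from typing import Dict, Set


def rows_where_anchor_applies(allowed_rows: Dict[int, Set[int]], n_rows: int) -> Set[int]:
    """Intersect the per-feature row sets; the empty anchor applies to every row."""
    rows = set(range(n_rows))
    for feature_rows in allowed_rows.values():
        rows &= feature_rows
    return rows


# Hypothetical per-feature row sets for a two-predicate anchor over a 10-row training set.
allowed_rows = {0: {0, 2, 3, 7}, 4: {2, 3, 5, 9}}
rows = rows_where_anchor_applies(allowed_rows, n_rows=10)
coverage = len(rows) / 10
print(sorted(rows), coverage)  # [2, 3] 0.2
```

Adding a predicate can only shrink this set. This is why longer anchors tend to have higher precision but lower coverage.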

handle_unk_features(allowed_bins, num_samples, samples, unk_feature_values)

Replaces unknown feature values with defaults. For categorical variables, the replacement value is the same as the value of the unknown feature. For continuous variables, a value is sampled uniformly at random from the feature range.

Parameters
  • allowed_bins (Dict[int, Set[int]]) – See get_features_index method.

  • num_samples (int) – Number of replacement values.

  • samples (ndarray) – Contains the samples whose values are to be replaced.

  • unk_feature_values (List[Tuple[int, str, Union[Any, int]]]) – List of tuples where [0] is the original feature id, [1] is the feature type and [2] is the replacement value if the variable is categorical, otherwise None.

Return type

None
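The replacement rule can be sketched as follows. Categorical columns are filled with the given value, and continuous columns get values drawn uniformly from the feature range. The helper name and the feature_ranges dictionary of (min, max) pairs are hypothetical. The sketch also omits the allowed_bins argument and follows the description above rather than alibi's code.

```python
import numpy as np


def fill_unknown_features(samples, unk_feature_values, feature_ranges, rng):
    """In-place fill of columns whose anchor value/bin never occurs in the training set.

    unk_feature_values: (feature id, type, value) tuples, type 'c' or 'o'.
    feature_ranges: hypothetical {feature id: (min, max)} for continuous features.
    """
    num_samples = samples.shape[0]
    for feat_id, feat_type, value in unk_feature_values:
        if feat_type == 'c':
            samples[:, feat_id] = value          # categorical: reuse the given value
        else:
            low, high = feature_ranges[feat_id]  # continuous: uniform over [low, high)
            samples[:, feat_id] = rng.uniform(low, high, size=num_samples)


rng = np.random.default_rng(0)
samples = np.zeros((4, 3))
unk = [(0, 'c', 2.0), (2, 'o', None)]
fill_unknown_features(samples, unk, {2: (10.0, 20.0)}, rng)
print(samples)  # column 0 is all 2.0, column 2 lies in [10, 20), column 1 is untouched
```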

perturbation(anchor, num_samples)

Implements functionality described in __call__.

Parameters
  • anchor (tuple) – Each int is an encoded feature id.

  • num_samples (int) – Number of samples.

Return type

Tuple[ndarray, ndarray, float]

Returns

  • samples – Sampled data from training set.

  • d_samples – Like samples, but continuous data is converted to ordinal discrete data (binned).

  • coverage – The coverage of the anchor in the training data.

replace_features(samples, allowed_rows, uniq_feat_ids, partial_anchor_rows, nb_partial_anchors, num_samples)

The method creates perturbed samples by first replacing all partial anchors with partial anchors drawn from the training set. The remainder of the features are then replaced with random values drawn from the same bin for discretized continuous features and the same value for categorical features.

Parameters
  • samples (ndarray) – Randomly drawn samples, where the anchor does not apply.

  • allowed_rows (Dict[int, Any]) – Maps feature ids to the row indices in the training set where the feature has the same value as the instance (cat.) or is in the same bin.

  • uniq_feat_ids (List[int]) – The unique original feature ids in the anchor. Multiple encoded features in the anchor can map to the same original feature id.

  • partial_anchor_rows (List[ndarray]) – The rows in the training set where each partial anchor applies. Last entry is an array of row indices where the entire anchor applies.

  • nb_partial_anchors (ndarray) – The number of training records which contain each partial anchor.

  • num_samples (int) – Number of perturbed samples to be returned.

Return type

None
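The sketch below simplifies the replacement step by ignoring partial anchors. For each random sample, it picks a training row on which the full anchor holds and copies that row's values for the anchor's features. Categorical features therefore keep the instance's value, and discretized continuous features get a value from the same bin. The helper name and data are hypothetical. This is not alibi's partial-anchor procedure.

```python
import numpy as np


def replace_anchor_features(samples, train_data, anchor_rows, uniq_feat_ids, rng):
    """In-place: overwrite the anchor's columns in each sample with the values of a
    randomly chosen training row on which the full anchor holds."""
    chosen = rng.choice(anchor_rows, size=samples.shape[0], replace=True)
    samples[:, uniq_feat_ids] = train_data[chosen][:, uniq_feat_ids]


rng = np.random.default_rng(0)
train = np.array([[0.0, 5.0], [1.0, 6.0], [0.0, 7.0], [1.0, 8.0]])
anchor_rows = np.flatnonzero(train[:, 0] == 0)  # rows 0 and 2 satisfy "feature 0 == 0"
samples = np.array([[1.0, 100.0], [1.0, 200.0], [1.0, 300.0]])
replace_anchor_features(samples, train, anchor_rows, [0], rng)
print(samples)  # column 0 is now 0 everywhere; column 1 is untouched
```

All anchor columns in a sample are copied from the same training row. This keeps their joint values consistent with records that actually occur in the data.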

set_instance_label(X)

Sets the sampler label. Necessary for setting the remote sampling process state during the explain call.

Parameters

X (ndarray) – Instance to be explained.

Return type

None

set_n_covered(n_covered)

Set the number of examples to be saved for each anchor and partial anchor during the search process. The same number of examples is saved for the case where the predictions on the perturbed samples and the original instance agree and for the case where they disagree.

Parameters

n_covered (int) – Number of examples to be saved.

Return type

None