alibi.explainers.anchor_tabular module
- class alibi.explainers.anchor_tabular.AnchorTabular(predictor, feature_names, categorical_names=None, dtype=<class 'numpy.float32'>, ohe=False, seed=None)[source]
Bases: alibi.api.interfaces.Explainer, alibi.api.interfaces.FitMixin
- __init__(predictor, feature_names, categorical_names=None, dtype=<class 'numpy.float32'>, ohe=False, seed=None)[source]
- Parameters
  - predictor (Callable[[ndarray], ndarray]) – A callable that takes a tensor of N data points as inputs and returns N outputs.
  - feature_names (list) – List with feature names.
  - categorical_names (Optional[Dict[int, List[str]]]) – Dictionary where keys are feature columns and values are the categories for the feature.
  - dtype (Type[generic]) – A numpy scalar type that corresponds to the type of input array expected by predictor. This may be used to construct arrays of the given type to be passed through the predictor. For most use cases this argument should have no effect, but it is exposed for use with predictors that would break when called with an array of unsupported type.
  - ohe (bool) – Whether the categorical variables are one-hot encoded (OHE) or not. If not OHE, they are assumed to have ordinal encodings.
  - seed (Optional[int]) – Used to set the random number generator for repeatability purposes.
- Raises
alibi.exceptions.AlibiPredictorCallException – If calling predictor fails at runtime.
alibi.exceptions.AlibiPredictorReturnTypeError – If the return type of predictor is not np.ndarray.
- build_explanation(X, result, predicted_label, params)[source]
Preprocess search output and return an explanation object containing metadata.
- Parameters
- Return type
- Returns
Explanation object containing human readable explanation, metadata, and precision/coverage
info as attributes.
- explain(X, threshold=0.95, delta=0.1, tau=0.15, batch_size=100, coverage_samples=10000, beam_size=1, stop_on_first=False, max_anchor_size=None, min_samples_start=100, n_covered_ex=10, binary_cache_size=10000, cache_margin=1000, verbose=False, verbose_every=1, **kwargs)[source]
Explain prediction made by classifier on instance X.
- Parameters
  - X (ndarray) – Instance to be explained.
  - threshold (float) – Minimum precision threshold.
  - delta (float) – Used to compute beta.
  - tau (float) – Margin between lower confidence bound and minimum precision or upper bound.
  - batch_size (int) – Batch size used for sampling.
  - coverage_samples (int) – Number of samples used to estimate coverage from during result search.
  - beam_size (int) – The number of anchors extended at each step of new anchors construction.
  - stop_on_first (bool) – If True, the beam search algorithm will return the first anchor that satisfies the probability constraint.
  - max_anchor_size (Optional[int]) – Maximum number of features in result.
  - min_samples_start (int) – Minimum number of initial samples.
  - n_covered_ex (int) – Number of examples where the anchor applies to store for each anchor sampled during search (examples where the prediction on samples agrees/disagrees with desired_label are both stored).
  - binary_cache_size (int) – The result search pre-allocates binary_cache_size batches for storing the binary arrays returned during sampling.
  - cache_margin (int) – When only max(cache_margin, batch_size) positions in the binary cache remain empty, a new cache of the same size is pre-allocated to continue buffering samples.
  - verbose (bool) – Display updates during the anchor search iterations.
  - verbose_every (int) – Frequency of displayed iterations during anchor search process.
- Return type
- Returns
explanation – Explanation object containing the result explaining the instance with additional metadata as attributes.
- fit(train_data, disc_perc=(25, 50, 75), **kwargs)[source]
Fit discretizer to train data to bin numerical features into ordered bins and compute statistics for numerical features. Create a mapping between the bin numbers of each discretised numerical feature and the row id in the training set where it occurs.
- Parameters
- Return type
- class alibi.explainers.anchor_tabular.DistributedAnchorTabular(predictor, feature_names, categorical_names=None, dtype=<class 'numpy.float32'>, ohe=False, seed=None)[source]
Bases: alibi.explainers.anchor_tabular.AnchorTabular
- explain(X, threshold=0.95, delta=0.1, tau=0.15, batch_size=100, coverage_samples=10000, beam_size=1, stop_on_first=False, max_anchor_size=None, min_samples_start=1, n_covered_ex=10, binary_cache_size=10000, cache_margin=1000, verbose=False, verbose_every=1, **kwargs)[source]
Explains the prediction made by a classifier on instance X. Sampling is done in parallel over a number of cores specified in kwargs['ncpu'].
- Parameters
  See superclass implementation.
- Return type
- Returns
See superclass implementation.
- class alibi.explainers.anchor_tabular.RemoteSampler(*args)[source]
Bases: object
A wrapper that facilitates the use of TabularSampler for distributed sampling.
- __call__(anchors_batch, num_samples, compute_labels=True)[source]
Wrapper around TabularSampler.__call__. It allows sampling a batch of anchors in the same process, which can improve performance.
- build_lookups(X)[source]
Wrapper around TabularSampler.build_lookups.
- Parameters
X – See TabularSampler.build_lookups.
- Returns
See TabularSampler.build_lookups.
- class alibi.explainers.anchor_tabular.TabularSampler(predictor, disc_perc, numerical_features, categorical_features, feature_names, feature_values, n_covered_ex=10, seed=None)[source]
Bases: object
A sampler that uses an underlying training set to draw records that have a subset of features with values specified in an instance to be explained, X.
- __call__(anchor, num_samples, compute_labels=True)[source]
Obtain perturbed records by drawing samples from training data that contain the categorical labels and discretized numerical features and replacing the remainder of the record with arbitrary values.
- Parameters
  - anchor (Tuple[int, tuple]) – The integer represents the order of the result in a request array. The tuple contains encoded feature indices.
  - num_samples (int) – Number of samples used when sampling from training set.
  - compute_labels (bool) – If True, an array of comparisons between predictions on perturbed samples and the instance to be explained is returned.
- Return type
- Returns
If compute_labels=True, a list containing the following is returned:
  - covered_true: perturbed examples where the anchor applies and the model prediction on the perturbation is the same as the instance prediction.
  - covered_false: perturbed examples where the anchor applies and the model prediction is NOT the same as the instance prediction.
  - labels: num_samples ints indicating whether the prediction on the perturbed sample matches (1) the label of the instance to be explained or not (0).
  - data: sampled data where ordinal features are binned (1 if in bin, 0 otherwise).
  - coverage: the coverage of the anchor.
  - anchor[0]: position of the anchor in the batch request.
Otherwise, a list containing only the data matrix is returned.
- __init__(predictor, disc_perc, numerical_features, categorical_features, feature_names, feature_values, n_covered_ex=10, seed=None)[source]
- Parameters
  - predictor (Callable) – A callable that takes a tensor of N data points as inputs and returns N outputs.
  - disc_perc (Tuple[Union[int, float], ...]) – Percentiles used for numerical feature discretisation.
  - numerical_features (List[int]) – Numerical features column IDs.
  - categorical_features (List[int]) – Categorical features column IDs.
  - feature_names (list) – Feature names.
  - feature_values (dict) – Key: categorical feature column ID, value: values for the feature.
  - n_covered_ex (int) – For each result, a number of samples where the prediction agrees/disagrees with the prediction on the instance to be explained are stored.
  - seed (Optional[int]) – If set, fixes the random number sequence.
- build_lookups(X)[source]
An encoding of the feature IDs is created by assigning each bin of a discretized numerical variable and each categorical variable a unique index. For a dataset containing, e.g., a numerical variable with 5 bins and 3 categorical variables, indices 0 - 4 represent the bins of the numerical variable, whereas indices 5, 6, 7 represent the encoded indices of the categorical variables (but see the note below for caveats). The encoding is necessary so that the different ranges of the numerical variable can be sampled during result construction. Note that the encoded indices represent the predicates used during the anchor construction process (i.e., an anchor is a collection of encoded indices).
Note: Each continuous variable has n_bins - 1 corresponding entries in ord_lookup.
- Parameters
  - X (ndarray) – Instance to be explained.
- Return type
- Returns
  A list containing three dictionaries, whose keys are encoded feature IDs:
  - cat_lookup: maps categorical variables to their value in X.
  - ord_lookup: maps discretized numerical variables to the bins they can be sampled from given X.
  - enc2feat_idx: maps the encoded IDs to the original (training set) feature column IDs.
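The encoding scheme above can be illustrated with plain dictionaries (the feature layout below is invented for illustration and is not produced by the library):

```python
# Suppose column 0 is numerical with 5 bins and columns 1-3 are categorical.
# Encoded ids 0-4 cover the bins of column 0; ids 5, 6, 7 cover the categoricals.
enc2feat_idx = {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 2, 7: 3}

# For an instance whose numerical value falls in bin 2, ord_lookup records the
# bins a sample may be drawn from so the anchor predicate still holds.
ord_lookup = {2: {2}}            # encoded id -> admissible bins
cat_lookup = {5: 0, 6: 1, 7: 2}  # encoded id -> category value in X

# An anchor is simply a collection of encoded indices, e.g.:
anchor = (2, 5)
feature_cols = {enc2feat_idx[i] for i in anchor}
print(feature_cols)  # {0, 1}
```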
- compare_labels(samples)[source]
Compute the agreement between a classifier prediction on an instance to be explained and the prediction on a set of samples which have a subset of features fixed to specific values.
- Parameters
  - samples (ndarray) – Samples whose labels are to be compared with the instance label.
- Return type
ndarray
- Returns
An array of integers indicating whether the prediction was the same as the instance label.
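The agreement computation amounts to a single vectorised comparison; a self-contained sketch with a stand-in predictor (not the library's internals):

```python
import numpy as np

# Stand-in predictor: class 1 iff the row sum exceeds 1.
predictor = lambda x: (x.sum(axis=1) > 1.0).astype(int)

instance = np.array([[0.9, 0.4]])
instance_label = predictor(instance)[0]  # row sum 1.3 -> class 1

samples = np.array([[0.8, 0.5], [0.1, 0.2], [0.7, 0.9]])
labels = (predictor(samples) == instance_label).astype(int)
print(labels)  # [1 0 1]
```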
- deferred_init(train_data, d_train_data)[source]
Initialise the Tabular sampler object with data, discretizer, feature statistics and build an index from feature values and bins to database rows for each feature.
- get_features_index(anchor)[source]
Given an anchor, this function finds the row indices in the training set where the feature has the same value as the feature in the instance to be explained (for ordinal variables, the row indices are those of rows which contain records with feature values in the same bin). The algorithm uses both the feature encoded ids in anchor and the feature ids in the input data set. The two are mapped by self.enc2feat_idx.
- Parameters
  - anchor (tuple) – The anchor for which the training set row indices are to be retrieved. The ints represent encoded feature ids.
- Return type
  Tuple[Dict[int, Set[int]], Dict[int, Any], List[Tuple[int, str, Union[Any, int]]]]
- Returns
allowed_bins – Maps original feature ids to the bins that the feature should be sampled from given the input anchor.
allowed_rows – Maps original feature ids to the training set rows where these features have the same value as the anchor.
unk_feat_values – When a categorical variable with the specified value, or a discretized variable in the specified bin, is not found in the training set, a tuple is added to unk_feat_values to indicate the original feature id, its type ('c' = categorical, 'o' = discretized continuous) and the value/bin it should be sampled from.
- handle_unk_features(allowed_bins, num_samples, samples, unk_feature_values)[source]
Replaces unknown feature values with defaults. For categorical variables, the replacement value is the same as the value of the unknown feature. For continuous variables, a value is sampled uniformly at random from the feature range.
- Parameters
  - allowed_bins (Dict[int, Set[int]]) – See get_features_index method.
  - num_samples (int) – Number of replacement values.
  - samples (ndarray) – Contains the samples whose values are to be replaced.
  - unk_feature_values (List[Tuple[int, str, Union[Any, int]]]) – List of tuples where: [0] is the original feature id, [1] is the feature type, [2] is the replacement value if the variable is categorical, otherwise None.
- Return type
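For the continuous case, the fallback described above amounts to uniform sampling over the feature range; a stand-alone sketch (the feature range and column index are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
num_samples = 4
feat_min, feat_max = 0.0, 10.0  # illustrative feature range
samples = np.zeros((num_samples, 3))

# Feature 1 was not found in the training set: fill it uniformly at random.
samples[:, 1] = rng.uniform(feat_min, feat_max, size=num_samples)
```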
- perturbation(anchor, num_samples)[source]
Implements functionality described in __call__.
- Parameters
- Return type
- Returns
samples – Sampled data from training set.
d_samples – Like samples, but continuous data is converted to ordinal discrete data (binned).
coverage – The coverage of the result in the training data.
- replace_features(samples, allowed_rows, uniq_feat_ids, partial_anchor_rows, nb_partial_anchors, num_samples)[source]
This method creates perturbed samples by first replacing all partial anchors with partial anchors drawn from the training set. The remainder of the features are then replaced with random values drawn from the same bin (for discretized continuous features) or with the same value (for categorical features).
- Parameters
  - samples (ndarray) – Randomly drawn samples, where the anchor does not apply.
  - allowed_rows (Dict[int, Any]) – Maps feature ids to the row indices in the training set where the feature has the same value as the instance (categorical) or is in the same bin.
  - uniq_feat_ids (List[int]) – List of unique original feature ids in the anchor (multiple encoded features in the anchor can map to the same original feature id).
  - partial_anchor_rows (List[ndarray]) – The rows in the training set where each partial anchor applies. The last entry is an array of row indices where the entire anchor applies.
  - nb_partial_anchors (ndarray) – The number of training records which contain each partial anchor.
  - num_samples (int) – Number of perturbed samples to be returned.
- Return type
- set_instance_label(X)[source]
Sets the sampler label. Necessary for setting the remote sampling process state during explain call.
- Parameters
  - X (ndarray) – Instance to be explained.
- Return type