alibi.explainers.anchors.anchor_base module

class alibi.explainers.anchors.anchor_base.AnchorBaseBeam(samplers, **kwargs)[source]

Bases: object

__init__(samplers, **kwargs)[source]

Parameters:

samplers (List[Callable]) – Objects that can be called with a (result, n_samples) tuple of arguments to draw samples.

anchor_beam(delta=0.05, epsilon=0.1, desired_confidence=1.0, beam_size=1, epsilon_stop=0.05, min_samples_start=100, max_anchor_size=None, stop_on_first=False, batch_size=100, coverage_samples=10000, verbose=False, verbose_every=1, **kwargs)[source]

Uses the KL-LUCB algorithm (Kaufmann and Kalyanakrishnan, 2013) together with additional sampling to search for feature sets (anchors) that guarantee the prediction made by a classifier model. The search is greedy if beam_size=1. Otherwise, at each of the max_anchor_size steps, beam_size solutions are explored. By construction, the solutions found have high precision (defined as the expected number of times the classifier makes the same prediction when queried with the feature subset combined with arbitrary samples drawn from a noise distribution). The algorithm maximises the coverage of the solution found, that is, the frequency of occurrence of records containing the feature subset in a set of samples.

Parameters:

delta (float) – Used to compute beta.
epsilon (float) – Precision bound tolerance for convergence.
desired_confidence (float) – Desired level of precision (tau in paper).
beam_size (int) – Beam width.
epsilon_stop (float) – Confidence bound margin around desired precision.
min_samples_start (int) – Min number of initial samples.
max_anchor_size (Optional[int]) – Max number of features in result.
stop_on_first (bool) – Stop on first valid result found.
coverage_samples (int) – Number of samples from which to build a coverage set.
batch_size (int) – Number of samples used for an arm evaluation.
verbose (bool) – Whether to print intermediate LUCB & anchor selection output.
verbose_every (int) – Print intermediate output every verbose_every steps.

Return type:

dict

Returns:

Explanation dictionary containing anchors with metadata like coverage and precision and examples.
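
To make the two quantities in the description of anchor_beam concrete, the sketch below estimates precision and coverage for a toy anchor on binary data. Everything in it is hypothetical (the toy_predict classifier, the random sample matrix and the precision_and_coverage helper); it illustrates the definitions only and does not reproduce alibi's internals.

```python
import numpy as np

def toy_predict(X):
    # Hypothetical classifier: predicts 1 whenever feature 0 is set.
    return (X[:, 0] == 1).astype(int)

def precision_and_coverage(anchor, instance, samples, predict):
    """Estimate precision and coverage of `anchor` (a tuple of feature indices)."""
    idx = list(anchor)
    # Coverage: fraction of samples that agree with the instance on the anchor features.
    covered = np.all(samples[:, idx] == instance[idx], axis=1)
    coverage = covered.mean()
    # Precision: fix the anchor features to the instance values, leave the
    # remaining features as noise, and count how often the prediction is unchanged.
    perturbed = samples.copy()
    perturbed[:, idx] = instance[idx]
    label = predict(instance[None, :])[0]
    precision = (predict(perturbed) == label).mean()
    return precision, coverage

rng = np.random.default_rng(0)
samples = rng.integers(0, 2, size=(1000, 3))  # stand-in for the noise distribution
instance = np.array([1, 0, 1])

precision, coverage = precision_and_coverage((0,), instance, samples, toy_predict)
weak_precision, _ = precision_and_coverage((1,), instance, samples, toy_predict)
```

Because toy_predict depends only on feature 0, the anchor (0,) reaches a precision of 1 while covering roughly half of the samples, whereas the anchor (1,) leaves the prediction at the mercy of the noise.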

static compute_beta(n_features, t, delta)[source]

Parameters:

n_features (int) – Number of candidate anchors.
t (int) – Iteration number.
delta (float) – Confidence budget; candidate anchors have close to optimal precisions with probability 1 - delta.

Return type:

float

Returns:

Level used to update upper and lower precision bounds.
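
As an illustration of how such a level can be computed, here is a minimal sketch of the exploration rate from the KL-LUCB paper, beta(t, delta) = log(k K t^alpha / delta) + log log(k K t^alpha / delta), where K is the number of candidates. The function name beta_level and the constants alpha = 1.1 and k = 405.5 are assumptions for this sketch and are not stated in the documentation above.

```python
import numpy as np

def beta_level(n_features, t, delta, alpha=1.1, k=405.5):
    # alpha and k are assumed constants (see the lead-in), not documented values.
    temp = np.log(k * n_features * (t ** alpha) / delta)
    return temp + np.log(temp)

beta_first = beta_level(n_features=10, t=1, delta=0.05)
beta_later = beta_level(n_features=10, t=2, delta=0.05)
beta_strict = beta_level(n_features=10, t=1, delta=0.01)
```

The level grows slowly with the iteration number and with a stricter (smaller) delta, which widens the confidence bounds accordingly.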

static dlow_bernoulli(p, level, n_iter=17)[source]

Updates the lower precision bound for candidate anchors, dependent on the KL-divergence.

Parameters:

p (ndarray) – Precision of candidate anchors.
level (ndarray) – beta / nb of samples for each result.
n_iter (int) – Number of iterations during lower bound update.

Return type:

ndarray

Returns:

Updated lower precision bounds array.

draw_samples(anchors, batch_size)[source]

Parameters:

anchors (list) – Anchors on which samples are conditioned.
batch_size (int) – The number of samples drawn for each result.

Return type:

Tuple[tuple, tuple]

Returns:

A tuple of positive samples (for which prediction matches desired label) and a tuple of total number of samples drawn.

static dup_bernoulli(p, level, n_iter=17)[source]

Updates the upper precision bound for candidate anchors, dependent on the KL-divergence.

Parameters:

p (ndarray) – Precision of candidate anchors.
level (ndarray) – beta / nb of samples for each result.
n_iter (int) – Number of iterations during upper bound update.

Return type:

ndarray

Returns:

Updated upper precision bounds array.
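
The KL-based bounds used by dlow_bernoulli and dup_bernoulli can be pictured as a bisection search: starting from the empirical precision p, move the bound away from p for as long as the Bernoulli KL-divergence stays below level. The sketch below is a minimal, vectorised version under that assumption; the helper names (kl_bernoulli, kl_upper_bound, kl_lower_bound) are hypothetical and the code is not copied from the library.

```python
import numpy as np

def kl_bernoulli(p, q):
    # KL divergence between Bernoulli(p) and Bernoulli(q), clipped for numerical safety.
    eps = 1e-15
    p = np.clip(p, eps, 1 - eps)
    q = np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def kl_upper_bound(p, level, n_iter=17):
    # Bisection on [p, 1]: shrink the interval around the point where KL(p, q) = level.
    lo, hi = p.astype(float), np.ones_like(p, dtype=float)
    for _ in range(n_iter):
        mid = (lo + hi) / 2
        too_far = kl_bernoulli(p, mid) > level
        hi = np.where(too_far, mid, hi)
        lo = np.where(too_far, lo, mid)
    return hi

def kl_lower_bound(p, level, n_iter=17):
    # Mirror image of the upper bound: bisection on [0, p].
    lo, hi = np.zeros_like(p, dtype=float), p.astype(float)
    for _ in range(n_iter):
        mid = (lo + hi) / 2
        too_far = kl_bernoulli(p, mid) > level
        lo = np.where(too_far, mid, lo)
        hi = np.where(too_far, hi, mid)
    return lo

p = np.array([0.5, 0.9])
level = np.array([0.1, 0.1])        # e.g. beta / number of samples per candidate
ub = kl_upper_bound(p, level)
lb = kl_lower_bound(p, level)
tight_ub = kl_upper_bound(p, level / 10)  # more samples -> smaller level -> tighter bound
```

With n_iter=17 halvings the bound is located to within about 2^-17 of the interval width, which is why a small fixed iteration count is sufficient.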

get_anchor_metadata(features, success, batch_size=100)[source]

Given the features contained in a result, retrieves metadata such as the precision and coverage of the result and of its partial anchors, together with examples where the result or the partial anchors apply and yield either the same prediction as on the instance to be explained (covered_true) or a different prediction (covered_false).

Parameters:

features (tuple) – Sorted indices of features in result.
success – Indicates whether an anchor satisfying the precision threshold was found.
batch_size (int) – Number of samples among which positive and negative examples for partial anchors are selected if partial anchors have not already been explicitly sampled.

Return type:

dict

Returns:

Anchor dictionary with result features and additional metadata.

get_init_stats(anchors, coverages=False)[source]

For each result in anchors, finds the number of samples already drawn, the number of those samples whose prediction equals the desired label and, optionally, the coverage.

Parameters:

anchors (list) – Candidate anchors.
coverages – If True, the statistics returned contain the coverage of the specified anchors.

Return type:

dict

Returns:

Dictionary with lists containing the number of samples used and the number of samples whose predictions equal the desired label.

kllucb(anchors, init_stats, epsilon, delta, batch_size, top_n, verbose=False, verbose_every=1)[source]

Implements the KL-LUCB algorithm (Kaufmann and Kalyanakrishnan, 2013).

Parameters:

anchors (list) – A list of anchors from which two critical anchors are selected (see Kaufmann and Kalyanakrishnan, 2013).
init_stats (dict) – Dictionary with lists containing the number of samples used and the number of samples whose predictions equal the desired label.
epsilon (float) – Precision bound tolerance for convergence.
delta (float) – Used to compute beta.
batch_size (int) – Number of samples.
top_n (int) – Min of beam width size or number of candidate anchors.
verbose (bool) – Whether to print intermediate output.
verbose_every (int) – Print intermediate output every verbose_every steps.

Return type:

ndarray

Returns:

Indices of best result options. Number of indices equals min of beam width or nb of candidate anchors.

propose_anchors(previous_best)[source]

Parameters:

previous_best (list) – List with tuples of result candidates.

Return type:

list

Returns:

List with tuples of candidate anchors with additional metadata.

select_critical_arms(means, ub, lb, n_samples, delta, top_n, t)[source]

Determines a set of two anchors by updating the upper bound for low empirical precision anchors and the lower bound for anchors with high empirical precision.

Parameters:

means (ndarray) – Empirical mean result precisions.
ub (ndarray) – Upper bound on result precisions.
lb (ndarray) – Lower bound on result precisions.
n_samples (ndarray) – The number of samples drawn for each candidate result.
delta (float) – Confidence budget; candidate anchors have close to optimal precisions with probability 1 - delta.
top_n (int) – Number of arms to be selected.
t (int) – Iteration number.

Returns:

Upper and lower precision bound indices.
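
The selection step can be sketched as follows, following the KL-LUCB description of the two critical arms. The sketch assumes the bounds ub and lb have already been refreshed and that top_n is smaller than the number of candidates; the function name pick_critical_arms is hypothetical, and the bound update itself is left out.

```python
import numpy as np

def pick_critical_arms(means, ub, lb, top_n):
    # Split candidates into the current top_n by empirical precision and the rest.
    order = np.argsort(means)
    top, rest = order[-top_n:], order[:-top_n]
    # ut: candidate outside the top set with the highest upper bound (strongest challenger).
    ut = rest[np.argmax(ub[rest])]
    # lt: candidate inside the top set with the lowest lower bound (weakest incumbent).
    lt = top[np.argmin(lb[top])]
    return ut, lt

means = np.array([0.2, 0.8, 0.6, 0.9])
ub = np.array([0.5, 0.95, 0.85, 0.97])
lb = np.array([0.1, 0.6, 0.4, 0.8])

ut, lt = pick_critical_arms(means, ub, lb, top_n=2)
gap = ub[ut] - lb[lt]  # the quantity that is compared against epsilon
```

Here the top set is {1, 3}, so candidate 2 is the strongest challenger and candidate 1 the weakest incumbent; sampling both is what separates the two groups fastest.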

static to_sample(means, ubs, lbs, desired_confidence, epsilon_stop)[source]

Given an array of mean result precisions and their upper and lower bounds, determines for which anchors more samples need to be drawn in order to estimate the anchors' precision with desired_confidence and error tolerance.

Parameters:

means (ndarray) – Mean precisions (each element represents a different result).
ubs (ndarray) – Precisions’ upper bounds (each element represents a different result).
lbs (ndarray) – Precisions’ lower bounds (each element represents a different result).
desired_confidence (float) – Desired level of confidence for precision estimation.
epsilon_stop (float) – Tolerance around desired precision.

Returns:

Boolean array indicating whether more samples are to be drawn for that particular result.
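
One plausible reading of this rule, inferred from the parameter descriptions and not taken from the library's source, is that a result needs more samples while its bounds still straddle the threshold: either its mean clears desired_confidence but its lower bound is more than epsilon_stop below it, or its mean falls short but its upper bound still reaches epsilon_stop above it. The function name needs_more_samples is hypothetical.

```python
import numpy as np

def needs_more_samples(means, ubs, lbs, desired_confidence, epsilon_stop):
    # Looks valid, but the lower bound is not yet within tolerance of the threshold.
    unsure_valid = (means >= desired_confidence) & (lbs < desired_confidence - epsilon_stop)
    # Looks invalid, but the upper bound still reaches above the threshold plus tolerance.
    unsure_invalid = (means < desired_confidence) & (ubs >= desired_confidence + epsilon_stop)
    return unsure_valid | unsure_invalid

means = np.array([0.95, 0.95, 0.70, 0.85])
lbs = np.array([0.88, 0.75, 0.60, 0.70])
ubs = np.array([0.99, 0.99, 0.80, 0.97])

mask = needs_more_samples(means, ubs, lbs, desired_confidence=0.9, epsilon_stop=0.05)
```

In this example the first result is confidently valid and the third confidently invalid, so only the second and fourth are flagged for further sampling.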

update_state(covered_true, covered_false, labels, samples, anchor)[source]

Updates the explainer state (see alibi.explainers.anchors.anchor_base.AnchorBaseBeam.__init__() for full state definition).

Parameters:

covered_true (ndarray) – Examples where the result applies and the prediction is the same as on the instance to be explained.
covered_false (ndarray) – Examples where the result applies and the prediction differs from that on the instance to be explained.
samples (Tuple[ndarray, float]) – A tuple containing discretized data, coverage and the result sampled.
labels (ndarray) – An array indicating whether the prediction on the sample matches the label of the instance to be explained.
anchor (tuple) – The result to be updated.

Return type:

Tuple[int, int]

Returns:

A tuple containing the number of sampled instances whose prediction equals the desired label of the observation to be explained, the total number of instances sampled, and the result that was sampled.