alibi.explainers.cfproto module

class alibi.explainers.cfproto.CounterFactualProto(predict, shape, kappa=0.0, beta=0.1, feature_range=(-10000000000.0, 10000000000.0), gamma=0.0, ae_model=None, enc_model=None, theta=0.0, cat_vars=None, ohe=False, use_kdtree=False, learning_rate_init=0.01, max_iterations=1000, c_init=10.0, c_steps=10, eps=(0.001, 0.001), clip=(-1000.0, 1000.0), update_num_grad=1, write_dir=None, sess=None)[source]

Bases: alibi.api.interfaces.Explainer, alibi.api.interfaces.FitMixin

__init__(predict, shape, kappa=0.0, beta=0.1, feature_range=(-10000000000.0, 10000000000.0), gamma=0.0, ae_model=None, enc_model=None, theta=0.0, cat_vars=None, ohe=False, use_kdtree=False, learning_rate_init=0.01, max_iterations=1000, c_init=10.0, c_steps=10, eps=(0.001, 0.001), clip=(-1000.0, 1000.0), update_num_grad=1, write_dir=None, sess=None)[source]

Initialize prototypical counterfactual method.

Parameters
  • predict (Union[Callable, tensorflow.keras.Model, Model')]) – Keras or TensorFlow model or any other model’s prediction function returning class probabilities

  • shape (tuple) – Shape of input data starting with batch size

  • kappa (float) – Confidence parameter for the attack loss term

  • beta (float) – Regularization constant for L1 loss term

  • feature_range (tuple) – Tuple with min and max ranges to allow for perturbed instances. Min and max ranges can be floats or numpy arrays with dimension (1x nb of features) for feature-wise ranges

  • gamma (float) – Regularization constant for optional auto-encoder loss term

  • ae_model (Union[tensorflow.keras.Model, Model')]) – Optional auto-encoder model used for loss regularization

  • enc_model (Union[tensorflow.keras.Model, Model')]) – Optional encoder model used to guide instance perturbations towards a class prototype

  • theta (float) – Constant for the prototype search loss term

  • cat_vars (dict) – Dict with as keys the categorical columns and as values the number of categories per categorical variable.

  • ohe (bool) – Whether the categorical variables are one-hot encoded (OHE) or not. If not OHE, they are assumed to have ordinal encodings.

  • use_kdtree (bool) – Whether to use k-d trees for the prototype loss term if no encoder is available

  • learning_rate_init (float) – Initial learning rate of optimizer

  • max_iterations (int) – Maximum number of iterations for finding a counterfactual

  • c_init (float) – Initial value to scale the attack loss term

  • c_steps (int) – Number of iterations to adjust the constant scaling the attack loss term

  • eps (tuple) – If numerical gradients are used to compute dL/dx = (dL/dp) * (dp/dx), then eps[0] is used to calculate dL/dp and eps[1] is used for dp/dx. eps[0] and eps[1] can be a combination of float values and numpy arrays. For eps[0], the array dimension should be (1x nb of prediction categories) and for eps[1] it should be (1x nb of features)

  • clip (tuple) – Tuple with min and max clip ranges for both the numerical gradients and the gradients obtained from the TensorFlow graph

  • update_num_grad (int) – If numerical gradients are used, they will be updated every update_num_grad iterations

  • write_dir (str) – Directory to write tensorboard files to

  • sess (tensorflow.compat.v1.Session) – Optional Tensorflow session that will be used if passed instead of creating or inferring one internally

Return type

None

attack(X, Y, target_class=None, k=None, k_type='mean', threshold=0.0, verbose=False, print_every=100, log_every=100)[source]

Find a counterfactual (CF) for instance X using a fast iterative shrinkage-thresholding algorithm (FISTA).

Parameters
  • X (ndarray) – Instance to attack

  • Y (ndarray) – Labels for X as one-hot-encoding

  • target_class (Optional[list]) – List with target classes used to find closest prototype. If None, the nearest prototype except for the predict class on the instance is used.

  • k (Optional[int]) – Number of nearest instances used to define the prototype for a class. Defaults to using all instances belonging to the class if an encoder is used and to 1 for k-d trees.

  • k_type (str) – Use either the average encoding of the k nearest instances in a class (k_type=’mean’) or the k-nearest encoding in the class (k_type=’point’) to define the prototype of that class. Only relevant if an encoder is used to define the prototypes.

  • threshold (float) – Threshold level for the ratio between the distance of the counterfactual to the prototype of the predicted class for the original instance over the distance to the prototype of the predicted class for the counterfactual. If the trust score is below the threshold, the proposed counterfactual does not meet the requirements.

  • verbose (bool) – Print intermediate results of optimization if True

  • print_every (int) – Print frequency if verbose is True

  • log_every (int) – Tensorboard log frequency if write directory is specified

Return type

Tuple[ndarray, Tuple[ndarray, ndarray]]

Returns

Overall best attack and gradients for that attack.

explain(X, Y=None, target_class=None, k=None, k_type='mean', threshold=0.0, verbose=False, print_every=100, log_every=100)[source]

Explain instance and return counterfactual with metadata.

Parameters
  • X (ndarray) – Instances to attack

  • Y (Optional[ndarray]) – Labels for X as one-hot-encoding

  • target_class (Optional[list]) – List with target classes used to find closest prototype. If None, the nearest prototype except for the predict class on the instance is used.

  • k (Optional[int]) – Number of nearest instances used to define the prototype for a class. Defaults to using all instances belonging to the class if an encoder is used and to 1 for k-d trees.

  • k_type (str) – Use either the average encoding of the k nearest instances in a class (k_type=’mean’) or the k-nearest encoding in the class (k_type=’point’) to define the prototype of that class. Only relevant if an encoder is used to define the prototypes.

  • threshold (float) – Threshold level for the ratio between the distance of the counterfactual to the prototype of the predicted class for the original instance over the distance to the prototype of the predicted class for the counterfactual. If the trust score is below the threshold, the proposed counterfactual does not meet the requirements.

  • verbose (bool) – Print intermediate results of optimization if True

  • print_every (int) – Print frequency if verbose is True

  • log_every (int) – Tensorboard log frequency if write directory is specified

Return type

Explanation

Returns

explanation – Dictionary containing the counterfactual with additional metadata

fit(train_data, trustscore_kwargs=None, d_type='abdm', w=None, disc_perc=(25, 50, 75), standardize_cat_vars=False, smooth=1.0, center=True, update_feature_range=True)[source]

Get prototypes for each class using the encoder or k-d trees. The prototypes are used for the encoder loss term or to calculate the optional trust scores.

Parameters
  • train_data (ndarray) – Representative sample from the training data.

  • trustscore_kwargs (Optional[dict]) – Optional arguments to initialize the trust scores method.

  • d_type (str) – Pairwise distance metric used for categorical variables. Currently, ‘abdm’, ‘mvdm’ and ‘abdm-mvdm’ are supported. ‘abdm’ infers context from the other variables while ‘mvdm’ uses the model predictions. ‘abdm-mvdm’ is a weighted combination of the two metrics.

  • w (Optional[float]) – Weight on ‘abdm’ (between 0. and 1.) distance if d_type equals ‘abdm-mvdm’.

  • disc_perc (Sequence[Union[int, float]]) – List with percentiles used in binning of numerical features used for the ‘abdm’ and ‘abdm-mvdm’ pairwise distance measures.

  • standardize_cat_vars (bool) – Standardize numerical values of categorical variables if True.

  • smooth (float) – Smoothing exponent between 0 and 1 for the distances. Lower values of l will smooth the difference in distance metric between different features.

  • center (bool) – Whether to center the scaled distance measures. If False, the min distance for each feature except for the feature with the highest raw max distance will be the lower bound of the feature range, but the upper bound will be below the max feature range.

  • update_feature_range (bool) – Update feature range with scaled values.

Return type

CounterFactualProto

get_gradients(X, Y, grads_shape, cat_vars_ord=None)[source]

Compute numerical gradients of the attack loss term: dL/dx = (dL/dP)*(dP/dx) with L = loss_attack_s; P = predict; x = adv_s

Parameters
  • X (ndarray) – Instance around which gradient is evaluated

  • Y (ndarray) – One-hot representation of instance labels

  • grads_shape (tuple) – Shape of gradients.

  • cat_vars_ord (Optional[dict]) – Dict with as keys the categorical columns and as values the number of categories per categorical variable.

Return type

ndarray

Returns

Array with gradients.

loss_fn(pred_proba, Y)[source]

Compute the attack loss.

Parameters
  • pred_proba (ndarray) – Prediction probabilities of an instance

  • Y (ndarray) – One-hot representation of instance labels

Return type

ndarray

Returns

Loss of the attack.

score(X, adv_class, orig_class, eps=1e-10)[source]
Parameters
  • X (ndarray) – Instance to encode and calculate distance metrics for

  • adv_class (int) – Predicted class on the perturbed instance

  • orig_class (int) – Predicted class on the original instance

  • eps (float) – Small number to avoid dividing by 0

Return type

float

Returns

  • Ratio between the distance to the prototype of the predicted class for the original instance and

  • the prototype of the predicted class for the perturbed instance.