alibi.explainers.cfproto module

class alibi.explainers.cfproto.CounterFactualProto(predict, shape, kappa=0.0, beta=0.1, feature_range=(-10000000000.0, 10000000000.0), gamma=0.0, ae_model=None, enc_model=None, theta=0.0, use_kdtree=False, learning_rate_init=0.01, max_iterations=1000, c_init=10.0, c_steps=10, eps=(0.001, 0.001), clip=(-1000.0, 1000.0), update_num_grad=1, write_dir=None, sess=None)

Bases: object

__init__(predict, shape, kappa=0.0, beta=0.1, feature_range=(-10000000000.0, 10000000000.0), gamma=0.0, ae_model=None, enc_model=None, theta=0.0, use_kdtree=False, learning_rate_init=0.01, max_iterations=1000, c_init=10.0, c_steps=10, eps=(0.001, 0.001), clip=(-1000.0, 1000.0), update_num_grad=1, write_dir=None, sess=None)

Initialize prototypical counterfactual method.

Parameters
  • predict (Union[Callable, tensorflow.keras.Model, keras.Model]) – Keras or TensorFlow model, or any other model’s prediction function returning class probabilities

  • shape (tuple) – Shape of input data starting with batch size

  • kappa (float) – Confidence parameter for the attack loss term

  • beta (float) – Regularization constant for L1 loss term

  • feature_range (tuple) – Tuple with the min and max values allowed for perturbed instances. Min and max can each be a float or a numpy array with dimension (1 x number of features) for feature-wise ranges

  • gamma (float) – Regularization constant for optional auto-encoder loss term

  • ae_model (Union[tensorflow.keras.Model, keras.Model]) – Optional auto-encoder model used for loss regularization

  • enc_model (Union[tensorflow.keras.Model, keras.Model]) – Optional encoder model used to guide instance perturbations towards a class prototype

  • theta (float) – Constant for the prototype search loss term

  • use_kdtree (bool) – Whether to use k-d trees for the prototype loss term if no encoder is available

  • learning_rate_init (float) – Initial learning rate of optimizer

  • max_iterations (int) – Maximum number of iterations for finding a counterfactual

  • c_init (float) – Initial value to scale the attack loss term

  • c_steps (int) – Number of iterations to adjust the constant scaling the attack loss term

  • eps (tuple) – If numerical gradients are used to compute dL/dx = (dL/dp) * (dp/dx), then eps[0] is the perturbation size used to calculate dL/dp and eps[1] is the one used for dp/dx. Each of eps[0] and eps[1] can be either a float or a numpy array. For eps[0], the array dimension should be (1 x number of prediction categories) and for eps[1] it should be (1 x number of features)

  • clip (tuple) – Tuple with min and max clip ranges for both the numerical gradients and the gradients obtained from the TensorFlow graph

  • update_num_grad (int) – If numerical gradients are used, they will be updated every update_num_grad iterations

  • write_dir (str) – Directory to write TensorBoard files to

  • sess (tensorflow.compat.v1.Session) – Optional TensorFlow session that will be used if passed instead of creating or inferring one internally

Return type

None
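
As a rough illustration of the feature-wise form of feature_range, the following NumPy sketch clips a perturbed instance to per-feature bounds of shape (1 x number of features). The array values and variable names are invented for the example; this is not the explainer's internal code.

```python
import numpy as np

# Hypothetical per-feature bounds, each of shape (1, number of features).
feature_min = np.array([[0.0, -1.0, 10.0]])
feature_max = np.array([[1.0, 1.0, 20.0]])
feature_range = (feature_min, feature_max)

# A perturbed instance with a leading batch dimension.
x_perturbed = np.array([[1.5, -3.0, 12.0]])

# Broadcasting applies each bound to its own feature column.
x_clipped = np.clip(x_perturbed, feature_range[0], feature_range[1])
print(x_clipped)
```

Passing plain floats instead of arrays applies the same bounds to every feature.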

attack(X, Y, target_class=None, k=None, k_type='mean', threshold=0.0, verbose=False, print_every=100, log_every=100)

Find a counterfactual (CF) for instance X using a fast iterative shrinkage-thresholding algorithm (FISTA).

Parameters
  • X (numpy.ndarray) – Instance to attack

  • Y (numpy.ndarray) – Labels for X as one-hot-encoding

  • target_class (Optional[list]) – List with target classes used to find the closest prototype. If None, the nearest prototype other than that of the predicted class of the instance is used.

  • k (Optional[int]) – Number of nearest instances used to define the prototype for a class. Defaults to using all instances belonging to the class if an encoder is used and to 1 for k-d trees.

  • k_type (str) – Use either the average encoding of the k nearest instances in a class (k_type='mean') or the encoding of the k-th nearest instance in the class (k_type='point') to define the prototype of that class. Only relevant if an encoder is used to define the prototypes.

  • threshold (float) – Threshold for the trust score, defined as the ratio of the distance between the counterfactual and the prototype of the class predicted for the original instance to the distance between the counterfactual and the prototype of the class predicted for the counterfactual. If the trust score is below the threshold, the proposed counterfactual does not meet the requirements.

  • verbose (bool) – Print intermediate results of optimization if True

  • print_every (int) – Print frequency if verbose is True

  • log_every (int) – TensorBoard log frequency if write directory is specified

Return type

Tuple[numpy.ndarray, Tuple[numpy.ndarray, numpy.ndarray]]

Returns

Overall best attack and gradients for that attack.
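
The "shrinkage-thresholding" part of FISTA refers to a soft-thresholding step that zeroes small changes and shrinks larger ones, which encourages sparse perturbations under the L1 term weighted by beta. The sketch below shows a generic soft-thresholding operator applied to the difference between a candidate and the original instance. The function name shrink and the sample values are assumptions for illustration, and the library's actual update may differ in detail.

```python
import numpy as np

def shrink(candidate, original, beta):
    """Soft-threshold the perturbation (candidate - original) by beta."""
    delta = candidate - original
    # Changes with magnitude below beta are zeroed; larger ones shrink by beta.
    shrunk = np.sign(delta) * np.maximum(np.abs(delta) - beta, 0.0)
    return original + shrunk

original = np.array([[0.5, 0.5, 0.5]])
candidate = np.array([[0.55, 0.9, 0.1]])
result = shrink(candidate, original, beta=0.1)
print(result)
```

In this example the first feature moves by less than beta, so it is reset to its original value, while the other two keep a reduced change.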

explain(X, Y=None, target_class=None, k=None, k_type='mean', threshold=0.0, verbose=False, print_every=100, log_every=100)

Explain instance and return counterfactual with metadata.

Parameters
  • X (numpy.ndarray) – Instances to attack

  • Y (Optional[numpy.ndarray]) – Labels for X as one-hot-encoding

  • target_class (Optional[list]) – List with target classes used to find the closest prototype. If None, the nearest prototype other than that of the predicted class of the instance is used.

  • k (Optional[int]) – Number of nearest instances used to define the prototype for a class. Defaults to using all instances belonging to the class if an encoder is used and to 1 for k-d trees.

  • k_type (str) – Use either the average encoding of the k nearest instances in a class (k_type='mean') or the encoding of the k-th nearest instance in the class (k_type='point') to define the prototype of that class. Only relevant if an encoder is used to define the prototypes.

  • threshold (float) – Threshold for the trust score, defined as the ratio of the distance between the counterfactual and the prototype of the class predicted for the original instance to the distance between the counterfactual and the prototype of the class predicted for the counterfactual. If the trust score is below the threshold, the proposed counterfactual does not meet the requirements.

  • verbose (bool) – Print intermediate results of optimization if True

  • print_every (int) – Print frequency if verbose is True

  • log_every (int) – TensorBoard log frequency if write directory is specified

Return type

dict

Returns

explanation – Dictionary containing the counterfactual with additional metadata
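
To make the roles of k and k_type concrete, here is a minimal sketch of how a class prototype could be derived from encodings, assuming Euclidean distance in the encoding space. The helper class_prototype and the toy encodings are hypothetical and only mirror the parameter descriptions above.

```python
import numpy as np

def class_prototype(x_enc, class_encs, k=None, k_type="mean"):
    """Return a prototype for one class from its instance encodings."""
    # Distance from the instance encoding to every encoding in the class.
    dists = np.linalg.norm(class_encs - x_enc, axis=1)
    order = np.argsort(dists)
    # With k=None, fall back to all instances of the class.
    k = len(class_encs) if k is None else min(k, len(class_encs))
    if k_type == "mean":
        # Average encoding of the k nearest instances.
        return class_encs[order[:k]].mean(axis=0)
    if k_type == "point":
        # Encoding of the k-th nearest instance.
        return class_encs[order[k - 1]]
    raise ValueError("k_type must be 'mean' or 'point'")

x_enc = np.array([0.0, 0.0])
class_encs = np.array([[1.0, 0.0], [0.0, 3.0], [4.0, 4.0]])
proto_mean = class_prototype(x_enc, class_encs, k=2, k_type="mean")
proto_point = class_prototype(x_enc, class_encs, k=2, k_type="point")
print(proto_mean, proto_point)
```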

fit(train_data, trustscore_kwargs=None)

Get prototypes for each class using the encoder or k-d trees. The prototypes are used for the encoder loss term or to calculate the optional trust scores.

Parameters
  • train_data (numpy.ndarray) – Representative sample from the training data

  • trustscore_kwargs (Optional[dict]) – Optional arguments to initialize the trust scores method

Return type

None
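
Building per-class prototypes first requires splitting the training sample by class. The sketch below groups instances by the class a prediction function assigns to them, which is one plausible way to do this. The helper group_by_predicted_class and the toy classifier are assumptions for illustration, not the library's code.

```python
import numpy as np

def group_by_predicted_class(train_data, predict):
    """Map each predicted class to the training instances assigned to it."""
    preds = np.argmax(predict(train_data), axis=1)
    return {int(c): train_data[preds == c] for c in np.unique(preds)}

def toy_predict(X):
    """Hypothetical binary classifier: class 1 when the first feature is positive."""
    p1 = (X[:, 0] > 0).astype(float)
    return np.stack([1.0 - p1, p1], axis=1)

train_data = np.array([[-1.0, 2.0], [0.5, 0.1], [2.0, -1.0]])
groups = group_by_predicted_class(train_data, toy_predict)
print({c: g.shape for c, g in groups.items()})
```

Each group could then be encoded, or indexed with a k-d tree, to obtain the prototypes for that class.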

get_gradients(X, Y)

Compute numerical gradients of the attack loss term: dL/dx = (dL/dP)*(dP/dx) with L = loss_attack_s; P = predict; x = adv_s

Parameters
  • X (numpy.ndarray) – Instance around which gradient is evaluated

  • Y (numpy.ndarray) – One-hot representation of instance labels

Return type

numpy.ndarray

Returns

Array with gradients.
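
The chain rule dL/dx = (dL/dP)*(dP/dx) can be sketched with central differences as follows. This is a simplified, single-instance illustration: the toy softmax model, its weight matrix W and the toy loss are all assumptions, and the probability perturbation here does not preserve the sum of the probabilities.

```python
import numpy as np

def predict(x, W):
    """Toy softmax classifier; W is a hypothetical weight matrix."""
    z = x @ W
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def loss(p, y):
    """Toy loss on the probabilities: probability of the labelled class."""
    return float(np.sum(p * y))

def numerical_gradient(x, y, W, eps=(1e-3, 1e-3)):
    p = predict(x, W)
    n_classes, n_features = p.shape[1], x.shape[1]
    # dL/dp by central differences with step eps[0], shape (1, n_classes).
    dl_dp = np.zeros((1, n_classes))
    for j in range(n_classes):
        step = np.zeros_like(p)
        step[0, j] = eps[0]
        dl_dp[0, j] = (loss(p + step, y) - loss(p - step, y)) / (2 * eps[0])
    # dp/dx by central differences with step eps[1], shape (n_classes, n_features).
    dp_dx = np.zeros((n_classes, n_features))
    for i in range(n_features):
        step = np.zeros_like(x)
        step[0, i] = eps[1]
        diff = predict(x + step, W) - predict(x - step, W)
        dp_dx[:, i] = diff[0] / (2 * eps[1])
    # Chain rule: (1, n_classes) @ (n_classes, n_features) -> (1, n_features).
    return dl_dp @ dp_dx

W = np.array([[1.0, -1.0, 0.5], [0.0, 2.0, -0.5]])
x = np.array([[0.2, -0.1]])
y = np.array([[0.0, 1.0, 0.0]])
grad = numerical_gradient(x, y, W)
print(grad)
```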

loss_fn(pred_proba, Y)

Compute the attack loss.

Parameters
  • pred_proba (numpy.ndarray) – Prediction probabilities of an instance

  • Y (numpy.ndarray) – One-hot representation of instance labels

Return type

numpy.ndarray

Returns

Loss of the attack.
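
One common form for such an attack loss is a hinge on the gap between the probability of the labelled class and the largest probability among the other classes, with kappa as the required margin. The sketch below assumes that form; the exact expression used by the library may differ, and the function name attack_loss is hypothetical.

```python
import numpy as np

def attack_loss(pred_proba, Y, kappa=0.0):
    """Hinge-style loss: zero once another class beats the labelled class by kappa."""
    # Probability assigned to the labelled class.
    target_proba = np.sum(pred_proba * Y, axis=1)
    # Largest probability among all other classes.
    nontarget_proba_max = np.max(np.where(Y == 1, -np.inf, pred_proba), axis=1)
    return np.maximum(0.0, target_proba - nontarget_proba_max + kappa)

Y = np.array([[1.0, 0.0, 0.0]])
loss_unchanged = attack_loss(np.array([[0.7, 0.2, 0.1]]), Y)
loss_flipped = attack_loss(np.array([[0.3, 0.6, 0.1]]), Y)
loss_margin = attack_loss(np.array([[0.3, 0.6, 0.1]]), Y, kappa=0.4)
print(loss_unchanged, loss_flipped, loss_margin)
```

With kappa=0.4 the flipped prediction still incurs a loss, because the new class leads by only 0.3.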

perturb(X, eps, proba=False)

Apply perturbation to instance or prediction probabilities. Used for numerical calculation of gradients.

Parameters
  • X (numpy.ndarray) – Array to be perturbed

  • eps (Union[float, numpy.ndarray]) – Size of perturbation

  • proba (bool) – If True, the net effect of the perturbation needs to be 0 to keep the sum of the probabilities equal to 1

Return type

Tuple[numpy.ndarray, numpy.ndarray]

Returns

Instances where a positive and negative perturbation is applied.
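
A minimal single-instance sketch of this idea follows, assuming a scalar eps and an input of shape (1, dim). Each output row perturbs one entry; with proba=True the opposite change is spread over the remaining entries so that the row still sums to 1. The function name perturb_sketch is an assumption; the code is illustrative rather than a copy of the library's implementation.

```python
import numpy as np

def perturb_sketch(x, eps, proba=False):
    """x has shape (1, dim); returns two arrays of shape (dim, dim)."""
    dim = x.shape[1]
    # Row i adds eps to entry i only.
    pert = np.eye(dim) * eps
    if proba:
        # Subtract eps / (dim - 1) from every other entry so that each row
        # of the perturbation sums to 0.
        pert += (np.eye(dim) - np.ones((dim, dim))) * eps / (dim - 1)
    x_rep = np.repeat(x, dim, axis=0)
    return x_rep + pert, x_rep - pert

p = np.array([[0.2, 0.3, 0.5]])
p_pos, p_neg = perturb_sketch(p, 0.01, proba=True)
print(p_pos)
print(p_neg)
```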

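score(X, adv_class, orig_class, eps=1e-10)

Compute the ratio between the distance to the prototype of the predicted class for the original instance and the distance to the prototype of the predicted class for the perturbed instance.

Parameters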
  • X (numpy.ndarray) – Instance to encode and calculate distance metrics for

  • adv_class (int) – Predicted class on the perturbed instance

  • orig_class (int) – Predicted class on the original instance

  • eps (float) – Small number to avoid dividing by 0

Return type

float

Returns

Ratio between the distance to the prototype of the predicted class for the original instance and the distance to the prototype of the predicted class for the perturbed instance.
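
The ratio can be sketched as below, assuming Euclidean distances and that the prototypes are already available as arrays. The function name distance_ratio and the sample points are hypothetical; eps plays the role described above and guards against a zero denominator.

```python
import numpy as np

def distance_ratio(x_enc, proto_orig, proto_adv, eps=1e-10):
    """Distance to the original-class prototype over distance to the new-class prototype."""
    dist_orig = np.linalg.norm(x_enc - proto_orig)
    dist_adv = np.linalg.norm(x_enc - proto_adv)
    return float(dist_orig / (dist_adv + eps))

x_enc = np.array([0.0, 0.0])
proto_orig = np.array([3.0, 4.0])  # distance 5 from x_enc
proto_adv = np.array([0.0, 1.0])   # distance 1 from x_enc
ratio = distance_ratio(x_enc, proto_orig, proto_adv)
print(ratio)
```

A ratio well above 1 means the instance lies much closer to the prototype of its new class than to that of the original class.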