alibi.explainers.cfproto module

class alibi.explainers.cfproto.CounterFactualProto(predict, shape, kappa=0.0, beta=0.1, feature_range=(-10000000000.0, 10000000000.0), gamma=0.0, ae_model=None, enc_model=None, theta=0.0, use_kdtree=False, learning_rate_init=0.01, max_iterations=1000, c_init=10.0, c_steps=10, eps=(0.001, 0.001), clip=(-1000.0, 1000.0), update_num_grad=1, write_dir=None, sess=None)
Bases: object

__init__(predict, shape, kappa=0.0, beta=0.1, feature_range=(-10000000000.0, 10000000000.0), gamma=0.0, ae_model=None, enc_model=None, theta=0.0, use_kdtree=False, learning_rate_init=0.01, max_iterations=1000, c_init=10.0, c_steps=10, eps=(0.001, 0.001), clip=(-1000.0, 1000.0), update_num_grad=1, write_dir=None, sess=None)
Initialize prototypical counterfactual method.
 Parameters
predict (Union[Callable, tensorflow.keras.Model, keras.Model]) – Keras or TensorFlow model, or any other model's prediction function returning class probabilities
shape (tuple) – Shape of input data starting with batch size
kappa (float) – Confidence parameter for the attack loss term
beta (float) – Regularization constant for L1 loss term
feature_range (tuple) – Tuple with min and max ranges to allow for perturbed instances. Min and max ranges can be floats or numpy arrays with dimension (1 x nb of features) for feature-wise ranges
gamma (float) – Regularization constant for optional autoencoder loss term
ae_model (Union[tensorflow.keras.Model, keras.Model]) – Optional autoencoder model used for loss regularization
enc_model (Union[tensorflow.keras.Model, keras.Model]) – Optional encoder model used to guide instance perturbations towards a class prototype
theta (float) – Constant for the prototype search loss term
use_kdtree (bool) – Whether to use kd trees for the prototype loss term if no encoder is available
learning_rate_init (float) – Initial learning rate of optimizer
max_iterations (int) – Maximum number of iterations for finding a counterfactual
c_init (float) – Initial value to scale the attack loss term
c_steps (int) – Number of iterations to adjust the constant scaling the attack loss term
eps (tuple) – If numerical gradients are used to compute dL/dx = (dL/dp) * (dp/dx), then eps[0] is used to calculate dL/dp and eps[1] is used for dp/dx. eps[0] and eps[1] can be a combination of float values and numpy arrays. For eps[0], the array dimension should be (1 x nb of prediction categories) and for eps[1] it should be (1 x nb of features)
clip (tuple) – Tuple with min and max clip ranges for both the numerical gradients and the gradients obtained from the TensorFlow graph
update_num_grad (int) – If numerical gradients are used, they will be updated every update_num_grad iterations
write_dir (str) – Directory to write TensorBoard files to
sess (tensorflow.compat.v1.Session) – Optional TensorFlow session that will be used if passed instead of creating or inferring one internally
 Return type
None
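The feature_range bounds may be plain floats or feature-wise arrays. As a rough illustration of what such bounds mean for a perturbed instance, the sketch below clips each feature into its allowed interval. It uses plain Python lists in place of numpy arrays, and the helper name clip_to_range is hypothetical and not part of alibi.

```python
from typing import List, Sequence, Union

Bound = Union[float, Sequence[float]]


def clip_to_range(x: Sequence[float], lower: Bound, upper: Bound) -> List[float]:
    """Clip every feature of x into [lower, upper] (hypothetical helper).

    Each bound is either one scalar shared by all features or a sequence
    holding one value per feature.
    """
    n = len(x)
    # Expand scalar bounds to one value per feature.
    lo = [float(lower)] * n if isinstance(lower, (int, float)) else list(lower)
    hi = [float(upper)] * n if isinstance(upper, (int, float)) else list(upper)
    return [min(max(v, l), h) for v, l, h in zip(x, lo, hi)]


# One range shared by all features.
shared = clip_to_range([-2.0, 0.5, 3.0], 0.0, 1.0)
# Feature-wise ranges: one min and one max per feature.
featurewise = clip_to_range([-2.0, 0.5, 3.0], [-1.0, 0.0, 0.0], [1.0, 1.0, 2.0])
```

Feature-wise bounds are useful when features live on different scales, since a single scalar range would be too loose for some features and too tight for others.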

attack(X, Y, target_class=None, k=None, k_type='mean', threshold=0.0, verbose=False, print_every=100, log_every=100)
Find a counterfactual (CF) for instance X using a fast iterative shrinkage-thresholding algorithm (FISTA).
 Parameters
X (numpy.ndarray) – Instance to attack
Y (numpy.ndarray) – Labels for X as one-hot encoding
target_class (Optional[list]) – List with target classes used to find closest prototype. If None, the nearest prototype except for the predicted class on the instance is used.
k (Optional[int]) – Number of nearest instances used to define the prototype for a class. Defaults to using all instances belonging to the class if an encoder is used and to 1 for kd trees.
k_type (str) – Use either the average encoding of the k nearest instances in a class (k_type='mean') or the k-nearest encoding in the class (k_type='point') to define the prototype of that class. Only relevant if an encoder is used to define the prototypes.
threshold (float) – Threshold level for the ratio of the distance between the counterfactual and the prototype of the predicted class for the original instance to the distance between the counterfactual and the prototype of the predicted class for the counterfactual. If the trust score is below the threshold, the proposed counterfactual does not meet the requirements.
verbose (bool) – Print intermediate results of optimization if True
print_every (int) – Print frequency if verbose is True
log_every (int) – Tensorboard log frequency if write directory is specified
 Return type
 Returns
Overall best attack and gradients for that attack.
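FISTA handles an L1 penalty through a shrinkage-thresholding (soft-thresholding) step. The sketch below shows that generic textbook operator applied to a perturbation delta with threshold beta. It is an illustration only and not alibi's internal code; the name soft_threshold is hypothetical, and plain Python lists stand in for numpy arrays.

```python
from typing import List, Sequence


def soft_threshold(delta: Sequence[float], beta: float) -> List[float]:
    """Shrink each entry of delta towards zero by beta.

    Entries whose magnitude is at most beta are set to exactly zero,
    which is what makes the resulting perturbation sparse.
    """
    out = []
    for d in delta:
        if d > beta:
            out.append(d - beta)
        elif d < -beta:
            out.append(d + beta)
        else:
            out.append(0.0)
    return out


# Small changes are removed, larger ones are shrunk by beta.
shrunk = soft_threshold([0.5, -0.05, -0.3, 0.1], beta=0.1)
```

A larger beta zeroes out more entries, so the counterfactual differs from the original instance in fewer features.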

explain(X, Y=None, target_class=None, k=None, k_type='mean', threshold=0.0, verbose=False, print_every=100, log_every=100)
Explain instance and return counterfactual with metadata.
 Parameters
X (numpy.ndarray) – Instances to attack
Y (Optional[numpy.ndarray]) – Labels for X as one-hot encoding
target_class (Optional[list]) – List with target classes used to find closest prototype. If None, the nearest prototype except for the predicted class on the instance is used.
k (Optional[int]) – Number of nearest instances used to define the prototype for a class. Defaults to using all instances belonging to the class if an encoder is used and to 1 for kd trees.
k_type (str) – Use either the average encoding of the k nearest instances in a class (k_type='mean') or the k-nearest encoding in the class (k_type='point') to define the prototype of that class. Only relevant if an encoder is used to define the prototypes.
threshold (float) – Threshold level for the ratio of the distance between the counterfactual and the prototype of the predicted class for the original instance to the distance between the counterfactual and the prototype of the predicted class for the counterfactual. If the trust score is below the threshold, the proposed counterfactual does not meet the requirements.
verbose (bool) – Print intermediate results of optimization if True
print_every (int) – Print frequency if verbose is True
log_every (int) – Tensorboard log frequency if write directory is specified
 Return type
dict
 Returns
explanation – Dictionary containing the counterfactual with additional metadata
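The threshold parameter compares two distances measured from the counterfactual: one to the prototype of the class predicted for the original instance, and one to the prototype of the class predicted for the counterfactual. A minimal sketch of that check follows. It assumes Euclidean distance, which the text above does not specify, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def passes_threshold(cf: Sequence[float],
                     proto_orig: Sequence[float],
                     proto_cf: Sequence[float],
                     threshold: float) -> Tuple[float, bool]:
    """Return (ratio, ok) where ratio = d(cf, proto_orig) / d(cf, proto_cf).

    The counterfactual is rejected when the ratio is below the threshold.
    """
    # Guard against division by zero when cf coincides with its prototype.
    d_cf = max(euclidean(cf, proto_cf), 1e-12)
    ratio = euclidean(cf, proto_orig) / d_cf
    return ratio, ratio >= threshold


# The counterfactual sits 3 units from the original class prototype
# and 1 unit from the prototype of its own predicted class.
ratio, ok = passes_threshold([3.0, 0.0], [0.0, 0.0], [4.0, 0.0], threshold=1.0)
_, ok_strict = passes_threshold([3.0, 0.0], [0.0, 0.0], [4.0, 0.0], threshold=5.0)
```

A ratio well above 1 means the counterfactual lies much closer to its new class prototype than to the original one, so raising the threshold demands more clearly separated counterfactuals.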

fit(train_data, trustscore_kwargs=None)
Get prototypes for each class using the encoder or kd trees. The prototypes are used for the encoder loss term or to calculate the optional trust scores.
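The k and k_type options documented for attack and explain control how a class prototype is derived from encodings. The sketch below illustrates the two variants on a handful of 2-D encodings. It assumes Euclidean distance in the encoding space, uses plain lists instead of numpy arrays, and class_prototype is a hypothetical name rather than an alibi function.

```python
import math
from typing import List, Optional, Sequence


def class_prototype(enc_x: Sequence[float],
                    class_encodings: Sequence[Sequence[float]],
                    k: Optional[int] = None,
                    k_type: str = "mean") -> List[float]:
    """Build a prototype for one class from the encodings of its instances."""
    def dist(e: Sequence[float]) -> float:
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(enc_x, e)))

    # Rank the class encodings by distance to the encoded instance.
    ranked = sorted(class_encodings, key=dist)
    # With no k given, use every instance of the class.
    k = len(ranked) if k is None else k
    nearest = ranked[:k]
    if k_type == "mean":
        # Average encoding of the k nearest instances.
        return [sum(e[i] for e in nearest) / len(nearest) for i in range(len(enc_x))]
    if k_type == "point":
        # Encoding of the k-th nearest instance itself.
        return list(nearest[-1])
    raise ValueError("k_type must be 'mean' or 'point'")


encodings = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
mean_proto = class_prototype([0.0, 0.0], encodings, k=2, k_type="mean")
point_proto = class_prototype([0.0, 0.0], encodings, k=2, k_type="point")
```

The 'mean' variant smooths over individual instances, while 'point' always returns an encoding that corresponds to an actual training instance.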

get_gradients(X, Y)
Compute numerical gradients of the attack loss term: dL/dx = (dL/dP)*(dP/dx) with L = loss_attack_s; P = predict; x = adv_s
 Parameters
X (numpy.ndarray) – Instance around which gradient is evaluated
Y (numpy.ndarray) – One-hot representation of instance labels
 Return type
numpy.ndarray
 Returns
Array with gradients.
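The chain rule dL/dx = (dL/dP)*(dP/dx) can be approximated with central differences when the model is a black box. The sketch below is a simplified stand-in, not the library's implementation: it perturbs one coordinate at a time, uses eps[0] for dL/dP and eps[1] for dP/dx as described for the eps parameter, and works on plain lists. The toy predict and loss functions are made up for the example.

```python
from typing import Callable, List, Sequence, Tuple


def numerical_gradient(predict: Callable[[Sequence[float]], List[float]],
                       loss: Callable[[Sequence[float]], float],
                       x: Sequence[float],
                       eps: Tuple[float, float] = (1e-3, 1e-3)) -> List[float]:
    """Approximate dL/dx = (dL/dP) * (dP/dx) with central differences."""
    p = predict(x)
    # dL/dP: perturb each predicted probability by +/- eps[0].
    dl_dp = []
    for j in range(len(p)):
        p_plus, p_minus = list(p), list(p)
        p_plus[j] += eps[0]
        p_minus[j] -= eps[0]
        dl_dp.append((loss(p_plus) - loss(p_minus)) / (2 * eps[0]))
    # dP/dx: perturb each input feature by +/- eps[1], then apply the chain rule.
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += eps[1]
        x_minus[i] -= eps[1]
        dp_dxi = [(a - b) / (2 * eps[1])
                  for a, b in zip(predict(x_plus), predict(x_minus))]
        grad.append(sum(g * d for g, d in zip(dl_dp, dp_dxi)))
    return grad


def toy_predict(x: Sequence[float]) -> List[float]:
    """Made-up linear two-class model, used only for this example."""
    p0 = 0.5 + 0.1 * x[0] - 0.2 * x[1]
    return [p0, 1.0 - p0]


def toy_loss(p: Sequence[float]) -> float:
    """Made-up loss: gap between the two class probabilities."""
    return p[0] - p[1]


grad = numerical_gradient(toy_predict, toy_loss, [0.3, 0.4])
```

For this linear toy model the analytic gradient is (0.2, -0.4), which the finite-difference estimate reproduces up to floating-point error.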

loss_fn(pred_proba, Y)
Compute the attack loss.
 Parameters
pred_proba (numpy.ndarray) – Prediction probabilities of an instance
Y (numpy.ndarray) – One-hot representation of instance labels
 Return type
numpy.ndarray
 Returns
Loss of the attack.
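The exact form of the attack loss is not given here. As an assumption, the sketch below uses a hinge-style loss that is common in counterfactual and adversarial search: the loss is positive while the original class still beats every other class by more than the confidence margin kappa, and zero once another class wins by at least kappa. The function name attack_loss is hypothetical.

```python
from typing import Sequence


def attack_loss(pred_proba: Sequence[float],
                y_onehot: Sequence[int],
                kappa: float = 0.0) -> float:
    """Hinge-style attack loss (assumed form, not confirmed by the docs)."""
    # Probability assigned to the original class.
    p_orig = sum(p * y for p, y in zip(pred_proba, y_onehot))
    # Highest probability among all other classes.
    p_other = max(p for p, y in zip(pred_proba, y_onehot) if y == 0)
    return max(0.0, p_orig - p_other + kappa)


y = [1, 0, 0]
loss_unchanged = attack_loss([0.7, 0.2, 0.1], y)           # original class still wins
loss_flipped = attack_loss([0.2, 0.7, 0.1], y)             # another class wins
loss_margin = attack_loss([0.2, 0.7, 0.1], y, kappa=0.6)   # wins, but not by kappa
```

Under this assumed form, a larger kappa asks for a counterfactual that the model classifies into the new class with a wider probability margin.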

perturb(X, eps, proba=False)
Apply perturbation to instance or prediction probabilities. Used for numerical calculation of gradients.
 Parameters
 Return type
Tuple[numpy.ndarray, numpy.ndarray]
 Returns
Instances where a positive and negative perturbation is applied.
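To illustrate the kind of output described above, the sketch below builds, for each entry of an input vector, one copy shifted by +eps and one shifted by -eps in that entry only. This is a simplified stand-in under stated assumptions: eps is a single float, the proba flag is not modeled, plain lists replace numpy arrays, and perturb_sketch is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def perturb_sketch(x: Sequence[float],
                   eps: float) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (positive, negative) perturbations of x, one row per entry."""
    plus, minus = [], []
    for i in range(len(x)):
        x_pos, x_neg = list(x), list(x)
        x_pos[i] += eps  # positive perturbation of entry i only
        x_neg[i] -= eps  # negative perturbation of entry i only
        plus.append(x_pos)
        minus.append(x_neg)
    return plus, minus


x_plus, x_minus = perturb_sketch([1.0, 2.0], eps=0.5)
```

Pairs of rows from the two outputs are what a central-difference gradient estimate consumes: the difference in model output between row i of the positive and negative sets, divided by 2 * eps, approximates the derivative with respect to entry i.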
