alibi.explainers.cfproto module¶

class
alibi.explainers.cfproto.
CounterFactualProto
(predict, shape, kappa=0.0, beta=0.1, feature_range=(10000000000.0, 10000000000.0), gamma=0.0, ae_model=None, enc_model=None, theta=0.0, cat_vars=None, ohe=False, use_kdtree=False, learning_rate_init=0.01, max_iterations=1000, c_init=10.0, c_steps=10, eps=(0.001, 0.001), clip=(1000.0, 1000.0), update_num_grad=1, write_dir=None, sess=None)[source]¶ Bases:
object

__init__
(predict, shape, kappa=0.0, beta=0.1, feature_range=(10000000000.0, 10000000000.0), gamma=0.0, ae_model=None, enc_model=None, theta=0.0, cat_vars=None, ohe=False, use_kdtree=False, learning_rate_init=0.01, max_iterations=1000, c_init=10.0, c_steps=10, eps=(0.001, 0.001), clip=(1000.0, 1000.0), update_num_grad=1, write_dir=None, sess=None)[source]¶ Initialize prototypical counterfactual method.
 Parameters
predict (
Union
[Callable
, tensorflow.keras.Model,Model')
]) – Keras or TensorFlow model or any other model’s prediction function returning class probabilitiesshape (
tuple
) – Shape of input data starting with batch sizekappa (
float
) – Confidence parameter for the attack loss termbeta (
float
) – Regularization constant for L1 loss termfeature_range (
tuple
) – Tuple with min and max ranges to allow for perturbed instances. Min and max ranges can be floats or numpy arrays with dimension (1x nb of features) for featurewise rangesgamma (
float
) – Regularization constant for optional autoencoder loss termae_model (
Union
[tensorflow.keras.Model,Model')
]) – Optional autoencoder model used for loss regularizationenc_model (
Union
[tensorflow.keras.Model,Model')
]) – Optional encoder model used to guide instance perturbations towards a class prototypetheta (
float
) – Constant for the prototype search loss termcat_vars (
dict
) – Dict with as keys the categorical columns and as values the number of categories per categorical variable.ohe (
bool
) – Whether the categorical variables are onehot encoded (OHE) or not. If not OHE, they are assumed to have ordinal encodings.use_kdtree (
bool
) – Whether to use kd trees for the prototype loss term if no encoder is availablelearning_rate_init (
float
) – Initial learning rate of optimizermax_iterations (
int
) – Maximum number of iterations for finding a counterfactualc_init (
float
) – Initial value to scale the attack loss termc_steps (
int
) – Number of iterations to adjust the constant scaling the attack loss termeps (
tuple
) – If numerical gradients are used to compute dL/dx = (dL/dp) * (dp/dx), then eps[0] is used to calculate dL/dp and eps[1] is used for dp/dx. eps[0] and eps[1] can be a combination of float values and numpy arrays. For eps[0], the array dimension should be (1x nb of prediction categories) and for eps[1] it should be (1x nb of features)clip (
tuple
) – Tuple with min and max clip ranges for both the numerical gradients and the gradients obtained from the TensorFlow graphupdate_num_grad (
int
) – If numerical gradients are used, they will be updated every update_num_grad iterationswrite_dir (
str
) – Directory to write tensorboard files tosess (tensorflow.compat.v1.Session) – Optional Tensorflow session that will be used if passed instead of creating or inferring one internally
 Return type

attack
(X, Y, target_class=None, k=None, k_type='mean', threshold=0.0, verbose=False, print_every=100, log_every=100)[source]¶ Find a counterfactual (CF) for instance X using a fast iterative shrinkagethresholding algorithm (FISTA).
 Parameters
X (numpy.ndarray) – Instance to attack
Y (numpy.ndarray) – Labels for X as onehotencoding
target_class (
Optional
[list
]) – List with target classes used to find closest prototype. If None, the nearest prototype except for the predict class on the instance is used.k (
Optional
[int
]) – Number of nearest instances used to define the prototype for a class. Defaults to using all instances belonging to the class if an encoder is used and to 1 for kd trees.k_type (
str
) – Use either the average encoding of the k nearest instances in a class (k_type=’mean’) or the knearest encoding in the class (k_type=’point’) to define the prototype of that class. Only relevant if an encoder is used to define the prototypes.threshold (
float
) – Threshold level for the ratio between the distance of the counterfactual to the prototype of the predicted class for the original instance over the distance to the prototype of the predicted class for the counterfactual. If the trust score is below the threshold, the proposed counterfactual does not meet the requirements.verbose (
bool
) – Print intermediate results of optimization if Trueprint_every (
int
) – Print frequency if verbose is Truelog_every (
int
) – Tensorboard log frequency if write directory is specified
 Return type
 Returns
Overall best attack and gradients for that attack.

explain
(X, Y=None, target_class=None, k=None, k_type='mean', threshold=0.0, verbose=False, print_every=100, log_every=100)[source]¶ Explain instance and return counterfactual with metadata.
 Parameters
X (numpy.ndarray) – Instances to attack
Y (
Optional
[numpy.ndarray]) – Labels for X as onehotencodingtarget_class (
Optional
[list
]) – List with target classes used to find closest prototype. If None, the nearest prototype except for the predict class on the instance is used.k (
Optional
[int
]) – Number of nearest instances used to define the prototype for a class. Defaults to using all instances belonging to the class if an encoder is used and to 1 for kd trees.k_type (
str
) – Use either the average encoding of the k nearest instances in a class (k_type=’mean’) or the knearest encoding in the class (k_type=’point’) to define the prototype of that class. Only relevant if an encoder is used to define the prototypes.threshold (
float
) – Threshold level for the ratio between the distance of the counterfactual to the prototype of the predicted class for the original instance over the distance to the prototype of the predicted class for the counterfactual. If the trust score is below the threshold, the proposed counterfactual does not meet the requirements.verbose (
bool
) – Print intermediate results of optimization if Trueprint_every (
int
) – Print frequency if verbose is Truelog_every (
int
) – Tensorboard log frequency if write directory is specified
 Return type
 Returns
explanation – Dictionary containing the counterfactual with additional metadata

fit
(train_data, trustscore_kwargs=None, d_type='abdm', w=None, disc_perc=(25, 50, 75), standardize_cat_vars=False, smooth=1.0, center=True, update_feature_range=True)[source]¶ Get prototypes for each class using the encoder or kd trees. The prototypes are used for the encoder loss term or to calculate the optional trust scores.
 Parameters
train_data (numpy.ndarray) – Representative sample from the training data.
trustscore_kwargs (
Optional
[dict
]) – Optional arguments to initialize the trust scores method.d_type (
str
) – Pairwise distance metric used for categorical variables. Currently, ‘abdm’, ‘mvdm’ and ‘abdmmvdm’ are supported. ‘abdm’ infers context from the other variables while ‘mvdm’ uses the model predictions. ‘abdmmvdm’ is a weighted combination of the two metrics.w (
Optional
[float
]) – Weight on ‘abdm’ (between 0. and 1.) distance if d_type equals ‘abdmmvdm’.disc_perc (
Sequence
[Union
[int
,float
]]) – List with percentiles used in binning of numerical features used for the ‘abdm’ and ‘abdmmvdm’ pairwise distance measures.standardize_cat_vars (
bool
) – Standardize numerical values of categorical variables if True.smooth (
float
) – Smoothing exponent between 0 and 1 for the distances. Lower values of l will smooth the difference in distance metric between different features.center (
bool
) – Whether to center the scaled distance measures. If False, the min distance for each feature except for the feature with the highest raw max distance will be the lower bound of the feature range, but the upper bound will be below the max feature range.update_feature_range (
bool
) – Update feature range with scaled values.
 Return type
None

get_gradients
(X, Y, grads_shape, cat_vars_ord=None)[source]¶ Compute numerical gradients of the attack loss term: dL/dx = (dL/dP)*(dP/dx) with L = loss_attack_s; P = predict; x = adv_s
 Parameters
X (numpy.ndarray) – Instance around which gradient is evaluated
Y (numpy.ndarray) – Onehot representation of instance labels
grads_shape (
tuple
) – Shape of gradients.cat_vars_ord (
Optional
[dict
]) – Dict with as keys the categorical columns and as values the number of categories per categorical variable.
 Return type
numpy.ndarray
 Returns
Array with gradients.

loss_fn
(pred_proba, Y)[source]¶ Compute the attack loss.
 Parameters
pred_proba (numpy.ndarray) – Prediction probabilities of an instance
Y (numpy.ndarray) – Onehot representation of instance labels
 Return type
numpy.ndarray
 Returns
Loss of the attack.
