alibi.explainers.cem module
- class alibi.explainers.cem.CEM(predict, mode, shape, kappa=0.0, beta=0.1, feature_range=(-10000000000.0, 10000000000.0), gamma=0.0, ae_model=None, learning_rate_init=0.01, max_iterations=1000, c_init=10.0, c_steps=10, eps=(0.001, 0.001), clip=(-100.0, 100.0), update_num_grad=1, no_info_val=None, write_dir=None, sess=None)[source]
-
- __init__(predict, mode, shape, kappa=0.0, beta=0.1, feature_range=(-10000000000.0, 10000000000.0), gamma=0.0, ae_model=None, learning_rate_init=0.01, max_iterations=1000, c_init=10.0, c_steps=10, eps=(0.001, 0.001), clip=(-100.0, 100.0), update_num_grad=1, no_info_val=None, write_dir=None, sess=None)[source]
Initialize contrastive explanation method. Paper: https://arxiv.org/abs/1802.07623
- Parameters:
predict (Union[Callable[[ndarray], ndarray], Model]) – tensorflow model or any other model’s prediction function returning class probabilities.
mode (str) – Find pertinent negatives (PN) or pertinent positives (PP).
shape (tuple) – Shape of input data starting with batch size.
kappa (float) – Confidence parameter for the attack loss term.
beta (float) – Regularization constant for the L1 loss term.
feature_range (tuple) – Tuple with min and max ranges to allow for perturbed instances. Min and max ranges can be floats or numpy arrays with dimension (1 x nb of features) for feature-wise ranges.
gamma (float) – Regularization constant for the optional auto-encoder loss term.
ae_model (Optional[Model]) – Optional auto-encoder model used for loss regularization.
learning_rate_init (float) – Initial learning rate of the optimizer.
max_iterations (int) – Maximum number of iterations for finding a PN or PP.
c_init (float) – Initial value to scale the attack loss term.
c_steps (int) – Number of iterations to adjust the constant scaling the attack loss term.
eps (tuple) – If numerical gradients are used to compute dL/dx = (dL/dp) * (dp/dx), then eps[0] is used to calculate dL/dp and eps[1] is used for dp/dx. eps[0] and eps[1] can be a combination of float values and numpy arrays. For eps[0], the array dimension should be (1 x nb of prediction categories) and for eps[1] it should be (1 x nb of features).
clip (tuple) – Tuple with min and max clip ranges for both the numerical gradients and the gradients obtained from the tensorflow graph.
update_num_grad (int) – If numerical gradients are used, they will be updated every update_num_grad iterations.
no_info_val (Union[float, ndarray, None]) – Global or feature-wise value considered as containing no information.
write_dir (Optional[str]) – Directory to write tensorboard files to.
sess (Optional[Session]) – Optional tensorflow session that will be used if passed instead of creating or inferring one internally.
- attack(X, Y, verbose=False)[source]
Find pertinent negative or pertinent positive for instance X using a fast iterative shrinkage-thresholding algorithm (FISTA).
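FISTA handles the non-smooth beta * ||delta||_1 regularizer through an elementwise shrinkage-thresholding (soft-thresholding) step applied after each gradient update; a minimal numpy sketch of that operator (the function name is illustrative):

```python
import numpy as np

def shrink(z, beta):
    """Soft-thresholding: the proximal operator of beta * ||z||_1.

    Entries with |z_i| <= beta are zeroed; the rest move toward
    zero by beta, which is what drives sparse perturbations.
    """
    return np.sign(z) * np.maximum(np.abs(z) - beta, 0.0) + 0.0

# zeroes the small entry, shrinks the others by beta=0.1
shrink(np.array([0.5, -0.05, 0.2]), 0.1)
```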
- explain(X, Y=None, verbose=False)[source]
Explain instance and return PP or PN with metadata.
- Parameters:
X (ndarray) – Instances to explain.
Y (Optional[ndarray]) – Labels for X.
verbose (bool) – Print intermediate results of the optimization if True.
- Return type:
Explanation
- Returns:
explanation – Explanation object containing the PP or PN with additional metadata as attributes. See usage at CEM examples for details.
- get_gradients(X, Y)[source]
Compute numerical gradients of the attack loss term: dL/dx = (dL/dP) * (dP/dx), where L = loss_attack_s, P = predict, and x = adv_s.
- Parameters:
X (ndarray) – Instance around which gradient is evaluated.
Y (ndarray) – One-hot representation of instance labels.
- Return type:
ndarray
- Returns:
Array with gradients.
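The chain-rule decomposition above can be sketched with central differences in plain numpy. This is a simplified stand-alone illustration, not the library's implementation; eps_p and eps_x play the roles of eps[0] and eps[1]:

```python
import numpy as np

def num_gradients(predict, loss, x, eps_p=1e-3, eps_x=1e-3):
    """Numerical dL/dx = (dL/dp) @ (dp/dx) via central differences.

    predict: maps (1, n_features) -> (1, n_classes) predictions
    loss:    maps a (1, n_classes) prediction -> scalar attack loss
    """
    p = predict(x)                       # (1, n_classes)
    n_classes, n_features = p.shape[1], x.shape[1]

    # dL/dp: perturb each predicted probability by +/- eps_p
    dl_dp = np.zeros(n_classes)
    for i in range(n_classes):
        d = np.zeros_like(p)
        d[0, i] = eps_p
        dl_dp[i] = (loss(p + d) - loss(p - d)) / (2 * eps_p)

    # dp/dx: perturb each input feature by +/- eps_x
    dp_dx = np.zeros((n_classes, n_features))
    for j in range(n_features):
        d = np.zeros_like(x)
        d[0, j] = eps_x
        dp_dx[:, j] = (predict(x + d) - predict(x - d))[0] / (2 * eps_x)

    return dl_dp @ dp_dx                 # (n_features,) gradient dL/dx
```

Because predict is only evaluated at perturbed inputs, this works for fully black-box models; the cost is 2 * (n_classes + n_features) prediction calls per gradient, which is why update_num_grad allows reusing a gradient for several iterations.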
- loss_fn(pred_proba, Y)[source]
Compute the attack loss.
- Parameters:
pred_proba (ndarray) – Prediction probabilities of an instance.
Y (ndarray) – One-hot representation of instance labels.
- Return type:
ndarray
- Returns:
Loss of the attack.
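The attack loss in the CEM paper takes a hinge form controlled by kappa. A simplified numpy sketch for a single instance (names and reductions are illustrative; the c scaling applied during optimization is omitted):

```python
import numpy as np

def attack_loss(pred_proba, y_onehot, mode='PN', kappa=0.0):
    """Hinge-style attack loss, per the CEM paper (sketch).

    pred_proba: (1, n_classes) probabilities for the perturbed instance
    y_onehot:   (1, n_classes) one-hot original label
    """
    p_orig = pred_proba[y_onehot == 1].item()           # original class
    p_other = float(np.max(pred_proba[y_onehot == 0]))  # best other class
    if mode == 'PN':
        # PN: the perturbation should flip the prediction away
        # from the original class by margin kappa
        return max(0.0, p_orig - p_other + kappa)
    # PP: the retained features should still predict the
    # original class by margin kappa
    return max(0.0, p_other - p_orig + kappa)
```

The loss is zero once the desired margin kappa is reached, so larger kappa values demand more confident PNs or PPs.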