alibi.explainers.cem module

class alibi.explainers.cem.CEM(predict, mode, shape, kappa=0.0, beta=0.1, feature_range=(-10000000000.0, 10000000000.0), gamma=0.0, ae_model=None, learning_rate_init=0.01, max_iterations=1000, c_init=10.0, c_steps=10, eps=(0.001, 0.001), clip=(-100.0, 100.0), update_num_grad=1, no_info_val=None, write_dir=None, sess=None)[source]

Bases: alibi.api.interfaces.Explainer, alibi.api.interfaces.FitMixin

__init__(predict, mode, shape, kappa=0.0, beta=0.1, feature_range=(-10000000000.0, 10000000000.0), gamma=0.0, ae_model=None, learning_rate_init=0.01, max_iterations=1000, c_init=10.0, c_steps=10, eps=(0.001, 0.001), clip=(-100.0, 100.0), update_num_grad=1, no_info_val=None, write_dir=None, sess=None)[source]

Initialize the contrastive explanation method (CEM). Paper: https://arxiv.org/abs/1802.07623

Parameters
  • predict (Union[Callable[[ndarray], ndarray], Model]) – Keras or TensorFlow model, or any other model’s prediction function, returning class probabilities

  • mode (str) – Find pertinent negatives (‘PN’) or pertinent positives (‘PP’)

  • shape (tuple) – Shape of input data starting with batch size

  • kappa (float) – Confidence parameter for the attack loss term

  • beta (float) – Regularization constant for L1 loss term

  • feature_range (tuple) – Tuple with min and max ranges allowed for perturbed instances. Min and max ranges can be floats or numpy arrays of shape (1 x number of features) for feature-wise ranges

  • gamma (float) – Regularization constant for optional auto-encoder loss term

  • ae_model (Optional[Model]) – Optional auto-encoder model used for loss regularization

  • learning_rate_init (float) – Initial learning rate of optimizer

  • max_iterations (int) – Maximum number of iterations for finding a PN or PP

  • c_init (float) – Initial value to scale the attack loss term

  • c_steps (int) – Number of iterations to adjust the constant scaling the attack loss term

  • eps (tuple) – If numerical gradients are used to compute dL/dx = (dL/dp) * (dp/dx), then eps[0] is used to calculate dL/dp and eps[1] is used for dp/dx. eps[0] and eps[1] can be a combination of float values and numpy arrays. For eps[0], the array dimension should be (1 x number of prediction categories) and for eps[1] it should be (1 x number of features)

  • clip (tuple) – Tuple with min and max clip ranges for both the numerical gradients and the gradients obtained from the TensorFlow graph

  • update_num_grad (int) – If numerical gradients are used, they will be updated every update_num_grad iterations

  • no_info_val (Union[float, ndarray, None]) – Global or feature-wise value considered as containing no information

  • write_dir (Optional[str]) – Directory to write tensorboard files to

  • sess (Optional[Session]) – Optional TensorFlow session that will be used if passed, instead of creating or inferring one internally
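The feature-wise forms of feature_range and eps described above can be illustrated with plain numpy; the array shapes follow the parameter descriptions, while the feature and class counts below are made up for illustration:

```python
import numpy as np

n_features = 2  # illustrative feature count
n_classes = 3   # illustrative number of prediction categories

# Global scalar ranges (the default form) ...
feature_range = (-1e10, 1e10)

# ... or feature-wise ranges as (1 x number of features) arrays
feature_range = (np.array([[0.0, -5.0]]),   # per-feature minimums
                 np.array([[1.0, 5.0]]))    # per-feature maximums

# eps[0] perturbs predictions (for dL/dp), eps[1] perturbs features (for dp/dx)
eps = (np.full((1, n_classes), 1e-3),    # (1 x number of prediction categories)
       np.full((1, n_features), 1e-3))   # (1 x number of features)
```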

attack(X, Y, verbose=False)[source]

Find pertinent negative or pertinent positive for instance X using a fast iterative shrinkage-thresholding algorithm (FISTA).

Parameters
  • X (ndarray) – Instance to attack

  • Y (ndarray) – Labels for X

  • verbose (bool) – Print intermediate results of optimization if True

Return type

Tuple[ndarray, Tuple[ndarray, ndarray]]

Returns

Overall best attack and gradients for that attack.
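FISTA alternates a gradient step on the smooth part of the objective with a soft-thresholding (shrinkage) step for the L1 term, plus a momentum update. The following is a minimal, generic FISTA loop for an L1-regularized objective, sketched for intuition only; it is not the CEM attack itself, whose objective also includes the attack, L2 and optional auto-encoder terms:

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||z||_1 (the shrinkage step)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fista(grad_f, x0, lr, beta, n_iter=100):
    """Minimal FISTA loop for min_x f(x) + beta * ||x||_1.

    grad_f is the gradient of the smooth part f; lr is the step size.
    """
    x = x_prev = x0.copy()
    y = x0.copy()
    t = 1.0
    for _ in range(n_iter):
        x = soft_threshold(y - lr * grad_f(y), lr * beta)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
        y = x + ((t - 1.0) / t_next) * (x - x_prev)  # momentum step
        x_prev, t = x, t_next
    return x

# Toy problem: f(x) = 0.5 * ||x - b||^2, whose L1-regularized minimizer
# is the soft-thresholded b. Small entries of b shrink to exactly 0.
b = np.array([3.0, -0.05, 0.5])
x_star = fista(lambda x: x - b, np.zeros(3), lr=1.0, beta=0.1)
```

The shrinkage step is what drives sparse perturbations, which is why CEM's elastic-net regularization pairs naturally with FISTA.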

explain(X, Y=None, verbose=False)[source]

Explain instance and return PP or PN with metadata.

Parameters
  • X (ndarray) – Instances to attack

  • Y (Optional[ndarray]) – Labels for X

  • verbose (bool) – Print intermediate results of optimization if True

Return type

Explanation

Returns

explanation – Explanation object containing the PP or PN with additional metadata as attributes.

fit(train_data, no_info_type='median')[source]

Get ‘no information’ values from the training data.

Parameters
  • train_data (ndarray) – Representative sample from the training data

  • no_info_type (str) – Type of feature-wise ‘no information’ value; ‘median’ and ‘mean’ are supported

Return type

CEM
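What fit computes can be sketched with plain numpy: a feature-wise median or mean over a representative training sample. This is a simplified illustration of the idea, not alibi's implementation:

```python
import numpy as np

def no_info_values(train_data, no_info_type="median"):
    """Feature-wise 'no information' values from a training sample."""
    if no_info_type == "median":
        return np.median(train_data, axis=0)
    if no_info_type == "mean":
        return np.mean(train_data, axis=0)
    raise ValueError("no_info_type must be 'median' or 'mean'")

X_train = np.array([[0.0, 1.0],
                    [2.0, 3.0],
                    [4.0, 100.0]])
med = no_info_values(X_train)            # feature-wise medians
avg = no_info_values(X_train, "mean")    # feature-wise means
```

Note the median is robust to the outlier in the second feature, which is one reason it is the default.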

get_gradients(X, Y)[source]

Compute numerical gradients of the attack loss term: dL/dx = (dL/dp) * (dp/dx), with L = loss_attack_s, p = predict and x = adv_s.

Parameters
  • X (ndarray) – Instance around which gradient is evaluated

  • Y (ndarray) – One-hot representation of instance labels

Return type

ndarray

Returns

Array with gradients.
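The chain-rule computation above can be sketched with central differences in plain numpy. This is a simplified, unbatched illustration (scalar eps rather than the eps tuple, no caching or update_num_grad logic), not alibi's implementation:

```python
import numpy as np

def num_grad(predict, loss, x, eps_p=1e-3, eps_x=1e-3):
    """Numerical dL/dx = (dL/dp) @ (dp/dx) via central differences.

    predict maps an instance x to class probabilities;
    loss maps probabilities to a scalar attack loss.
    """
    p = predict(x)
    # dL/dp: perturb each prediction component by +/- eps_p
    dLdp = np.zeros_like(p)
    for i in range(p.size):
        e = np.zeros_like(p)
        e[i] = eps_p
        dLdp[i] = (loss(p + e) - loss(p - e)) / (2 * eps_p)
    # dp/dx: perturb each input feature by +/- eps_x
    dpdx = np.zeros((p.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps_x
        dpdx[:, j] = (predict(x + e) - predict(x - e)) / (2 * eps_x)
    return dLdp @ dpdx  # gradient with respect to x

# Toy check: L(p(x)) = sum(p) with p = x**2 has analytic gradient 2x
g = num_grad(lambda x: x ** 2, np.sum, np.array([1.0, -2.0]))
```

This black-box path is what makes CEM usable when only a prediction function is available and gradients cannot be taken through a TensorFlow graph.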

loss_fn(pred_proba, Y)[source]

Compute the attack loss.

Parameters
  • pred_proba (ndarray) – Prediction probabilities of an instance

  • Y (ndarray) – One-hot representation of instance labels

Return type

ndarray

Returns

Loss of the attack.
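The attack loss is a hinge-style term with confidence margin kappa, comparing the probability of the original class with the best other class. The sketch below follows the formulation in the CEM paper for a single instance; it is an illustration of the idea, not alibi's implementation:

```python
import numpy as np

def attack_loss(pred_proba, y_onehot, mode="PN", kappa=0.0):
    """Hinge-style attack loss with confidence margin kappa (sketch)."""
    p_true = (pred_proba * y_onehot).sum()          # prob. of original class
    p_other = (pred_proba * (1 - y_onehot)).max()   # best other class
    if mode == "PN":  # push prediction towards a different class
        return max(p_true - p_other + kappa, 0.0)
    # 'PP': keep the original class on top
    return max(p_other - p_true + kappa, 0.0)

p = np.array([0.7, 0.2, 0.1])
y = np.array([1.0, 0.0, 0.0])
pn = attack_loss(p, y, mode="PN", kappa=0.1)  # hinge active: 0.7 - 0.2 + 0.1
pp = attack_loss(p, y, mode="PP")             # original class already on top
```

The loss is zero once the margin is met, so the optimizer then works purely on keeping the perturbation small.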

perturb(X, eps, proba=False)[source]

Apply perturbation to instance or prediction probabilities. Used for numerical calculation of gradients.

Parameters
  • X (ndarray) – Array to be perturbed

  • eps (Union[float, ndarray]) – Size of perturbation

  • proba (bool) – If True, the net effect of the perturbation needs to be 0 to keep the sum of the probabilities equal to 1

Return type

Tuple[ndarray, ndarray]

Returns

Instances where a positive and negative perturbation is applied.
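The one-at-a-time positive/negative perturbation pairs can be sketched in plain numpy. The proba=True branch below, which offsets the perturbation over the remaining components so each row keeps the same total, is an assumption about how the zero net effect is achieved, not alibi's exact code:

```python
import numpy as np

def perturb(X, eps, proba=False):
    """Perturb each component of a 1-D array X by +/- eps, one at a time."""
    n = X.size
    delta = eps * np.eye(n)  # row i perturbs only component i
    if proba:
        # spread -eps/(n-1) over the other components: net effect is 0,
        # so each perturbed probability row still sums to 1
        delta = delta - (eps / (n - 1)) * (1 - np.eye(n))
    return X + delta, X - delta  # positively and negatively perturbed copies

p = np.array([0.5, 0.3, 0.2])
X_pos, X_neg = perturb(p, eps=1e-3, proba=True)
```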

reset_predictor(predictor)[source]

Reset the predictor to a new predictor function or model.

Parameters
  • predictor (Union[Callable[[ndarray], ndarray], Model]) – New predictor function or model
Return type

None