alibi.explainers.cem module

class alibi.explainers.cem.CEM(predict, mode, shape, kappa=0.0, beta=0.1, feature_range=(-10000000000.0, 10000000000.0), gamma=0.0, ae_model=None, learning_rate_init=0.01, max_iterations=1000, c_init=10.0, c_steps=10, eps=(0.001, 0.001), clip=(-100.0, 100.0), update_num_grad=1, no_info_val=None, write_dir=None, sess=None)[source]

Bases: Explainer, FitMixin

__init__(predict, mode, shape, kappa=0.0, beta=0.1, feature_range=(-10000000000.0, 10000000000.0), gamma=0.0, ae_model=None, learning_rate_init=0.01, max_iterations=1000, c_init=10.0, c_steps=10, eps=(0.001, 0.001), clip=(-100.0, 100.0), update_num_grad=1, no_info_val=None, write_dir=None, sess=None)[source]

Initialize contrastive explanation method. Paper: https://arxiv.org/abs/1802.07623

Parameters:
  • predict (Union[Callable[[ndarray], ndarray], Model]) – tensorflow model or any other model’s prediction function returning class probabilities.

  • mode (str) – Find pertinent negatives ('PN') or pertinent positives ('PP').

  • shape (tuple) – Shape of input data starting with batch size.

  • kappa (float) – Confidence parameter for the attack loss term.

  • beta (float) – Regularization constant for L1 loss term.

  • feature_range (tuple) – Tuple with min and max ranges to allow for perturbed instances. Min and max ranges can be float or numpy arrays with dimension (1x nb of features) for feature-wise ranges.

  • gamma (float) – Regularization constant for optional auto-encoder loss term.

  • ae_model (Optional[Model]) – Optional auto-encoder model used for loss regularization.

  • learning_rate_init (float) – Initial learning rate of optimizer.

  • max_iterations (int) – Maximum number of iterations for finding a PN or PP.

  • c_init (float) – Initial value to scale the attack loss term.

  • c_steps (int) – Number of iterations to adjust the constant scaling the attack loss term.

  • eps (tuple) – If numerical gradients are used to compute dL/dx = (dL/dp) * (dp/dx), then eps[0] is used to calculate dL/dp and eps[1] is used for dp/dx. eps[0] and eps[1] can be a combination of float values and numpy arrays. For eps[0], the array dimension should be (1x nb of prediction categories) and for eps[1] it should be (1x nb of features).

  • clip (tuple) – Tuple with min and max clip ranges for both the numerical gradients and the gradients obtained from the tensorflow graph.

  • update_num_grad (int) – If numerical gradients are used, they will be updated every update_num_grad iterations.

  • no_info_val (Union[float, ndarray, None]) – Global or feature-wise value considered as containing no information.

  • write_dir (Optional[str]) – Directory to write tensorboard files to.

  • sess (Optional[Session]) – Optional tensorflow session that will be used if passed instead of creating or inferring one internally.
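The predict argument only needs to map a batch of instances to class probabilities. A minimal sketch of a callable with the expected interface, using an illustrative 2-feature, 3-class softmax model (weights and names are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))  # toy weights: 2 features -> 3 classes

def predict(X: np.ndarray) -> np.ndarray:
    """Return class probabilities for a batch of instances.

    Any callable with this signature works: input of shape
    (batch, nb of features), output of shape (batch, nb of classes)
    with rows summing to 1.
    """
    logits = X @ W
    e = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
    return e / e.sum(axis=1, keepdims=True)

probs = predict(np.ones((1, 2)))  # shape (1, 3), row sums to 1
```

The shape parameter would then be (1, 2) for this model, matching the batch-first convention described above.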

attack(X, Y, verbose=False)[source]

Find a pertinent negative or pertinent positive for instance X using the fast iterative shrinkage-thresholding algorithm (FISTA).

Parameters:
  • X (ndarray) – Instance to attack.

  • Y (ndarray) – Labels for X.

  • verbose (bool) – Print intermediate results of optimization if True.

Return type:

Tuple[ndarray, Tuple[ndarray, ndarray]]

Returns:

Overall best attack and gradients for that attack.
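The FISTA machinery behind attack can be summarized as a proximal gradient step (soft-thresholding, which handles the beta-weighted L1 term) followed by Nesterov-style momentum. A self-contained sketch of that update, with illustrative names (the actual optimizer state lives inside the class):

```python
import numpy as np

def shrink(z: np.ndarray, thresh: float) -> np.ndarray:
    """Elementwise soft-thresholding: the proximal operator of thresh * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

def fista_step(y, x_old, grad, lr, beta, k):
    """One FISTA iteration on the perturbation (sketch).

    y      -- current extrapolated iterate
    x_old  -- previous proximal iterate
    grad   -- gradient of the smooth part of the loss at y
    k      -- iteration counter, driving the momentum schedule
    """
    x_new = shrink(y - lr * grad, lr * beta)            # proximal gradient step
    y_new = x_new + (k / (k + 3.0)) * (x_new - x_old)   # momentum extrapolation
    return x_new, y_new
```

Soft-thresholding is what drives small perturbation components exactly to zero, yielding the sparse perturbations characteristic of PNs and PPs.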

explain(X, Y=None, verbose=False)[source]

Explain instance and return PP or PN with metadata.

Parameters:
  • X (ndarray) – Instances to attack.

  • Y (Optional[ndarray]) – Labels for X.

  • verbose (bool) – Print intermediate results of optimization if True.

Return type:

Explanation

Returns:

explanation – Explanation object containing the PP or PN with additional metadata as attributes. See usage at CEM examples for details.

fit(train_data, no_info_type='median')[source]

Get ‘no information’ values from the training data.

Parameters:
  • train_data (ndarray) – Representative sample from the training data.

  • no_info_type (str) – Aggregation used to compute the feature-wise 'no information' values: 'median' or 'mean' are supported.

Return type:

CEM
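What fit computes reduces to a feature-wise aggregate over the training sample; a minimal sketch, assuming no_info_type simply selects between the median and the mean per feature:

```python
import numpy as np

train_data = np.array([[0.0,   1.0, 5.0],
                       [2.0,   3.0, 5.0],
                       [4.0, 100.0, 5.0]])

# Feature-wise 'no information' values, kept 2-D to broadcast over a batch
no_info_median = np.median(train_data, axis=0).reshape(1, -1)
no_info_mean = train_data.mean(axis=0).reshape(1, -1)
```

The median is the more robust choice when features contain outliers, as the second column above illustrates.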

get_gradients(X, Y)[source]

Compute numerical gradients of the attack loss term: dL/dx = (dL/dP)*(dP/dx) with L = loss_attack_s; P = predict; x = adv_s

Parameters:
  • X (ndarray) – Instance around which gradient is evaluated.

  • Y (ndarray) – One-hot representation of instance labels.

Return type:

ndarray

Returns:

Array with gradients.
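The chain-rule decomposition dL/dx = (dL/dp) * (dp/dx) can be reproduced with central differences, mirroring the two eps values from the constructor (eps[0] for dL/dp, eps[1] for dp/dx). A self-contained sketch with a toy model and an illustrative smooth loss (not CEM's actual attack loss):

```python
import numpy as np

def predict(x):
    # toy 2-class model: logistic unit over the feature sum
    p1 = 1.0 / (1.0 + np.exp(-x.sum(axis=1)))
    return np.stack([1.0 - p1, p1], axis=1)

def loss(p, y):
    # illustrative smooth loss on probabilities
    return ((p - y) ** 2).sum()

def num_grad_chain(x, y, eps_p=1e-3, eps_x=1e-3):
    """Numerical dL/dx = (dL/dp) @ (dp/dx) via central differences."""
    p = predict(x)
    n_cls, n_feat = p.shape[1], x.shape[1]
    dL_dp = np.zeros(n_cls)
    for i in range(n_cls):
        dp = np.zeros_like(p); dp[0, i] = eps_p
        dL_dp[i] = (loss(p + dp, y) - loss(p - dp, y)) / (2 * eps_p)
    dp_dx = np.zeros((n_cls, n_feat))
    for j in range(n_feat):
        dx = np.zeros_like(x); dx[0, j] = eps_x
        dp_dx[:, j] = (predict(x + dx) - predict(x - dx))[0] / (2 * eps_x)
    return dL_dp @ dp_dx
```

Splitting the gradient this way means the (potentially expensive) model calls for dp/dx can be cached and refreshed only every update_num_grad iterations.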

loss_fn(pred_proba, Y)[source]

Compute the attack loss.

Parameters:
  • pred_proba (ndarray) – Prediction probabilities of an instance.

  • Y (ndarray) – One-hot representation of instance labels.

Return type:

ndarray

Returns:

Loss of the attack.
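For a pertinent negative, the attack loss is hinge-style: it penalizes the perturbed instance as long as the original class still dominates, with kappa setting the required confidence margin. A sketch of that idea (illustrative formulation, not necessarily the exact expression in the implementation):

```python
import numpy as np

def loss_attack_pn(pred_proba, y_onehot, kappa=0.2):
    """Hinge-style PN attack loss (sketch).

    Zero only once some class other than the original beats the
    original class's probability by at least kappa.
    """
    p_orig = (pred_proba * y_onehot).sum()
    p_other = (pred_proba * (1.0 - y_onehot)).max()
    return max(p_orig - p_other + kappa, 0.0)
```

A PP objective flips the roles: it rewards predictions where the original class keeps dominating under the sparse perturbation.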

perturb(X, eps, proba=False)[source]

Apply perturbation to instance or prediction probabilities. Used for numerical calculation of gradients.

Parameters:
  • X (ndarray) – Array to be perturbed.

  • eps (Union[float, ndarray]) – Size of perturbation.

  • proba (bool) – If True, the net effect of the perturbation needs to be 0 to keep the sum of the probabilities equal to 1.

Return type:

Tuple[ndarray, ndarray]

Returns:

Instances where a positive and negative perturbation is applied.
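When proba is True, the perturbation must have zero net effect so each perturbed row still sums to 1. One simple compensation scheme, added eps to one dimension and spread -eps across the rest (an illustrative assumption, not necessarily the library's exact scheme):

```python
import numpy as np

def perturb_proba(p, eps):
    """Perturb each probability dimension while keeping row sums at 1.

    For dimension j: add eps to p[j] and subtract eps/(n-1) from the
    remaining dimensions (zero-net-effect compensation; illustrative).
    Returns the positively and negatively perturbed batches.
    """
    n = p.shape[0]
    pert = np.full((n, n), -eps / (n - 1))
    np.fill_diagonal(pert, eps)
    return p + pert, p - pert
```

Without this compensation, the perturbed vectors would leave the probability simplex and the finite-difference estimate of dL/dp would be biased.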

reset_predictor(predictor)[source]

Resets the predictor function/model.

Parameters:

predictor (Union[Callable, Model]) – New predictor function/model.

Return type:

None