alibi.explainers.integrated_gradients module

class alibi.explainers.integrated_gradients.IntegratedGradients(model, layer=None, target_fn=None, method='gausslegendre', n_steps=50, internal_batch_size=100)[source]

Bases: alibi.api.interfaces.Explainer

__init__(model, layer=None, target_fn=None, method='gausslegendre', n_steps=50, internal_batch_size=100)[source]

An implementation of the integrated gradients method for TensorFlow and Keras models.

For details of the method, see the original paper.

Parameters

  • model (Model) – TensorFlow or Keras model.

  • layer (Optional[Layer]) – Layer with respect to which the gradients are calculated. If not provided, the gradients are calculated with respect to the input.

  • method (str) – Method for the integral approximation. Methods available: “riemann_left”, “riemann_right”, “riemann_middle”, “riemann_trapezoid”, “gausslegendre”.

  • n_steps (int) – Number of steps in the path integral approximation from the baseline to the input instance.

  • internal_batch_size (int) – Batch size for the internal batching.
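
The method and n_steps parameters decide where along the straight path from baseline to input the gradients are sampled and how those samples are weighted. The following is a minimal pure-Python sketch of the five named rules on the unit interval. It is not alibi's code: the helper names (gauss_legendre_unit, quadrature, integrate) and the exact endpoint handling of the Riemann rules are assumptions made for illustration.

```python
import math


def gauss_legendre_unit(n):
    """n-point Gauss-Legendre nodes and weights, rescaled from [-1, 1] to [0, 1]."""
    alphas, weights = [], []
    for i in range(1, n + 1):
        x = math.cos(math.pi * (i - 0.25) / (n + 0.5))  # initial guess for the i-th root
        for _ in range(100):  # Newton iterations on the Legendre polynomial P_n
            p_prev, p = 1.0, x
            for k in range(2, n + 1):
                p_prev, p = p, ((2 * k - 1) * x * p - (k - 1) * p_prev) / k
            dp = n * (x * p - p_prev) / (x * x - 1.0)  # derivative of P_n at x
            step = p / dp
            x -= step
            if abs(step) < 1e-14:
                break
        alphas.append(0.5 * (x + 1.0))
        weights.append(1.0 / ((1.0 - x * x) * dp * dp))
    return alphas[::-1], weights[::-1]  # ascending order


def quadrature(method, n_steps):
    """Return interpolation points in [0, 1] and the weight given to each."""
    h = 1.0 / n_steps
    if method == "riemann_left":
        return [i / n_steps for i in range(n_steps)], [h] * n_steps
    if method == "riemann_right":
        return [(i + 1) / n_steps for i in range(n_steps)], [h] * n_steps
    if method == "riemann_middle":
        return [(i + 0.5) / n_steps for i in range(n_steps)], [h] * n_steps
    if method == "riemann_trapezoid":
        # Assumption: n_steps intervals, hence n_steps + 1 points.
        alphas = [i / n_steps for i in range(n_steps + 1)]
        return alphas, [h / 2] + [h] * (n_steps - 1) + [h / 2]
    if method == "gausslegendre":
        return gauss_legendre_unit(n_steps)
    raise ValueError(f"unknown method: {method}")


def integrate(f, method, n_steps):
    """Approximate the integral of f over [0, 1] with the chosen rule."""
    alphas, weights = quadrature(method, n_steps)
    return sum(w * f(a) for a, w in zip(alphas, weights))


for name in ("riemann_left", "riemann_middle", "gausslegendre"):
    error = abs(integrate(lambda t: t * t, name, 50) - 1.0 / 3.0)
    print(name, error)
```

For a smooth integrand, the Gauss-Legendre rule reaches a given accuracy with far fewer points than the left or right Riemann sums, which is one plausible reason for it being the default.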

build_explanation(X, forward_kwargs, baselines, target, attributions, deltas)[source]
Return type

Explanation

explain(X, forward_kwargs=None, baselines=None, target=None, attribute_to_layer_inputs=False)[source]

Calculates the attributions for each input feature or element of the layer and returns an Explanation object.

Parameters

  • X (Union[ndarray, List[ndarray]]) – Instance for which integrated gradients attributions are computed.

  • forward_kwargs (Optional[dict]) – Input keyword arguments. If not None, it must be a dict with numpy arrays as values. The first dimension of each array must correspond to the number of examples. The arrays are repeated for each of the n_steps points along the integration path. The attributions are not computed with respect to these arguments.

  • baselines (Union[int, float, ndarray, List[int], List[float], List[ndarray], None]) – Baselines (starting point of the path integral) for each instance. If the passed value is an np.ndarray, it must have the same shape as X. If not provided, all feature values for the baselines are set to 0.

  • target (Union[int, list, ndarray, None]) – Defines which element of the model output is used to compute the gradients. It can be a list of integers or a numeric value. If a numeric value is passed, the gradients are calculated for the same element of the output for all data points. It must be provided if the model output dimension is higher than 1. For regression models whose output is a scalar, target should not be provided. For classification models, target can be either the true classes or the classes predicted by the model.

  • attribute_to_layer_inputs (bool) – In the case of layer gradients, controls whether the gradients are computed for the layer’s inputs or outputs. If True, gradients are computed for the layer’s inputs; if False, for the layer’s outputs.
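
The rules for baselines and target described above can be sketched in plain Python. This is only an illustration of the documented behaviour using nested lists in place of numpy arrays; make_baselines and select_target are hypothetical helpers, not functions provided by alibi, and the probabilities are made-up values.

```python
def make_baselines(X, baselines=None):
    """Expand `baselines` to one row per instance with the same shape as X."""
    if baselines is None:
        baselines = 0.0  # documented default: all baseline feature values are 0
    if isinstance(baselines, (int, float)):
        return [[float(baselines)] * len(row) for row in X]
    if len(baselines) != len(X) or any(len(b) != len(r) for b, r in zip(baselines, X)):
        raise ValueError("array baselines must have the same shape as X")
    return [[float(v) for v in b] for b in baselines]


def select_target(outputs, target=None):
    """Pick, per instance, the scalar output whose gradient would be taken."""
    n_outputs = len(outputs[0])
    if target is None:
        if n_outputs != 1:
            raise ValueError("target is required when the output has more than one element")
        return [row[0] for row in outputs]  # scalar (regression-style) output
    if isinstance(target, int):
        target = [target] * len(outputs)  # same output element for every instance
    if len(target) != len(outputs):
        raise ValueError("need one target per instance")
    return [row[t] for row, t in zip(outputs, target)]


# Hypothetical class probabilities for two instances and three classes.
probs = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]]
predicted = [max(range(len(p)), key=p.__getitem__) for p in probs]  # argmax per row

print(select_target(probs, target=1))          # element 1 for every instance
print(select_target(probs, target=predicted))  # each instance's predicted class
print(make_baselines([[1.0, 2.0], [3.0, 4.0]]))
```

Passing the predicted classes as target explains the decision the model actually made, while passing the true classes explains the score of the correct label; which one is appropriate depends on the question being asked.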

Returns

  • Explanation object including meta and data attributes with integrated gradients attributions for each feature.

Return type

Explanation
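
Attributions can be sanity-checked with the completeness property of integrated gradients: summed over features, they should approximately equal the model output at the input minus the output at the baseline. The sketch below computes attributions and that gap for a hypothetical three-feature logistic model in pure Python. The model, its weights, the midpoint rule, and the sign convention used for delta are assumptions for illustration and are not taken from alibi.

```python
import math

W = [1.5, -2.0, 0.5]  # weights of a hypothetical logistic model


def model(x):
    """Scalar output: sigmoid of a weighted sum of the features."""
    z = sum(w * v for w, v in zip(W, x))
    return 1.0 / (1.0 + math.exp(-z))


def model_grad(x):
    """Analytic gradient of the model output with respect to x."""
    s = model(x)
    return [s * (1.0 - s) * w for w in W]


def integrated_gradients(x, baseline, n_steps=50):
    """Midpoint-rule integrated gradients and the completeness gap."""
    avg_grad = [0.0] * len(x)
    for k in range(n_steps):
        alpha = (k + 0.5) / n_steps
        # Point on the straight line from the baseline to the input.
        point = [b + alpha * (v - b) for v, b in zip(x, baseline)]
        for i, g in enumerate(model_grad(point)):
            avg_grad[i] += g / n_steps
    attributions = [(v - b) * g for v, b, g in zip(x, baseline, avg_grad)]
    # Assumed convention: summed attributions minus the change in model output.
    delta = sum(attributions) - (model(x) - model(baseline))
    return attributions, delta


x = [1.0, 0.5, 2.0]
baseline = [0.0, 0.0, 0.0]
attributions, delta = integrated_gradients(x, baseline, n_steps=50)
_, coarse_delta = integrated_gradients(x, baseline, n_steps=5)
print(attributions)
print(delta, coarse_delta)
```

A gap that stays large as n_steps grows usually indicates a problem with the baseline, the target, or the gradient computation rather than with the quadrature rule.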