alibi.explainers.cfrl_base module

class alibi.explainers.cfrl_base.Callback[source]

Bases: abc.ABC

Training callback class.

abstract __call__(step, update, model, sample, losses)[source]

Training callback applied after every training step.

  • step (int) – Current experience step.

  • update (int) – Current update step. The ratio between the number of experience steps and the number of training updates is bound to 1.

  • model (CounterfactualRL) – CounterfactualRL explainer. All the parameters defined in alibi.explainers.cfrl_base.DEFAULT_BASE_PARAMS can be accessed through ‘model.params’.

  • sample (Dict[str, ndarray]) –

    Dictionary of samples used for an update which contains

    • 'X': input instances.

    • 'Y_m': predictor outputs for the input instances.

    • 'Y_t': target outputs.

    • 'Z': input embeddings.

    • 'Z_cf_tilde': noised counterfactual embeddings.

    • 'X_cf_tilde': noised counterfactual instances obtained after decoding the noised counterfactual embeddings Z_cf_tilde and applying the post-processing functions.

    • 'C': conditional vector.

    • 'R_tilde': reward obtained for the noised counterfactual instances.

    • 'Z_cf': counterfactual embeddings.

    • 'X_cf': counterfactual instances obtained after decoding the counterfactual embeddings Z_cf and applying the post-processing functions.

  • losses (Dict[str, float]) –

    Dictionary of losses which contains

Return type
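As an illustration of the callback signature documented above, the following minimal sketch stores the losses every few steps. LossLogger is a hypothetical name, and it is written as a plain class so that the sketch runs without alibi installed; in real use it would subclass alibi.explainers.cfrl_base.Callback. The loss key 'example_loss' is a placeholder, not a key produced by the library.

```python
from typing import Any, Dict, List, Tuple


class LossLogger:
    """Hypothetical callback that stores the losses every `every` steps.

    In real use this would subclass alibi.explainers.cfrl_base.Callback;
    it is a plain class here so the sketch runs without alibi installed.
    """

    def __init__(self, every: int = 100) -> None:
        self.every = every
        self.history: List[Tuple[int, Dict[str, float]]] = []

    def __call__(self, step: int, update: int, model: Any,
                 sample: Dict[str, Any], losses: Dict[str, float]) -> None:
        # Same argument order as the documented Callback.__call__.
        if step % self.every == 0:
            self.history.append((step, dict(losses)))


# Simulated training loop; 'example_loss' is a placeholder key.
logger = LossLogger(every=2)
for step in range(5):
    logger(step, update=step, model=None, sample={},
           losses={"example_loss": 1.0 / (step + 1)})
```

An instance of such a class would typically be supplied through the 'callbacks' entry of DEFAULT_BASE_PARAMS.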


class alibi.explainers.cfrl_base.CounterfactualRL(predictor, encoder, decoder, coeff_sparsity, coeff_consistency, latent_dim=None, backend='tensorflow', seed=0, **kwargs)[source]

Bases: alibi.api.interfaces.Explainer, alibi.api.interfaces.FitMixin

Counterfactual Reinforcement Learning.

__init__(predictor, encoder, decoder, coeff_sparsity, coeff_consistency, latent_dim=None, backend='tensorflow', seed=0, **kwargs)[source]


  • predictor (Callable[[ndarray], ndarray]) – A callable that takes a tensor of N data points as input and returns N outputs. For a classification task, the second dimension of the output should match the number of classes. The output can be either a soft label distribution or a hard label distribution (i.e. one-hot encoding) without affecting the performance, since argmax is applied to the predictor’s output.

  • encoder (Union[tensorflow.keras.Model, torch.nn.Module]) – Pretrained encoder network.

  • decoder (Union[tensorflow.keras.Model, torch.nn.Module]) – Pretrained decoder network.

  • coeff_sparsity (float) – Sparsity loss coefficient.

  • coeff_consistency (float) – Consistency loss coefficient.

  • latent_dim (Optional[int]) – Autoencoder latent dimension. Can be omitted if the actor network is user specified.

  • backend (str) – Deep learning backend: tensorflow | pytorch. Default tensorflow.

  • seed (int) – Seed for reproducibility. The results are not reproducible for the tensorflow backend.

  • kwargs – Used to replace any default parameter from alibi.explainers.cfrl_base.DEFAULT_BASE_PARAMS.
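The statement that soft and hard predictor outputs are interchangeable can be checked with a small, dependency-free sketch. The helper argmax_rows is hypothetical, and plain Python lists stand in for ndarrays.

```python
from typing import List, Sequence


def argmax_rows(Y: Sequence[Sequence[float]]) -> List[int]:
    """Return the index of the largest entry in each row (hypothetical helper)."""
    return [max(range(len(row)), key=row.__getitem__) for row in Y]


# Soft label distribution for N=3 data points and 3 classes.
soft = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]
# The corresponding hard (one-hot) labels.
hard = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]

labels_soft = argmax_rows(soft)
labels_hard = argmax_rows(hard)
```

Both calls yield the same class indices, which is why either output format can be used for the predictor.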

explain(X, Y_t, C=None, batch_size=100)[source]

Explains an input instance.

  • X (ndarray) – Instances to be explained.

  • Y_t (ndarray) – Counterfactual targets.

  • C (Optional[ndarray]) – Conditional vectors. If None, it means that no conditioning was used during training (i.e. the conditional_func returns None).

  • batch_size (int) – Batch size to be used when generating counterfactuals.

Return type



Explanation object containing the inputs with the corresponding labels, the counterfactuals with the corresponding labels, targets and additional metadata.


Fit the model-agnostic counterfactual generator.


X (ndarray) – Training data array.

Return type



self – The explainer itself.

classmethod load(path, predictor)[source]

Load an explainer from disk.

  • path (Union[str, PathLike]) – Path to a directory containing the saved explainer.

  • predictor (Any) – Model or prediction function used to originally initialize the explainer.

Return type



An explainer instance.


Resets the predictor to be explained.


predictor (Any) – New predictor to be set.

Return type



Save an explainer to disk. Uses the dill module.


path (Union[str, PathLike]) – Path to a directory. A new directory will be created if one does not exist.

Return type


alibi.explainers.cfrl_base.DEFAULT_BASE_PARAMS = {'act_high': 1.0, 'act_low': -1.0, 'act_noise': 0.1, 'actor': None, 'actor_hidden_dim': 256, 'backend': 'tensorflow', 'batch_size': 100, 'callbacks': [], 'conditional_func': <function generate_empty_condition>, 'critic': None, 'critic_hidden_dim': 256, 'decoder_inv_preprocessor': <function identity_function>, 'encoder_preprocessor': <function identity_function>, 'exploration_steps': 100, 'lr_actor': 0.001, 'lr_critic': 0.001, 'num_workers': 4, 'optimizer_actor': None, 'optimizer_critic': None, 'postprocessing_funcs': [], 'replay_buffer_size': 1000, 'reward_func': <function get_classification_reward>, 'shuffle': True, 'train_steps': 100000, 'update_after': 10, 'update_every': 1}

Default Counterfactual with Reinforcement Learning parameters.

  • 'act_noise': float, standard deviation for the normal noise added to the actor for exploration.

  • 'act_low': float, minimum action value. Each action component takes values between [act_low, act_high].

  • 'act_high': float, maximum action value. Each action component takes values between [act_low, act_high].

  • 'replay_buffer_size': int, dimension of the replay buffer in batch_size units. The total memory allocated is proportional to replay_buffer_size * batch_size.

  • 'batch_size': int, training batch size.

  • 'num_workers': int, number of workers used by the data loader if pytorch backend is selected.

  • 'shuffle': bool, whether to shuffle the datasets every epoch.

  • 'exploration_steps': int, number of exploration steps. For the first exploration_steps, the counterfactual embedding coordinates are sampled uniformly at random from the interval [act_low, act_high].

  • 'update_every': int, number of steps that should elapse between gradient updates. Regardless of the waiting steps, the ratio of waiting steps to gradient steps is locked to 1.

  • 'update_after': int, number of steps to wait before starting to update the actor and critic. This ensures that the replay buffer is full enough for useful updates.

  • 'backend': str, backend to be used: tensorflow | pytorch. Default tensorflow.

  • 'train_steps': int, number of train steps.

  • 'encoder_preprocessor': Callable, encoder/autoencoder data preprocessors. Transforms the input data into the format expected by the autoencoder. By default, the identity function.

  • 'decoder_inv_preprocessor': Callable, decoder/autoencoder data inverse preprocessor. Transforms data from the autoencoder output format to the original input format. Before calling the prediction function, the data is inverse preprocessed to match the original input format. By default, the identity function.

  • 'reward_func': Callable, element-wise reward function. By default, considers classification task and checks if the counterfactual prediction label matches the target label. Note that this is element-wise, so a tensor is expected to be returned.

  • 'postprocessing_funcs': List[Postprocessing], list of post-processing functions. The functions are applied in order, from low to high index. Non-differentiable post-processing can be applied. Each function expects as arguments X_cf - the counterfactual instance, X - the original input instance and C - the conditional vector, and returns the post-processed counterfactual instance X_cf_pp, which is passed as X_cf to the following function. By default, no post-processing is applied (empty list).

  • 'conditional_func': Callable, generates a conditional vector given a pre-processed input instance. By default, the function returns None which is equivalent to no conditioning.

  • 'callbacks': List[Callback], list of callback functions applied at the end of each training step.

  • 'actor': Optional[Union[tensorflow.keras.Model, torch.nn.Module]], actor network.

  • 'critic': Optional[Union[tensorflow.keras.Model, torch.nn.Module]], critic network.

  • 'optimizer_actor': Optional[Union[tensorflow.keras.optimizers.Optimizer, torch.optim.Optimizer]], actor optimizer.

  • 'optimizer_critic': Optional[Union[tensorflow.keras.optimizers.Optimizer, torch.optim.Optimizer]], critic optimizer.

  • 'lr_actor': float, actor learning rate.

  • 'lr_critic': float, critic learning rate.

  • 'actor_hidden_dim': int, actor hidden layer dimension.

  • 'critic_hidden_dim': int, critic hidden layer dimension.
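The interplay between 'update_after' and 'update_every' described above can be sketched as a simple schedule. This is a minimal illustration and not the library's training loop: the function updates_per_step is hypothetical, and the exact boundary condition (whether updates begin at or just after update_after) is an assumption.

```python
from typing import List


def updates_per_step(train_steps: int, update_after: int, update_every: int) -> List[int]:
    """Number of gradient updates performed at each experience step.

    Assumption: updates start once `update_after` steps have elapsed and are
    then performed in bursts of `update_every` every `update_every` steps,
    which keeps the ratio of waiting steps to gradient steps at 1.
    """
    schedule = []
    for step in range(train_steps):
        if step >= update_after and step % update_every == 0:
            schedule.append(update_every)  # burst of gradient updates
        else:
            schedule.append(0)             # only experience is collected
    return schedule


schedule = updates_per_step(train_steps=20, update_after=10, update_every=5)
```

With these illustrative values, no update happens during the first 10 steps, and the remaining 10 experience steps are matched by 10 gradient updates delivered in two bursts of 5.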

class alibi.explainers.cfrl_base.NormalActionNoise(mu, sigma)[source]

Bases: object

Normal noise generator.


Generates normal noise with the appropriate mean and standard deviation.


shape (Tuple[int, …]) – Shape of the tensor to be generated.

Return type



Normal noise with the appropriate mean, standard deviation and shape.

__init__(mu, sigma)[source]


  • mu (float) – Mean of the normal noise.

  • sigma (float) – Standard deviation of the noise.
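A minimal sketch of how normal exploration noise might be combined with the action bounds follows. It uses only the standard library; the function noisy_action is hypothetical, and clipping the noised action to [act_low, act_high] is an assumption based on the usual DDPG practice, not something stated by this page.

```python
import random
from typing import List, Optional


def noisy_action(action: List[float], sigma: float = 0.1,
                 act_low: float = -1.0, act_high: float = 1.0,
                 rng: Optional[random.Random] = None) -> List[float]:
    """Add zero-mean normal noise to each component, then clip to the bounds."""
    rng = rng or random.Random(0)
    noised = [a + rng.gauss(0.0, sigma) for a in action]
    # Assumed step: keep every component inside [act_low, act_high].
    return [min(max(a, act_low), act_high) for a in noised]


action_tilde = noisy_action([0.99, -0.99, 0.0], rng=random.Random(0))
unchanged = noisy_action([0.5], sigma=0.0)
```

The default values for sigma, act_low and act_high mirror 'act_noise', 'act_low' and 'act_high' in DEFAULT_BASE_PARAMS.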

class alibi.explainers.cfrl_base.Postprocessing[source]

Bases: abc.ABC

abstract __call__(X_cf, X, C)[source]

Post-processing function.

  • X_cf (Any) – Counterfactual instance. The datatype depends on the output of the decoder. For example, for an image dataset, the output is np.ndarray. For a tabular dataset, the output is List[np.ndarray] where each element of the list corresponds to a feature. This corresponds to the decoder’s output from the heterogeneous autoencoder (see alibi.models.tensorflow.autoencoder.HeAE and alibi.models.pytorch.autoencoder.HeAE).

  • X (ndarray) – Input instance.

  • C (Optional[ndarray]) – Conditional vector. If None, it means that no conditioning was used during training (i.e. the conditional_func returns None).

Return type



X_cf – Post-processed X_cf.
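The following sketch illustrates the (X_cf, X, C) call signature and the sequential application of several post-processing steps. ClipToRange, RoundValues and apply_postprocessing are hypothetical names; the two classes are plain classes so the sketch runs without alibi (in real use they would subclass Postprocessing), and plain lists stand in for ndarrays.

```python
from typing import Any, Callable, List, Optional, Sequence


class ClipToRange:
    """Hypothetical step that clips every feature to [low, high]."""

    def __init__(self, low: float, high: float) -> None:
        self.low, self.high = low, high

    def __call__(self, X_cf: List[float], X: List[float], C: Optional[Any]) -> List[float]:
        return [min(max(v, self.low), self.high) for v in X_cf]


class RoundValues:
    """Hypothetical step that rounds every feature to the nearest integer."""

    def __call__(self, X_cf: List[float], X: List[float], C: Optional[Any]) -> List[float]:
        return [float(round(v)) for v in X_cf]


def apply_postprocessing(X_cf: List[float], X: List[float], C: Optional[Any],
                         funcs: Sequence[Callable]) -> List[float]:
    # Apply the functions from low to high index; each output feeds the next.
    for func in funcs:
        X_cf = func(X_cf, X, C)
    return X_cf


X = [0.2, 0.8, 0.5]          # original instance
X_cf_raw = [-0.3, 1.4, 0.6]  # decoder output before post-processing
X_cf = apply_postprocessing(X_cf_raw, X, None, [ClipToRange(0.0, 1.0), RoundValues()])
```

Here C is passed as None, which corresponds to training without conditioning.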

class alibi.explainers.cfrl_base.ReplayBuffer(size=1000)[source]

Bases: object

Circular experience replay buffer for CounterfactualRL (DDPG). When the buffer is full, the oldest experience is replaced by the new one (FIFO). The experience batch size is kept constant and is inferred when the first batch of data is stored. Allowing a flexible batch size can generate TensorFlow warnings due to tf.function retracing, which can lead to a drop in performance.

R_tilde: numpy.ndarray
X: numpy.ndarray
Y_m: numpy.ndarray
Y_t: numpy.ndarray
Z: numpy.ndarray
Z_cf_tilde: numpy.ndarray



size (int) – Dimension of the buffer in batch_size units. This means that the total memory allocated is proportional to size * batch_size, where batch_size is inferred from the first tensors to be stored.

append(X, Y_m, Y_t, Z, Z_cf_tilde, C, R_tilde, **kwargs)[source]

Adds experience to the replay buffer. When the buffer is full, the oldest experience is replaced by the new one (FIFO).

  • X (ndarray) – Input array.

  • Y_m (ndarray) – Model’s prediction class of X.

  • Y_t (ndarray) – Counterfactual target class.

  • Z (ndarray) – Input’s embedding.

  • Z_cf_tilde (ndarray) – Noised counterfactual embedding.

  • C (Optional[ndarray]) – Conditional array.

  • R_tilde (ndarray) – Noised counterfactual reward array.

Return type



Sample a batch of experience from the replay buffer.

Return type

Dict[str, Optional[ndarray]]


A batch of experience. For a description of the keys and values returned, see the parameter descriptions of the alibi.explainers.cfrl_base.ReplayBuffer.append() method. The batch size returned is the same as the one passed to alibi.explainers.cfrl_base.ReplayBuffer.append().
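The circular (FIFO) behaviour and the constant batch size can be illustrated with a simplified, dependency-free sketch. TinyReplayBuffer is a hypothetical stand-in and not the library's implementation: it accepts arbitrary keyword arguments instead of the documented positional ones, stores plain lists, and its uniform row sampling is an assumption.

```python
import random
from typing import Dict, List, Optional


class TinyReplayBuffer:
    """Simplified circular buffer holding at most `size` batches (hypothetical)."""

    def __init__(self, size: int = 1000, seed: int = 0) -> None:
        self.size = size
        self.slots: List[Dict[str, list]] = []
        self.idx = 0                           # next slot to write
        self.batch_size: Optional[int] = None
        self.rng = random.Random(seed)

    def append(self, **batch: list) -> None:
        n = len(next(iter(batch.values())))
        if self.batch_size is None:
            self.batch_size = n                # inferred from the first batch
        if n != self.batch_size:
            raise ValueError("batch size must stay constant")
        if len(self.slots) < self.size:
            self.slots.append(batch)
        else:
            self.slots[self.idx] = batch       # overwrite the oldest batch (FIFO)
        self.idx = (self.idx + 1) % self.size

    def sample(self) -> Dict[str, list]:
        # Draw batch_size rows uniformly from all stored experience (assumed).
        picks = [(self.rng.randrange(len(self.slots)), self.rng.randrange(self.batch_size))
                 for _ in range(self.batch_size)]
        return {k: [self.slots[s][k][r] for s, r in picks] for k in self.slots[0]}


buf = TinyReplayBuffer(size=2)
for i in range(3):
    buf.append(X=[i, i], R_tilde=[float(i), float(i)])
batch = buf.sample()
```

After the third append, the batch with X = [0, 0] has been overwritten, so the sampled rows can only come from the two most recent batches.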