alibi_detect.cd.pytorch.classifier module

class alibi_detect.cd.pytorch.classifier.ClassifierDriftTorch(x_ref, model, p_val=0.05, x_ref_preprocessed=False, preprocess_at_init=True, update_x_ref=None, preprocess_fn=None, preds_type='probs', binarize_preds=False, reg_loss_fn=<function ClassifierDriftTorch.<lambda>>, train_size=0.75, n_folds=None, retrain_from_scratch=True, seed=0, optimizer=torch.optim.Adam, learning_rate=0.001, batch_size=32, preprocess_batch_fn=None, epochs=3, verbose=0, train_kwargs=None, device=None, dataset=<class 'alibi_detect.utils.pytorch.data.TorchDataset'>, dataloader=torch.utils.data.DataLoader, input_shape=None, data_type=None)

Bases: BaseClassifierDrift

__init__(x_ref, model, p_val=0.05, x_ref_preprocessed=False, preprocess_at_init=True, update_x_ref=None, preprocess_fn=None, preds_type='probs', binarize_preds=False, reg_loss_fn=<function ClassifierDriftTorch.<lambda>>, train_size=0.75, n_folds=None, retrain_from_scratch=True, seed=0, optimizer=torch.optim.Adam, learning_rate=0.001, batch_size=32, preprocess_batch_fn=None, epochs=3, verbose=0, train_kwargs=None, device=None, dataset=<class 'alibi_detect.utils.pytorch.data.TorchDataset'>, dataloader=torch.utils.data.DataLoader, input_shape=None, data_type=None)

Classifier-based drift detector. The classifier is trained on a fraction of the combined reference and test data and drift is detected on the remaining data. To use all the data to detect drift, a stratified cross-validation scheme can be chosen.
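The two data-splitting schemes described above can be sketched in plain Python. This is an illustration of the idea only, not the library's own splitting code: the helper names `holdout_split` and `stratified_folds` are hypothetical, and the 60/40 toy label vector is an assumption made for the example.

```python
import random
from typing import List, Tuple


def holdout_split(n: int, train_size: float, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle indices 0..n-1 and split them into a training part and a held-out part."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * train_size)
    return idx[:n_train], idx[n_train:]


def stratified_folds(labels: List[int], n_folds: int, seed: int = 0) -> List[List[int]]:
    """Assign every index to exactly one fold while keeping label proportions similar."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    for label in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == label]
        rng.shuffle(members)
        # Deal the shuffled members of this class round-robin over the folds.
        for pos, i in enumerate(members):
            folds[pos % n_folds].append(i)
    return folds


# Reference instances are labelled 0 and test instances 1 (toy sizes, assumed).
labels = [0] * 60 + [1] * 40

# Hold-out scheme: train on 75% of the combined data, test for drift on the other 25%.
train_idx, heldout_idx = holdout_split(len(labels), train_size=0.75)

# Cross-validation scheme: each instance is held out exactly once, so all data is used.
folds = stratified_folds(labels, n_folds=5)
```

With the hold-out scheme only the 25 held-out instances contribute to the drift test, whereas the five folds together cover all 100 instances at the cost of training the classifier five times.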

Parameters:
  • x_ref (Union[ndarray, list]) – Data used as reference distribution.

  • model (Union[Module, Sequential]) – PyTorch classification model used for drift detection.

  • p_val (float) – p-value used for the significance of the test.

  • x_ref_preprocessed (bool) – Whether the given reference data x_ref has already been preprocessed. If x_ref_preprocessed=True, only the test data x will be preprocessed at prediction time. If x_ref_preprocessed=False, the reference data will also be preprocessed.

  • preprocess_at_init (bool) – Whether to preprocess the reference data when the detector is instantiated. Otherwise, the reference data will be preprocessed at prediction time. Only applies if x_ref_preprocessed=False.

  • update_x_ref (Optional[Dict[str, int]]) – Reference data can optionally be updated to the last n instances seen by the detector or via reservoir sampling with size n. For the former, the parameter equals {‘last’: n} while for reservoir sampling {‘reservoir_sampling’: n} is passed.

  • preprocess_fn (Optional[Callable]) – Function to preprocess the data before computing the data drift metrics.

  • preds_type (str) – Whether the model outputs ‘probs’ or ‘logits’.

  • binarize_preds (bool) – Whether to test for discrepancy on soft (e.g. probs/logits) model predictions directly with a K-S test, or to binarise them to 0-1 prediction errors and apply a binomial test.

  • reg_loss_fn (Callable) – The regularisation term reg_loss_fn(model) is added to the loss function being optimized.

  • train_size (Optional[float]) – Optional fraction (float between 0 and 1) of the dataset used to train the classifier. Drift is detected on the remaining 1 - train_size fraction. Ignored when n_folds is also specified, since n_folds takes priority.

  • n_folds (Optional[int]) – Optional number of stratified folds used for training. Predictions are then collected for every instance from the fold in which it was held out, and the drift test is run on all of these out-of-fold predictions. This makes it possible to use all the reference and test data for drift detection, at the expense of longer computation. If both train_size and n_folds are specified, n_folds is prioritized.

  • retrain_from_scratch (bool) – Whether the classifier should be retrained from scratch for each set of test data or whether it should instead continue training from where it left off on the previous set.

  • seed (int) – Optional random seed for fold selection.

  • optimizer (Callable) – Optimizer used during training of the classifier.

  • learning_rate (float) – Learning rate used by optimizer.

  • batch_size (int) – Batch size used during training of the classifier.

  • preprocess_batch_fn (Optional[Callable]) – Optional batch preprocessing function, for example to convert a list of objects to a batch that can be processed by the model.

  • epochs (int) – Number of training epochs for the classifier for each (optional) fold.

  • verbose (int) – Verbosity level during the training of the classifier. 0 is silent, 1 shows a progress bar.

  • train_kwargs (Optional[dict]) – Optional additional kwargs when fitting the classifier.

  • device (Union[Literal['cuda', 'gpu', 'cpu'], device, None]) – Device type used. The default tries to use the GPU and falls back on CPU if needed. Can be specified by passing either 'cuda', 'gpu', 'cpu' or an instance of torch.device.

  • dataset (Callable) – Dataset object used during training.

  • dataloader (Callable) – Dataloader object used during training.

  • input_shape (Optional[tuple]) – Shape of input data.

  • data_type (Optional[str]) – Optionally specify the data type (tabular, image or time-series). Added to metadata.
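The two update_x_ref strategies listed above, keeping the last n instances or reservoir sampling with size n, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the library's implementation: `update_last` and `update_reservoir` are hypothetical helper names, the reservoir update is the textbook "Algorithm R", and the current reference set is assumed to hold at most n instances.

```python
import random
from typing import List, Sequence, Tuple


def update_last(x_ref: Sequence, x_new: Sequence, n: int) -> List:
    """Keep only the n most recently seen instances (the {'last': n} idea)."""
    return (list(x_ref) + list(x_new))[-n:]


def update_reservoir(
    x_ref: Sequence, x_new: Sequence, n: int, n_seen: int, rng: random.Random
) -> Tuple[List, int]:
    """Reservoir sampling (the {'reservoir_sampling': n} idea).

    n_seen is the number of instances observed so far, including those already
    in x_ref. Every observed instance ends up in the reservoir with equal
    probability n / n_seen.
    """
    reservoir = list(x_ref)
    for item in x_new:
        n_seen += 1
        if len(reservoir) < n:
            # Reservoir not full yet: always keep the new instance.
            reservoir.append(item)
        else:
            # Replace a random slot with probability n / n_seen.
            j = rng.randrange(n_seen)
            if j < n:
                reservoir[j] = item
    return reservoir, n_seen


x_ref = list(range(10))          # toy reference set (assumed)
x_batch = list(range(100, 130))  # toy batch seen by the detector (assumed)

last_ref = update_last(x_ref, x_batch, n=10)
res_ref, seen = update_reservoir(x_ref, x_batch, n=10, n_seen=len(x_ref), rng=random.Random(0))
```

The 'last' strategy tracks recent data closely, so gradual drift can be absorbed into the reference set, while reservoir sampling keeps a uniform sample over everything seen so far.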

score(x)

Compute the out-of-fold drift metric, such as the accuracy, of a classifier trained to distinguish the reference data from the data to be tested.

Parameters:

x (Union[ndarray, list]) – Batch of instances.

Return type:

Tuple[float, float, ndarray, ndarray, Union[ndarray, list], Union[ndarray, list]]

Returns:

The p-value; a notion of distance between the trained classifier's out-of-fold performance and the performance expected under the null hypothesis of no drift; the out-of-fold prediction probabilities on the reference data and on the test data; and the reference and test instances associated with those out-of-fold predictions.