alibi_detect.cd.pytorch.classifier module

class alibi_detect.cd.pytorch.classifier.ClassifierDriftTorch(x_ref, model, p_val=0.05, preprocess_x_ref=True, update_x_ref=None, preprocess_fn=None, preds_type='probs', binarize_preds=False, reg_loss_fn=<function ClassifierDriftTorch.<lambda>>, train_size=0.75, n_folds=None, retrain_from_scratch=True, seed=0, optimizer=torch.optim.Adam, learning_rate=0.001, batch_size=32, preprocess_batch_fn=None, epochs=3, verbose=0, train_kwargs=None, device=None, dataset=<class 'alibi_detect.utils.pytorch.data.TorchDataset'>, dataloader=torch.utils.data.DataLoader, data_type=None)[source]

Bases: alibi_detect.cd.base.BaseClassifierDrift

__init__(x_ref, model, p_val=0.05, preprocess_x_ref=True, update_x_ref=None, preprocess_fn=None, preds_type='probs', binarize_preds=False, reg_loss_fn=<function ClassifierDriftTorch.<lambda>>, train_size=0.75, n_folds=None, retrain_from_scratch=True, seed=0, optimizer=torch.optim.Adam, learning_rate=0.001, batch_size=32, preprocess_batch_fn=None, epochs=3, verbose=0, train_kwargs=None, device=None, dataset=<class 'alibi_detect.utils.pytorch.data.TorchDataset'>, dataloader=torch.utils.data.DataLoader, data_type=None)[source]

Classifier-based drift detector. The classifier is trained on a fraction of the combined reference and test data and drift is detected on the remaining data. To use all the data to detect drift, a stratified cross-validation scheme can be chosen.
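The difference between the hold-out scheme and the stratified cross-validation scheme described above can be illustrated with index bookkeeping alone. The sketch below is not the library's implementation: `holdout_split` and `stratified_folds` are hypothetical helpers, the sample sizes are made up, and only the standard library is used. It shows that a single split with `train_size=0.75` scores a quarter of the combined data, whereas stratified folds score every instance exactly once.

```python
import random


def holdout_split(n_ref, n_test, train_size=0.75, seed=0):
    """Return (train_idx, eval_idx) for one shuffled split of the combined data."""
    rng = random.Random(seed)
    idx = list(range(n_ref + n_test))
    rng.shuffle(idx)
    n_train = int(train_size * len(idx))
    return idx[:n_train], idx[n_train:]


def stratified_folds(n_ref, n_test, n_folds=5, seed=0):
    """Return (train_idx, eval_idx) pairs; each fold keeps the ref/test ratio."""
    rng = random.Random(seed)
    ref_idx = list(range(n_ref))                    # label 0: reference data
    test_idx = list(range(n_ref, n_ref + n_test))   # label 1: test data
    rng.shuffle(ref_idx)
    rng.shuffle(test_idx)
    folds = []
    for k in range(n_folds):
        # Every n_folds-th element of each class goes into fold k.
        eval_idx = ref_idx[k::n_folds] + test_idx[k::n_folds]
        held_out = set(eval_idx)
        train_idx = [i for i in ref_idx + test_idx if i not in held_out]
        folds.append((train_idx, eval_idx))
    return folds


train_idx, eval_idx = holdout_split(n_ref=80, n_test=40)
print(len(train_idx), len(eval_idx))   # 90 30

folds = stratified_folds(n_ref=80, n_test=40, n_folds=5)
scored = sorted(i for _, ev in folds for i in ev)
print(scored == list(range(120)))      # True
```

The cost of the second scheme is that the classifier is trained once per fold rather than once in total.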

Parameters
  • x_ref (Union[ndarray, list]) – Data used as reference distribution.

  • model (Union[Module, Sequential]) – PyTorch classification model used for drift detection.

  • p_val (float) – p-value used for the significance of the test.

  • preprocess_x_ref (bool) – Whether to preprocess the reference data at initialisation and store the result.

  • update_x_ref (Optional[Dict[str, int]]) – Reference data can optionally be updated to the last n instances seen by the detector or via reservoir sampling with size n. For the former, the parameter equals {‘last’: n} while for reservoir sampling {‘reservoir_sampling’: n} is passed.

  • preprocess_fn (Optional[Callable]) – Function to preprocess the data before computing the data drift metrics.

  • preds_type (str) – Whether the model outputs ‘probs’ or ‘logits’.

  • binarize_preds (bool) – Whether to test for discrepancy on soft (e.g. probs/logits) model predictions directly with a K-S test, or to binarise them to 0-1 prediction errors and apply a binomial test.

  • reg_loss_fn (Callable) – The regularisation term reg_loss_fn(model) is added to the loss function being optimized.

  • train_size (Optional[float]) – Optional fraction (float between 0 and 1) of the dataset used to train the classifier. Drift is detected on the remaining 1 - train_size fraction. Cannot be used in combination with n_folds.

  • n_folds (Optional[int]) – Optional number of stratified folds used for training. The model predictions are then calculated on all the out-of-fold instances. This makes it possible to use all the reference and test data for drift detection, at the expense of longer computation. If both train_size and n_folds are specified, n_folds is prioritized.

  • retrain_from_scratch (bool) – Whether the classifier should be retrained from scratch for each set of test data or whether it should instead continue training from where it left off on the previous set.

  • seed (int) – Optional random seed for fold selection.

  • optimizer (Callable) – Optimizer used during training of the classifier.

  • learning_rate (float) – Learning rate used by optimizer.

  • batch_size (int) – Batch size used during training of the classifier.

  • preprocess_batch_fn (Optional[Callable]) – Optional batch preprocessing function, for example to convert a list of objects to a batch which can be processed by the model.

  • epochs (int) – Number of training epochs for the classifier for each (optional) fold.

  • verbose (int) – Verbosity level during the training of the classifier. 0 is silent, 1 shows a progress bar.

  • train_kwargs (Optional[dict]) – Optional additional kwargs when fitting the classifier.

  • device (Optional[str]) – Device type used. The default None tries to use the GPU and falls back on CPU if needed. Can be specified by passing either ‘cuda’, ‘gpu’ or ‘cpu’.

  • dataset (Callable) – Dataset object used during training.

  • dataloader (Callable) – Dataloader object used during training.

  • data_type (Optional[str]) – Optionally specify the data type (tabular, image or time-series). Added to metadata.
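To make the binarize_preds option more concrete, the sketch below shows how a binomial test on 0-1 prediction errors can be compared against p_val. It is a minimal illustration under stated assumptions, not the detector's own code: `binomial_p_value` is a hypothetical helper, the held-out results are invented, and the chance-level baseline of 0.5 assumes equally sized reference and test sets.

```python
from math import comb


def binomial_p_value(n_correct, n_total, baseline=0.5):
    """Exact one-sided p-value P(X >= n_correct) for X ~ Binomial(n_total, baseline)."""
    return sum(
        comb(n_total, k) * baseline**k * (1 - baseline) ** (n_total - k)
        for k in range(n_correct, n_total + 1)
    )


# Hypothetical held-out results: 1 means the classifier correctly guessed
# whether the instance came from the reference set or the test set.
correct = [1] * 36 + [0] * 14

p_val_threshold = 0.05  # plays the role of the p_val argument
p = binomial_p_value(sum(correct), len(correct))
print(p < p_val_threshold)  # True
```

If the classifier cannot separate the two sets, its accuracy stays close to the baseline and the p-value stays large; an accuracy well above chance yields a small p-value and therefore flags drift.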

Return type

None

score(x)[source]

Compute the out-of-fold drift metric, such as the accuracy, of a classifier trained to distinguish the reference data from the data to be tested.

Parameters

x (Union[ndarray, list]) – Batch of instances.

Return type

Tuple[float, float, ndarray, ndarray]

Returns

  • The p-value.

  • A notion of distance between the trained classifier’s out-of-fold performance and that which we’d expect under the null assumption of no drift.

  • The out-of-fold classifier model prediction probabilities on the reference and test data.
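For the soft-prediction case, one way to read the "notion of distance" is as a two-sample K-S statistic between the predicted probabilities on the reference data and on the test data. The sketch below computes that statistic with the standard library only. It is an illustration rather than the detector's implementation: `ks_statistic` is a hypothetical helper, and the two lists are invented stand-ins for the probability arrays returned by score, assumed here to be one-dimensional.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample K-S statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
        for v in a + b
    )


# Hypothetical predicted probabilities of belonging to the test set.
probs_ref = [0.1, 0.2, 0.3, 0.4]
probs_test = [0.6, 0.7, 0.8, 0.9]

distance = ks_statistic(probs_ref, probs_test)
print(distance)                            # 1.0
print(ks_statistic(probs_ref, probs_ref))  # 0.0
```

A statistic near 0 means the classifier assigns similar probabilities to both sets, while a value near 1 means it separates them almost perfectly.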