alibi_detect.cd.classifier module
- class alibi_detect.cd.classifier.ClassifierDrift(x_ref, model, backend='tensorflow', p_val=0.05, x_ref_preprocessed=False, preprocess_at_init=True, update_x_ref=None, preprocess_fn=None, preds_type='probs', binarize_preds=False, reg_loss_fn=<function ClassifierDrift.<lambda>>, train_size=0.75, n_folds=None, retrain_from_scratch=True, seed=0, optimizer=None, learning_rate=0.001, batch_size=32, preprocess_batch_fn=None, epochs=3, verbose=0, train_kwargs=None, device=None, dataset=None, dataloader=None, input_shape=None, use_calibration=False, calibration_kwargs=None, use_oob=False, data_type=None)
Bases: DriftConfigMixin
- __init__(x_ref, model, backend='tensorflow', p_val=0.05, x_ref_preprocessed=False, preprocess_at_init=True, update_x_ref=None, preprocess_fn=None, preds_type='probs', binarize_preds=False, reg_loss_fn=<function ClassifierDrift.<lambda>>, train_size=0.75, n_folds=None, retrain_from_scratch=True, seed=0, optimizer=None, learning_rate=0.001, batch_size=32, preprocess_batch_fn=None, epochs=3, verbose=0, train_kwargs=None, device=None, dataset=None, dataloader=None, input_shape=None, use_calibration=False, calibration_kwargs=None, use_oob=False, data_type=None)
Classifier-based drift detector. The classifier is trained on a fraction of the combined reference and test data, and drift is detected on the remaining data. To use all the data for drift detection, a stratified cross-validation scheme can be chosen instead (see n_folds).
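The splitting scheme described above can be illustrated with a small, dependency-free sketch. It is not alibi_detect's implementation: make_splits is a hypothetical helper, reference instances are assumed to carry label 0 and test instances label 1, and stratifying the single train_size split by class is an assumption. The classifier would be trained on each training index list, and its predictions on the matching held-out list would feed the drift test.

```python
import random


def make_splits(n_ref, n_test, train_size=0.75, n_folds=None, seed=0):
    """Hypothetical helper: return a list of (train_idx, held_out_idx) pairs.

    Combined indices 0..n_ref-1 are reference instances (label 0) and
    n_ref..n_ref+n_test-1 are test instances (label 1).
    """
    rng = random.Random(seed)
    ref_idx = list(range(n_ref))
    test_idx = list(range(n_ref, n_ref + n_test))
    rng.shuffle(ref_idx)
    rng.shuffle(test_idx)
    everything = set(range(n_ref + n_test))

    if n_folds is not None:
        # n_folds takes precedence over train_size.
        folds = [[] for _ in range(n_folds)]
        for class_idx in (ref_idx, test_idx):
            for j, i in enumerate(class_idx):
                # Round-robin per class keeps the reference/test ratio in every fold.
                folds[j % n_folds].append(i)
        # Every instance is held out exactly once (out-of-fold predictions).
        return [(sorted(everything - set(f)), sorted(f)) for f in folds]

    # Single split (assumed stratified): train on train_size of each class,
    # keep the remaining 1 - train_size for the drift test.
    held_out = []
    for class_idx in (ref_idx, test_idx):
        n_train = int(train_size * len(class_idx))
        held_out.extend(class_idx[n_train:])
    return [(sorted(everything - set(held_out)), sorted(held_out))]
```

With train_size alone, only part of the data is used for the test statistic; with n_folds, all instances contribute, at the cost of training one classifier per fold.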
- Parameters:
x_ref (Union[ndarray, list]) – Data used as reference distribution.
model (Union[ClassifierMixin, Callable]) – PyTorch, TensorFlow or Sklearn classification model used for drift detection.
backend (str) – Backend used for the training loop implementation. Supported: ‘tensorflow’ | ‘pytorch’ | ‘sklearn’.
p_val (float) – p-value used for the significance of the test.
x_ref_preprocessed (bool) – Whether the given reference data x_ref has already been preprocessed. If x_ref_preprocessed=True, only the test data x will be preprocessed at prediction time. If x_ref_preprocessed=False, the reference data will also be preprocessed.
preprocess_at_init (bool) – Whether to preprocess the reference data when the detector is instantiated. Otherwise, the reference data will be preprocessed at prediction time. Only applies if x_ref_preprocessed=False.
update_x_ref (Optional[Dict[str, int]]) – Reference data can optionally be updated to the last n instances seen by the detector or via reservoir sampling with size n. For the former, the parameter equals {‘last’: n}, while for reservoir sampling {‘reservoir_sampling’: n} is passed.
preprocess_fn (Optional[Callable]) – Function to preprocess the data before computing the data drift metrics.
preds_type (str) – Type of model output: ‘probs’ (probabilities; for ‘tensorflow’, ‘pytorch’ and ‘sklearn’ models), ‘logits’ (for ‘pytorch’ and ‘tensorflow’ models) or ‘scores’ (for ‘sklearn’ models that support decision_function).
binarize_preds (bool) – Whether to test for discrepancy on the soft model predictions (e.g. probs/logits/scores) directly with a K-S test, or to binarise them to 0-1 prediction errors and apply a binomial test.
reg_loss_fn (Callable) – The regularisation term reg_loss_fn(model) is added to the loss function being optimized. Only relevant for ‘tensorflow’ and ‘pytorch’ backends.
train_size (Optional[float]) – Optional fraction (float between 0 and 1) of the dataset used to train the classifier. Drift is detected on the remaining 1 - train_size fraction. Not used together with n_folds: if both are specified, n_folds takes precedence.
n_folds (Optional[int]) – Optional number of stratified folds used for training. The model predictions are then computed on all the out-of-fold instances. This makes it possible to use all the reference and test data for drift detection, at the expense of longer computation. If both train_size and n_folds are specified, n_folds takes precedence.
retrain_from_scratch (bool) – Whether the classifier should be retrained from scratch for each set of test data, or instead continue training from where it left off on the previous set.
seed (int) – Random seed for fold selection.
optimizer (Optional[Callable]) – Optimizer used during training of the classifier. Only relevant for ‘tensorflow’ and ‘pytorch’ backends.
learning_rate (float) – Learning rate used by the optimizer. Only relevant for ‘tensorflow’ and ‘pytorch’ backends.
batch_size (int) – Batch size used during training of the classifier. Only relevant for ‘tensorflow’ and ‘pytorch’ backends.
preprocess_batch_fn (Optional[Callable]) – Optional batch preprocessing function, for example to convert a list of objects into a batch that can be processed by the model. Only relevant for ‘tensorflow’ and ‘pytorch’ backends.
epochs (int) – Number of training epochs for the classifier for each (optional) fold. Only relevant for ‘tensorflow’ and ‘pytorch’ backends.
verbose (int) – Verbosity level during the training of the classifier: 0 is silent, 1 shows a progress bar. Only relevant for ‘tensorflow’ and ‘pytorch’ backends.
train_kwargs (Optional[dict]) – Optional additional kwargs passed when fitting the classifier. Only relevant for ‘tensorflow’ and ‘pytorch’ backends.
device (Union[Literal[‘cuda’, ‘gpu’, ‘cpu’], torch.device, None]) – Device type used. The default tries to use the GPU and falls back on the CPU if needed. Can be specified by passing either 'cuda', 'gpu', 'cpu' or an instance of torch.device. Only relevant for ‘pytorch’ backend.
dataset (Optional[Callable]) – Dataset object used during training. Only relevant for ‘tensorflow’ and ‘pytorch’ backends.
dataloader (Optional[Callable]) – Dataloader object used during training. Only relevant for ‘pytorch’ backend.
use_calibration (bool) – Whether to use calibration. Calibration can be used on top of any model. Only relevant for ‘sklearn’ backend.
calibration_kwargs (Optional[dict]) – Optional additional kwargs for calibration. Only relevant for ‘sklearn’ backend. See https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html for more details.
use_oob (bool) – Whether to use out-of-bag (OOB) predictions. Supported only for RandomForestClassifier.
data_type (Optional[str]) – Optionally specify the data type (tabular, image or time-series). Added to metadata.
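The two update_x_ref strategies can be sketched on plain Python lists as follows. update_reference is a hypothetical helper, not part of alibi_detect, and the reservoir branch uses the classic Algorithm R; that it matches the library's reservoir sampling exactly is an assumption.

```python
import random


def update_reference(x_ref, x_new, update_x_ref, n_seen, rng=None):
    """Hypothetical helper: return (new_reference, new_n_seen).

    update_x_ref is {'last': n} or {'reservoir_sampling': n}; n_seen counts
    every instance seen so far, including the current reference data.
    """
    if rng is None:
        rng = random.Random(0)
    if 'last' in update_x_ref:
        n = update_x_ref['last']
        # Keep only the n most recent instances.
        return (list(x_ref) + list(x_new))[-n:], n_seen + len(x_new)
    n = update_x_ref['reservoir_sampling']
    reservoir = list(x_ref)
    for item in x_new:
        n_seen += 1
        if len(reservoir) < n:
            # Reservoir not full yet: always keep the new instance.
            reservoir.append(item)
        else:
            # Algorithm R: the new instance replaces a random slot
            # with probability n / n_seen.
            j = rng.randrange(n_seen)
            if j < n:
                reservoir[j] = item
    return reservoir, n_seen
```

For example, with {'last': 3}, a reference [1, 2, 3] updated with [4, 5] becomes [3, 4, 5]. Reservoir sampling instead keeps a uniform random sample of size n over everything seen so far, so older data is never systematically discarded.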
- predict(x, return_p_val=True, return_distance=True, return_probs=True, return_model=True)
Predict whether a batch of data has drifted from the reference data.
- Parameters:
x (Union[ndarray, list]) – Batch of instances.
return_p_val (bool) – Whether to return the p-value of the test.
return_distance (bool) – Whether to return a notion of the strength of the drift: the K-S test statistic if binarize_preds=False, otherwise the relative error reduction.
return_probs (bool) – Whether to return the instance-level classifier probabilities for the reference and test data (0=reference data, 1=test data).
return_model (bool) – Whether to return the updated model trained to discriminate reference and test instances.
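The two notions of drift strength returned with return_distance=True can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: ks_statistic and binomial_drift are hypothetical helpers, the probabilities are assumed to be the classifier's probability that an instance comes from the test data, and the no-drift baseline accuracy is assumed to be the majority-group rate. The sketch computes the standard two-sided K-S statistic only; the corresponding p-value would come from the K-S distribution (for example via scipy.stats.ks_2samp).

```python
import math


def ks_statistic(a, b):
    """Two-sample K-S statistic D = max_t |F_a(t) - F_b(t)| (two-sided)."""
    d = 0.0
    for t in list(a) + list(b):
        # Empirical CDFs of both samples evaluated at t (O(n*m), fine for a sketch).
        fa = sum(1 for v in a if v <= t) / len(a)
        fb = sum(1 for v in b if v <= t) / len(b)
        d = max(d, abs(fa - fb))
    return d


def binomial_drift(y_true, probs_test, n_ref, n_test):
    """Hypothetical helper for the binarized case; returns (p_value, distance).

    y_true holds 0 (reference) / 1 (test) labels of held-out instances and
    probs_test the classifier's probability that each one is a test instance.
    Assumes n_ref > 0 and n_test > 0.
    """
    n = len(y_true)
    correct = sum(1 for y, p in zip(y_true, probs_test) if y == round(p))
    # Assumed no-drift baseline: always predicting the larger of the two groups.
    baseline = max(n_ref, n_test) / (n_ref + n_test)
    # One-sided binomial test: P(X >= correct) with X ~ Binomial(n, baseline).
    p_val = sum(math.comb(n, k) * baseline**k * (1 - baseline) ** (n - k)
                for k in range(correct, n + 1))
    accuracy = correct / n
    # Relative error reduction vs. the baseline, clipped at 0 (no evidence of drift).
    distance = max(0.0, 1 - (1 - accuracy) / (1 - baseline))
    return p_val, distance
```

In the non-binarized case, ks_statistic would be applied to the held-out probabilities of the reference instances versus those of the test instances: the better the classifier separates the two groups, the larger the statistic.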
- Return type:
Dict
- Returns:
Dictionary containing 'meta' and 'data' dictionaries:
- 'meta' has the model’s metadata.
- 'data' contains the drift prediction and, optionally, the p-value, the performance of the classifier relative to its expectation under the no-change null, the out-of-fold classifier prediction probabilities on the reference and test data, and the trained model.
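A short sketch of reading the nested result dictionary. The key names 'is_drift', 'p_val' and 'distance' inside 'data' are assumptions here; inspect the returned dictionary for the exact keys. summarize is a hypothetical helper, and .get() is used because the p-value and distance are only requested optionally via return_p_val and return_distance.

```python
def summarize(preds):
    """Hypothetical helper: turn a predict() result into a one-line summary.

    Assumes preds['data'] uses the keys 'is_drift', 'p_val' and 'distance'
    (assumed names, not confirmed by this reference).
    """
    data = preds['data']
    parts = ['drift' if data['is_drift'] else 'no drift']
    # Optional entries are only present (or non-None) when requested.
    if data.get('p_val') is not None:
        parts.append(f"p={data['p_val']:.3g}")
    if data.get('distance') is not None:
        parts.append(f"distance={data['distance']:.3g}")
    return ', '.join(parts)
```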