alibi_detect.cd.tensorflow.classifier module

class alibi_detect.cd.tensorflow.classifier.ClassifierDriftTF(x_ref, model, p_val=0.05, x_ref_preprocessed=False, preprocess_at_init=True, update_x_ref=None, preprocess_fn=None, preds_type='probs', binarize_preds=False, reg_loss_fn=<function ClassifierDriftTF.<lambda>>, train_size=0.75, n_folds=None, retrain_from_scratch=True, seed=0, optimizer=tensorflow.keras.optimizers.Adam, learning_rate=0.001, batch_size=32, preprocess_batch_fn=None, epochs=3, verbose=0, train_kwargs=None, dataset=<class 'alibi_detect.utils.tensorflow.data.TFDataset'>, input_shape=None, data_type=None)

Bases: BaseClassifierDrift

__init__(x_ref, model, p_val=0.05, x_ref_preprocessed=False, preprocess_at_init=True, update_x_ref=None, preprocess_fn=None, preds_type='probs', binarize_preds=False, reg_loss_fn=<function ClassifierDriftTF.<lambda>>, train_size=0.75, n_folds=None, retrain_from_scratch=True, seed=0, optimizer=tensorflow.keras.optimizers.Adam, learning_rate=0.001, batch_size=32, preprocess_batch_fn=None, epochs=3, verbose=0, train_kwargs=None, dataset=<class 'alibi_detect.utils.tensorflow.data.TFDataset'>, input_shape=None, data_type=None)

Classifier-based drift detector. The classifier is trained on a fraction of the combined reference and test data and drift is detected on the remaining data. To use all the data to detect drift, a stratified cross-validation scheme can be chosen.
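
As a rough illustration of the single-split scheme described above, the sketch below labels reference instances 0 and test instances 1, shuffles the combined data, and splits it once according to train_size. It is a conceptual, pure-Python sketch and not the library's implementation; the helper name split_for_drift_classifier is hypothetical.

```python
import random


def split_for_drift_classifier(x_ref, x_test, train_size=0.75, seed=0):
    """Label reference rows 0 and test rows 1, shuffle, and split once.

    Hypothetical helper for illustration only; the detector handles
    this step internally.
    """
    labelled = [(x, 0) for x in x_ref] + [(x, 1) for x in x_test]
    rng = random.Random(seed)
    rng.shuffle(labelled)
    n_train = int(len(labelled) * train_size)
    # The first part trains the classifier, the rest is held out for the test.
    return labelled[:n_train], labelled[n_train:]


train, held_out = split_for_drift_classifier(range(100), range(100, 200))
```

A classifier would be fitted on train and evaluated on held_out; held-out accuracy clearly above chance level suggests that the two samples are distinguishable.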

Parameters:

x_ref (ndarray) – Data used as reference distribution.
model (Model) – TensorFlow classification model used for drift detection.
p_val (float) – p-value threshold used to determine the significance of the test.
x_ref_preprocessed (bool) – Whether the given reference data x_ref has already been preprocessed. If x_ref_preprocessed=True, only the test data x will be preprocessed at prediction time. If x_ref_preprocessed=False, the reference data will also be preprocessed.
preprocess_at_init (bool) – Whether to preprocess the reference data when the detector is instantiated. Otherwise, the reference data will be preprocessed at prediction time. Only applies if x_ref_preprocessed=False.
update_x_ref (Optional[Dict[str, int]]) – Reference data can optionally be updated to the last n instances seen by the detector or via reservoir sampling with size n. For the former, pass {'last': n}; for reservoir sampling, pass {'reservoir_sampling': n}.
preprocess_fn (Optional[Callable]) – Function to preprocess the data before computing the data drift metrics.
preds_type (str) – Whether the model outputs 'probs' or 'logits'.
binarize_preds (bool) – Whether to test for discrepancy on soft (e.g. prob/log-prob) model predictions directly with a K-S test or to binarise them to 0-1 prediction errors and apply a binomial test.
reg_loss_fn (Callable) – The regularisation term reg_loss_fn(model) is added to the loss function being optimized.
train_size (Optional[float]) – Optional fraction (float between 0 and 1) of the dataset used to train the classifier. Drift is detected on the remaining 1 - train_size fraction. Not used together with n_folds: if both are specified, n_folds takes priority.
n_folds (Optional[int]) – Optional number of stratified folds used for training. The model predictions are then calculated on all the out-of-fold instances. This allows all the reference and test data to be used for drift detection, at the expense of longer computation. If both train_size and n_folds are specified, n_folds is prioritized.
retrain_from_scratch (bool) – Whether the classifier should be retrained from scratch for each set of test data or whether it should instead continue training from where it left off on the previous set.
seed (int) – Optional random seed for fold selection.
optimizer (Union[Optimizer, Type[Optimizer]]) – Optimizer used during training of the classifier.
learning_rate (float) – Learning rate used by optimizer.
batch_size (int) – Batch size used during training of the classifier.
preprocess_batch_fn (Optional[Callable]) – Optional batch preprocessing function. For example, it can convert a list of objects into a batch that the model can process.
epochs (int) – Number of training epochs for the classifier for each (optional) fold.
verbose (int) – Verbosity level during the training of the classifier. 0 is silent, 1 shows a progress bar, and 2 prints the statistics after each epoch.
train_kwargs (Optional[dict]) – Optional additional kwargs when fitting the classifier.
dataset (Callable) – Dataset object used during training.
input_shape (Optional[tuple]) – Shape of input data.
data_type (Optional[str]) – Optionally specify the data type (tabular, image or time-series). Added to metadata.
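
The two update_x_ref strategies described in the parameter list can be sketched as follows. This is a conceptual, pure-Python illustration (keeping the last n instances, and the standard "Algorithm R" form of reservoir sampling), not the library's own code; the function names update_last and update_reservoir and their arguments are hypothetical.

```python
import random


def update_last(x_ref, x_new, n):
    """Keep only the n most recent instances (the {'last': n} idea)."""
    return (list(x_ref) + list(x_new))[-n:]


def update_reservoir(reservoir, x_new, n_seen, n, rng):
    """Reservoir sampling (Algorithm R) with reservoir size n.

    n_seen is the number of instances observed before this call.
    Every instance seen so far ends up in the reservoir with equal
    probability.
    """
    reservoir = list(reservoir)
    for item in x_new:
        n_seen += 1
        if len(reservoir) < n:
            # Reservoir not yet full: always keep the item.
            reservoir.append(item)
        else:
            # Replace a random slot with probability n / n_seen.
            j = rng.randrange(n_seen)
            if j < n:
                reservoir[j] = item
    return reservoir, n_seen


rng = random.Random(0)
x_ref, n_seen = update_reservoir([0, 1, 2], range(3, 50), n_seen=3, n=3, rng=rng)
```

Keeping the last n instances makes the reference track recent data, whereas reservoir sampling keeps a uniform sample over everything the detector has seen.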

score(x)

Compute an out-of-fold drift metric, such as the accuracy, from a classifier trained to distinguish the reference data from the data to be tested.

Parameters: x (ndarray) – Batch of instances.
Return type: Tuple[float, float, ndarray, ndarray, Union[ndarray, list], Union[ndarray, list]]
Returns: p-value, a notion of distance between the trained classifier’s out-of-fold performance and that which we’d expect under the null assumption of no drift, and the out-of-fold classifier model prediction probabilities on the reference and test data as well as the associated reference and test instances of the out-of-fold predictions.
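
To make the returned p-value more concrete for the binarize_preds=True case, where a binomial test is applied to the 0-1 prediction errors, the sketch below computes a one-sided binomial tail probability for the number of correct out-of-fold predictions. It is an illustration under stated assumptions, not the library's exact computation: the function name binomial_p_value is hypothetical, and the baseline accuracy p0 expected under no drift is assumed to be supplied by the caller.

```python
from math import comb


def binomial_p_value(n_correct, n, p0=0.5):
    """One-sided P(X >= n_correct) for X ~ Binomial(n, p0).

    p0 is the accuracy expected if the classifier cannot tell the
    reference data from the test data (an assumption of this sketch).
    """
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(n_correct, n + 1)
    )


# 40 correct out of 50 held-out predictions, compared with p_val=0.05.
p = binomial_p_value(40, 50, p0=0.5)
drift_detected = p < 0.05
```

A small p-value means the observed accuracy would be unlikely if the classifier were only guessing, which is the signal the detector compares against p_val.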