alibi_detect.cd.sklearn.classifier module
- class alibi_detect.cd.sklearn.classifier.ClassifierDriftSklearn(x_ref, model, p_val=0.05, x_ref_preprocessed=False, preprocess_at_init=True, update_x_ref=None, preprocess_fn=None, preds_type='probs', binarize_preds=False, train_size=0.75, n_folds=None, retrain_from_scratch=True, seed=0, use_calibration=False, calibration_kwargs=None, use_oob=False, input_shape=None, data_type=None)[source]
Bases: BaseClassifierDrift
- __init__(x_ref, model, p_val=0.05, x_ref_preprocessed=False, preprocess_at_init=True, update_x_ref=None, preprocess_fn=None, preds_type='probs', binarize_preds=False, train_size=0.75, n_folds=None, retrain_from_scratch=True, seed=0, use_calibration=False, calibration_kwargs=None, use_oob=False, input_shape=None, data_type=None)[source]
Classifier-based drift detector. The classifier is trained on a fraction of the combined reference and test data and drift is detected on the remaining data. To use all the data to detect drift, a stratified cross-validation scheme can be chosen.
- Parameters:
  - x_ref (ndarray) – Data used as reference distribution.
  - model (ClassifierMixin) – Sklearn classification model used for drift detection.
  - p_val (float) – p-value used for the significance of the test.
  - x_ref_preprocessed (bool) – Whether the given reference data x_ref has been preprocessed yet. If x_ref_preprocessed=True, only the test data x will be preprocessed at prediction time. If x_ref_preprocessed=False, the reference data will also be preprocessed.
  - preprocess_at_init (bool) – Whether to preprocess the reference data when the detector is instantiated. Otherwise, the reference data will be preprocessed at prediction time. Only applies if x_ref_preprocessed=False.
  - update_x_ref (Optional[Dict[str, int]]) – Reference data can optionally be updated to the last n instances seen by the detector or via reservoir sampling with size n. For the former, the parameter equals {'last': n} while for reservoir sampling {'reservoir_sampling': n} is passed.
  - preprocess_fn (Optional[Callable]) – Function to preprocess the data before computing the data drift metrics.
  - preds_type (str) – Whether the model outputs 'probs' or 'scores'.
  - binarize_preds (bool) – Whether to test for discrepancy on soft (e.g. probs/scores) model predictions directly with a K-S test, or to binarise to 0-1 prediction errors and apply a binomial test.
  - train_size (Optional[float]) – Optional fraction (float between 0 and 1) of the dataset used to train the classifier. Drift is detected on the remaining 1 - train_size fraction. Cannot be used in combination with n_folds.
  - n_folds (Optional[int]) – Optional number of stratified folds used for training. The model predictions are then calculated on all the out-of-fold instances. This allows leveraging all the reference and test data for drift detection at the expense of longer computation. If both train_size and n_folds are specified, n_folds is prioritized.
  - retrain_from_scratch (bool) – Whether the classifier should be retrained from scratch for each set of test data or whether it should instead continue training from where it left off on the previous set.
  - seed (int) – Optional random seed for fold selection.
  - use_calibration (bool) – Whether to use calibration. Calibration can be used on top of any model.
  - calibration_kwargs (Optional[dict]) – Optional additional kwargs for calibration. See https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html for more details.
  - use_oob (bool) – Whether to use out-of-bag (OOB) predictions. Supported only for RandomForestClassifier.
  - data_type (Optional[str]) – Optionally specify the data type (tabular, image or time-series). Added to metadata.
- score(x)[source]
Compute the out-of-fold drift metric such as the accuracy from a classifier trained to distinguish the reference data from the data to be tested.
- Parameters:
  - x (ndarray) – Batch of instances to test against the reference data.
- Return type:
  Tuple[float, float, ndarray, ndarray, Union[ndarray, list], Union[ndarray, list]]
- Returns:
p-value, a notion of distance between the trained classifier’s out-of-fold performance and that which we’d expect under the null assumption of no drift, and the out-of-fold classifier model prediction probabilities on the reference and test data as well as the associated reference and test instances of the out-of-fold predictions.
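A short sketch of calling score() on a test batch, continuing the instantiation example above; the shifted synthetic data is illustrative only, and the tuple is unpacked according to the return type documented here:

```python
# Simulate a drifted test batch by shifting the synthetic distribution.
x_test = np.random.randn(500, 10).astype(np.float32) + 1.0

# score returns: p-value, distance, out-of-fold prediction probabilities on the
# reference and test data, and the associated out-of-fold reference/test instances.
p_val, dist, probs_ref, probs_test, x_ref_oof, x_test_oof = cd.score(x_test)

print(f"p-value: {p_val:.4f}, distance: {dist:.4f}")
# A p-value below the configured threshold (0.05 here) suggests drift.
```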