alibi_detect.cd.base module

class alibi_detect.cd.base.BaseClassifierDrift(x_ref, p_val=0.05, preprocess_x_ref=True, update_x_ref=None, preprocess_fn=None, preds_type='probs', binarize_preds=False, train_size=0.75, n_folds=None, seed=0, data_type=None)

Bases: alibi_detect.base.BaseDetector

__init__(x_ref, p_val=0.05, preprocess_x_ref=True, update_x_ref=None, preprocess_fn=None, preds_type='probs', binarize_preds=False, train_size=0.75, n_folds=None, seed=0, data_type=None)

Base class for the classifier-based drift detector.

Parameters
  • x_ref (ndarray) – Data used as reference distribution.

  • p_val (float) – p-value used for the significance of the test.

  • preprocess_x_ref (bool) – Whether to already preprocess and store the reference data.

  • update_x_ref (Optional[Dict[str, int]]) – Reference data can optionally be updated to the last n instances seen by the detector or via reservoir sampling with size n. For the former, the parameter equals {‘last’: n} while for reservoir sampling {‘reservoir_sampling’: n} is passed.

  • preprocess_fn (Optional[Callable]) – Function to preprocess the data before computing the data drift metrics.

  • preds_type (str) – Whether the model outputs probabilities (‘probs’) or logits (‘logits’).

  • binarize_preds (bool) – Whether to test for discrepancy on soft (e.g. probs/logits) model predictions directly with a K-S test, or to binarize them to 0-1 prediction errors and apply a binomial test.

  • train_size (Optional[float]) – Optional fraction (float between 0 and 1) of the dataset used to train the classifier. Drift is detected on the remaining 1 - train_size fraction. Ignored if n_folds is specified.

  • n_folds (Optional[int]) – Optional number of stratified folds used for training. Model predictions are then computed out-of-fold, so every instance is scored by a classifier that was not trained on it. This allows leveraging all the reference and test data for drift detection at the expense of longer computation. If both train_size and n_folds are specified, n_folds is prioritized.

  • seed (int) – Optional random seed for fold selection.

  • data_type (Optional[str]) – Optionally specify the data type (tabular, image or time-series). Added to metadata.

Return type

None
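The two update_x_ref options can be illustrated with a minimal NumPy sketch. The helper update_reference and its n_seen counter are hypothetical names, not part of the alibi_detect API. The reservoir branch uses the standard Algorithm R and assumes the initial reference set holds at most n instances.

```python
import numpy as np


def update_reference(x_ref, x, n_seen, update_x_ref, rng):
    """Hypothetical helper: return (updated reference set, number of instances seen).

    n_seen starts at len(x_ref). Assumes len(x_ref) <= n for reservoir sampling.
    """
    if update_x_ref is None:
        return x_ref, n_seen + len(x)
    if 'last' in update_x_ref:
        # Sliding window: keep only the n most recent instances.
        n = update_x_ref['last']
        return np.concatenate([x_ref, x], axis=0)[-n:], n_seen + len(x)
    if 'reservoir_sampling' in update_x_ref:
        # Algorithm R: every instance seen so far is retained with equal probability.
        n = update_x_ref['reservoir_sampling']
        reservoir = list(x_ref)
        for item in x:
            n_seen += 1
            if len(reservoir) < n:
                reservoir.append(item)
            else:
                j = rng.integers(0, n_seen)  # uniform over [0, n_seen)
                if j < n:
                    reservoir[j] = item
        return np.stack(reservoir), n_seen
    raise ValueError(f'Unknown update_x_ref option: {update_x_ref}')
```

With {‘last’: n} the reference set behaves as a sliding window over the most recent data, whereas reservoir sampling keeps a uniform sample of everything seen so far.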

get_splits(x_ref, x)

Split reference and test data into train and test folds used by the classifier.

Parameters
  • x_ref (ndarray) – Data used as reference distribution.

  • x (ndarray) – Batch of instances.

Return type

Tuple[ndarray, ndarray, List[Tuple[ndarray, ndarray]]]

Returns

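Combined reference and test instances, their labels (0 for reference, 1 for test), and a list with one tuple of train and test indices per fold (a single tuple when train_size is used).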
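One possible way to build these splits is sketched below in plain NumPy. get_splits_sketch is a hypothetical function: it assumes reference rows are labelled 0 and test rows 1, stratifies the folds by that label when n_folds is given, and otherwise falls back to a single random (unstratified) split of size train_size. The library itself may use a different splitting utility.

```python
import numpy as np


def get_splits_sketch(x_ref, x, train_size=0.75, n_folds=None, seed=0):
    """Hypothetical sketch returning (x_all, y_all, splits)."""
    rng = np.random.default_rng(seed)
    x_all = np.concatenate([x_ref, x], axis=0)
    # Label reference instances 0 and test instances 1.
    y_all = np.concatenate([np.zeros(len(x_ref), dtype=int), np.ones(len(x), dtype=int)])
    n = len(x_all)
    if n_folds is not None:  # n_folds takes priority over train_size
        folds = [[] for _ in range(n_folds)]
        for label in (0, 1):
            # Shuffle each class separately and deal it out over the folds (stratification).
            idx = rng.permutation(np.flatnonzero(y_all == label))
            for k, chunk in enumerate(np.array_split(idx, n_folds)):
                folds[k].extend(chunk.tolist())
        splits = []
        for fold in folds:
            test_idx = np.sort(np.array(fold, dtype=int))
            train_idx = np.setdiff1d(np.arange(n), test_idx)
            splits.append((train_idx, test_idx))
    elif train_size is not None:
        # Single random split: train on train_size, test on the remaining fraction.
        idx = rng.permutation(n)
        n_train = int(train_size * n)
        splits = [(np.sort(idx[:n_train]), np.sort(idx[n_train:]))]
    else:
        raise ValueError('Either train_size or n_folds must be specified.')
    return x_all, y_all, splits
```

Stratifying by the 0/1 label keeps the reference-to-test ratio roughly constant across folds, so each out-of-fold prediction is made under similar class proportions.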

predict(x, return_p_val=True, return_distance=True)

Predict whether a batch of data has drifted from the reference data.

Parameters
  • x (ndarray) – Batch of instances.

  • return_p_val (bool) – Whether to return the p-value of the test.

  • return_distance (bool) – Whether to return a notion of the strength of the drift: the K-S test statistic if binarize_preds=False, otherwise the relative error reduction.

Return type

Dict[Dict[str, str], Dict[str, Union[int, float]]]

Returns

  • Dictionary containing ‘meta’ and ‘data’ dictionaries.

  • ‘meta’ has the model’s metadata.

  • ‘data’ contains the drift prediction and optionally the p-value and the performance of the classifier relative to its expectation under the no-change null.

preprocess(x)

Data preprocessing before computing the drift scores.

Parameters

x (ndarray) – Batch of instances.

Return type

Tuple[ndarray, ndarray]

Returns

Preprocessed reference data and new instances.
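The interplay between preprocess_fn and preprocess_x_ref can be sketched as follows. PreprocessSketch is a hypothetical stand-in for the detector state, written under the assumption that a reference set preprocessed at initialization is reused as-is, and is otherwise transformed again on every call.

```python
from typing import Callable, Optional, Tuple

import numpy as np


class PreprocessSketch:
    """Hypothetical stand-in for the detector's preprocessing state."""

    def __init__(self, x_ref: np.ndarray, preprocess_fn: Optional[Callable] = None,
                 preprocess_x_ref: bool = True):
        self.preprocess_fn = preprocess_fn
        self.preprocess_x_ref = preprocess_x_ref
        # Optionally preprocess the reference data once and store the result.
        if preprocess_fn is not None and preprocess_x_ref:
            self.x_ref = preprocess_fn(x_ref)
        else:
            self.x_ref = x_ref

    def preprocess(self, x: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
        if self.preprocess_fn is None:
            return self.x_ref, x
        x = self.preprocess_fn(x)
        # The reference data is only transformed here if it was not stored preprocessed.
        x_ref = self.x_ref if self.preprocess_x_ref else self.preprocess_fn(self.x_ref)
        return x_ref, x
```

Storing the preprocessed reference data trades memory for speed: the (possibly expensive) preprocess_fn runs on x_ref once rather than on every prediction.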

abstract score(x)

Return type

Tuple[float, float]

test_probs(y_oof, probs_oof, n_ref, n_cur)

Perform a statistical test of the probabilities predicted by the model against what we’d expect under the no-change null.

Parameters
  • y_oof (ndarray) – Out-of-fold targets (0 for reference, 1 for current).

  • probs_oof (ndarray) – Out-of-fold probabilities predicted by the model.

  • n_ref (int) – Size of the reference window used in training the model.

  • n_cur (int) – Size of the current window used in training the model.

Return type

Tuple[float, float]

Returns

p-value and a notion of the classifier’s performance relative to its expectation under the null.
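Both branches of the test can be sketched as below, assuming probs_oof holds, per instance, the out-of-fold probability of the current class (label 1). probs_test_sketch is a hypothetical function. The binarized branch uses an exact one-sided binomial test against the majority-class accuracy expected under the null and reports the relative error reduction; the soft branch uses scipy.stats.ks_2samp with a one-sided alternative, which is an assumption about the direction being tested.

```python
import math

import numpy as np
from scipy.stats import ks_2samp


def probs_test_sketch(y_oof, probs_oof, n_ref, n_cur, binarize_preds=False):
    """Hypothetical sketch returning (p_val, distance)."""
    if binarize_preds:
        # Under the no-change null the best a classifier can do is always predict
        # the larger window, so the majority-class rate is the baseline accuracy.
        baseline_acc = max(n_ref, n_cur) / (n_ref + n_cur)
        n = len(y_oof)
        k = int(((probs_oof > 0.5).astype(int) == y_oof).sum())  # correct predictions
        # One-sided exact binomial test: P(X >= k) with X ~ Binomial(n, baseline_acc).
        p_val = sum(math.comb(n, i) * baseline_acc ** i * (1 - baseline_acc) ** (n - i)
                    for i in range(k, n + 1))
        # Relative error reduction with respect to the baseline error.
        baseline_err = 1 - baseline_acc
        distance = (baseline_err - (1 - k / n)) / baseline_err
        return p_val, distance
    # Soft predictions: K-S test on the probabilities assigned to reference versus
    # current instances. alternative='greater' tests whether the CDF of the reference
    # probabilities lies above that of the current ones (current scored higher).
    res = ks_2samp(probs_oof[y_oof == 0], probs_oof[y_oof == 1], alternative='greater')
    return res.pvalue, res.statistic
```

For a perfect classifier on two balanced windows of two instances each, the binarized branch gives a p-value of 0.5**4 = 0.0625 and a relative error reduction of 1.0.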

class alibi_detect.cd.base.BaseLSDDDrift(x_ref, p_val=0.05, preprocess_x_ref=True, update_x_ref=None, preprocess_fn=None, sigma=None, n_permutations=100, n_kernel_centers=None, lambda_rd_max=0.2, input_shape=None, data_type=None)

Bases: alibi_detect.base.BaseDetector

__init__(x_ref, p_val=0.05, preprocess_x_ref=True, update_x_ref=None, preprocess_fn=None, sigma=None, n_permutations=100, n_kernel_centers=None, lambda_rd_max=0.2, input_shape=None, data_type=None)

Least-squares Density Difference (LSDD) base data drift detector using a permutation test.

Parameters
  • x_ref (ndarray) – Data used as reference distribution.

  • p_val (float) – p-value used for the significance of the permutation test.

  • preprocess_x_ref (bool) – Whether to already preprocess and store the reference data.

  • update_x_ref (Optional[Dict[str, int]]) – Reference data can optionally be updated to the last n instances seen by the detector or via reservoir sampling with size n. For the former, the parameter equals {‘last’: n} while for reservoir sampling {‘reservoir_sampling’: n} is passed.

  • preprocess_fn (Optional[Callable]) – Function to preprocess the data before computing the data drift metrics.

  • sigma (Optional[ndarray]) – Optionally set the bandwidth of the Gaussian kernel used in estimating the LSDD. Can also pass multiple bandwidth values as an array. The kernel evaluation is then averaged over those bandwidths. If sigma is not specified, the ‘median heuristic’ is adopted whereby sigma is set as the median pairwise distance between reference samples.

  • n_permutations (int) – Number of permutations used in the permutation test.

  • n_kernel_centers (Optional[int]) – The number of reference samples to use as centers in the Gaussian kernel model used to estimate LSDD. Defaults to 1/20th of the number of reference samples.

  • lambda_rd_max (float) – The maximum relative difference between two estimates of LSDD that the regularization parameter lambda is allowed to cause. Defaults to 0.2 as in the paper.

  • input_shape (Optional[tuple]) – Shape of input data.

  • data_type (Optional[str]) – Optionally specify the data type (tabular, image or time-series). Added to metadata.

Return type

None
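The ‘median heuristic’ for sigma can be sketched in NumPy. median_heuristic_sigma is a hypothetical helper that uses plain Euclidean distances over flattened instances; implementations differ in whether they take distances or squared distances, so treat this as illustrative.

```python
import numpy as np


def median_heuristic_sigma(x_ref):
    """Hypothetical helper: median pairwise Euclidean distance between reference samples."""
    x = x_ref.reshape(len(x_ref), -1)  # flatten each instance to a vector
    # Pairwise Euclidean distances via broadcasting, shape (n, n).
    dists = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(-1))
    # Count each unordered pair once and skip the zero diagonal.
    iu = np.triu_indices(len(x), k=1)
    return np.median(dists[iu])
```

For the three 1-D points 0, 1 and 3 the pairwise distances are 1, 3 and 2, so the heuristic bandwidth is 2.0.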

predict(x, return_p_val=True, return_distance=True)

Predict whether a batch of data has drifted from the reference data.

Parameters
  • x (ndarray) – Batch of instances.

  • return_p_val (bool) – Whether to return the p-value of the permutation test.

  • return_distance (bool) – Whether to return the LSDD metric between the new batch and reference data.

Return type

Dict[Dict[str, str], Dict[str, Union[int, float]]]

Returns

  • Dictionary containing ‘meta’ and ‘data’ dictionaries.

  • ‘meta’ has the model’s metadata.

  • ‘data’ contains the drift prediction and optionally the p-value, threshold and LSDD metric.

preprocess(x)

Data preprocessing before computing the drift scores.

Parameters

x (ndarray) – Batch of instances.

Return type

Tuple[ndarray, ndarray]

Returns

Preprocessed reference data and new instances.

abstract score(x)

Return type

Tuple[float, float, ndarray]

class alibi_detect.cd.base.BaseMMDDrift(x_ref, p_val=0.05, preprocess_x_ref=True, update_x_ref=None, preprocess_fn=None, sigma=None, configure_kernel_from_x_ref=True, n_permutations=100, input_shape=None, data_type=None)

Bases: alibi_detect.base.BaseDetector

__init__(x_ref, p_val=0.05, preprocess_x_ref=True, update_x_ref=None, preprocess_fn=None, sigma=None, configure_kernel_from_x_ref=True, n_permutations=100, input_shape=None, data_type=None)

Maximum Mean Discrepancy (MMD) base data drift detector using a permutation test.

Parameters
  • x_ref (ndarray) – Data used as reference distribution.

  • p_val (float) – p-value used for the significance of the permutation test.

  • preprocess_x_ref (bool) – Whether to already preprocess and store the reference data.

  • update_x_ref (Optional[Dict[str, int]]) – Reference data can optionally be updated to the last n instances seen by the detector or via reservoir sampling with size n. For the former, the parameter equals {‘last’: n} while for reservoir sampling {‘reservoir_sampling’: n} is passed.

  • preprocess_fn (Optional[Callable]) – Function to preprocess the data before computing the data drift metrics.

  • sigma (Optional[ndarray]) – Optionally set the Gaussian RBF kernel bandwidth. Can also pass multiple bandwidth values as an array. The kernel evaluation is then averaged over those bandwidths.

  • configure_kernel_from_x_ref (bool) – Whether to already configure the kernel bandwidth from the reference data.

  • n_permutations (int) – Number of permutations used in the permutation test.

  • input_shape (Optional[tuple]) – Shape of input data.

  • data_type (Optional[str]) – Optionally specify the data type (tabular, image or time-series). Added to metadata.

Return type

None
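A minimal NumPy sketch of an MMD permutation test with a Gaussian RBF kernel averaged over several bandwidths. The function names are hypothetical; the unbiased squared-MMD estimator and the p-value convention (fraction of permuted statistics at least as large as the observed one) are choices made for this sketch, not necessarily those of the library.

```python
import numpy as np


def gaussian_kernel(x, y, sigmas):
    """Gaussian RBF kernel matrix averaged over the given bandwidths."""
    # Squared Euclidean distances between every row of x and every row of y.
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.mean([np.exp(-d2 / (2 * s ** 2)) for s in sigmas], axis=0)


def mmd2_unbiased(k, n):
    """Unbiased squared MMD from a pooled kernel matrix whose first n rows are reference."""
    m = k.shape[0] - n
    k_xx, k_yy, k_xy = k[:n, :n], k[n:, n:], k[:n, n:]
    # Drop the diagonal (self-similarity) terms from the within-sample averages.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (n * (n - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (m * (m - 1))
    return term_xx + term_yy - 2 * k_xy.mean()


def mmd_permutation_pval(x_ref, x, sigmas, n_permutations=100, seed=0):
    """Return (p_val, mmd2) from a permutation test on the pooled sample."""
    rng = np.random.default_rng(seed)
    z = np.concatenate([x_ref, x], axis=0).reshape(len(x_ref) + len(x), -1)
    n = len(x_ref)
    k = gaussian_kernel(z, z, sigmas)  # computed once, then re-indexed
    mmd2 = mmd2_unbiased(k, n)
    perm_stats = np.empty(n_permutations)
    for i in range(n_permutations):
        idx = rng.permutation(len(z))
        # Permuting the pooled sample equals permuting the rows and columns of k.
        perm_stats[i] = mmd2_unbiased(k[np.ix_(idx, idx)], n)
    p_val = float((perm_stats >= mmd2).mean())
    return p_val, float(mmd2)
```

Computing the pooled kernel matrix once and only re-indexing it per permutation keeps the cost of n_permutations low compared with recomputing kernels.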

abstract kernel_matrix(x, y)

Return type

Union[Tensor, Tensor]

predict(x, return_p_val=True, return_distance=True)

Predict whether a batch of data has drifted from the reference data.

Parameters
  • x (ndarray) – Batch of instances.

  • return_p_val (bool) – Whether to return the p-value of the permutation test.

  • return_distance (bool) – Whether to return the MMD metric between the new batch and reference data.

Return type

Dict[Dict[str, str], Dict[str, Union[int, float]]]

Returns

  • Dictionary containing ‘meta’ and ‘data’ dictionaries.

  • ‘meta’ has the model’s metadata.

  • ‘data’ contains the drift prediction and optionally the p-value, threshold and MMD metric.

preprocess(x)

Data preprocessing before computing the drift scores.

Parameters

x (ndarray) – Batch of instances.

Return type

Tuple[ndarray, ndarray]

Returns

Preprocessed reference data and new instances.

abstract score(x)

Return type

Tuple[float, float, ndarray]

class alibi_detect.cd.base.BaseUnivariateDrift(x_ref, p_val=0.05, preprocess_x_ref=True, update_x_ref=None, preprocess_fn=None, correction='bonferroni', n_features=None, input_shape=None, data_type=None)

Bases: alibi_detect.base.BaseDetector

__init__(x_ref, p_val=0.05, preprocess_x_ref=True, update_x_ref=None, preprocess_fn=None, correction='bonferroni', n_features=None, input_shape=None, data_type=None)

Generic drift detector component which serves as a base class for methods using univariate tests with multivariate correction.

Parameters
  • x_ref (ndarray) – Data used as reference distribution.

  • p_val (float) – p-value used for significance of the statistical test for each feature. If the FDR correction method is used, this corresponds to the acceptable q-value.

  • preprocess_x_ref (bool) – Whether to already preprocess and store the reference data.

  • update_x_ref (Optional[Dict[str, int]]) – Reference data can optionally be updated to the last n instances seen by the detector or via reservoir sampling with size n. For the former, the parameter equals {‘last’: n} while for reservoir sampling {‘reservoir_sampling’: n} is passed.

  • preprocess_fn (Optional[Callable]) – Function to preprocess the data before computing the data drift metrics. Typically a dimensionality reduction technique.

  • correction (str) – Correction type for multivariate data. Either ‘bonferroni’ or ‘fdr’ (False Discovery Rate).

  • n_features (Optional[int]) – Number of features used in the statistical test. Not required if no preprocessing takes place. If a preprocessing step is used, it can also be inferred automatically, but this may be more expensive to compute.

  • input_shape (Optional[tuple]) – Shape of input data. Needs to be provided for text data.

  • data_type (Optional[str]) – Optionally specify the data type (tabular, image or time-series). Added to metadata.

Return type

None

abstract feature_score(x_ref, x)

Return type

Tuple[ndarray, ndarray]

predict(x, drift_type='batch', return_p_val=True, return_distance=True)

Predict whether a batch of data has drifted from the reference data.

Parameters
  • x (ndarray) – Batch of instances.

  • drift_type (str) – Predict drift at the ‘feature’ or ‘batch’ level. For ‘batch’, the feature-level p-values are aggregated using the Bonferroni or False Discovery Rate correction.

  • return_p_val (bool) – Whether to return feature level p-values.

  • return_distance (bool) – Whether to return the test statistic between the features of the new batch and reference data.

Return type

Dict[Dict[str, str], Dict[str, Union[ndarray, int, float]]]

Returns

  • Dictionary containing ‘meta’ and ‘data’ dictionaries.

  • ‘meta’ has the model’s metadata.

  • ‘data’ contains the drift prediction and optionally the feature-level p-values, the threshold after multivariate correction (if needed) and the test statistics.
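The batch-level aggregation can be sketched as follows. batch_drift is a hypothetical helper: for ‘bonferroni’ it flags drift if any feature-level p-value falls below p_val / n_features, and for ‘fdr’ it applies the Benjamini-Hochberg step-up procedure at level p_val. The threshold reported when FDR finds no drift is a convention of this sketch only.

```python
import numpy as np


def batch_drift(p_vals, p_val=0.05, correction='bonferroni'):
    """Hypothetical helper: aggregate feature-level p-values into (is_drift, threshold)."""
    p_vals = np.asarray(p_vals)
    n = len(p_vals)
    if correction == 'bonferroni':
        # Control the family-wise error rate: every p-value is compared with p_val / n.
        threshold = p_val / n
        return bool((p_vals < threshold).any()), threshold
    if correction == 'fdr':
        # Benjamini-Hochberg: the i-th smallest p-value is compared with p_val * i / n.
        p_sorted = np.sort(p_vals)
        q_thresholds = p_val * np.arange(1, n + 1) / n
        below = p_sorted < q_thresholds
        if below.any():
            # The largest threshold that is met determines the rejected features.
            return True, float(q_thresholds[np.flatnonzero(below).max()])
        # No feature passes: report the strictest threshold (sketch convention).
        return False, float(q_thresholds[0])
    raise ValueError("correction must be 'bonferroni' or 'fdr'")
```

With feature-level p-values [0.02, 0.03, 0.04, 0.045] and p_val=0.05, Bonferroni (threshold 0.0125) reports no drift, while FDR does because 0.045 falls below its largest threshold 0.05, illustrating that FDR is the less conservative correction.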

preprocess(x)

Data preprocessing before computing the drift scores.

Parameters

x (ndarray) – Batch of instances.

Return type

Tuple[ndarray, ndarray]

Returns

Preprocessed reference data and new instances.

score(x)

Compute the feature-wise drift scores, i.e. the p-values of the statistical test and the test statistics.

Parameters

x (ndarray) – Batch of instances.

Return type

Tuple[ndarray, ndarray]

Returns

Feature level p-values and test statistics.
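A concrete feature_score could run one univariate test per feature. The sketch below uses the two-sample Kolmogorov-Smirnov test from scipy.stats as an illustrative choice; feature_score_ks is a hypothetical name, and a subclass may use a different test.

```python
import numpy as np
from scipy.stats import ks_2samp


def feature_score_ks(x_ref, x):
    """Hypothetical feature_score: one two-sample K-S test per feature.

    Returns (p_vals, dists), each of shape (n_features,).
    """
    # Flatten each instance so that every column is one feature.
    x_ref = x_ref.reshape(len(x_ref), -1)
    x = x.reshape(len(x), -1)
    n_features = x_ref.shape[1]
    p_vals = np.zeros(n_features)
    dists = np.zeros(n_features)
    for f in range(n_features):
        res = ks_2samp(x_ref[:, f], x[:, f])
        p_vals[f], dists[f] = res.pvalue, res.statistic
    return p_vals, dists
```

The resulting feature-level p-values are what the Bonferroni or FDR correction then aggregates for a batch-level decision.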