alibi_detect.cd.ks module

class alibi_detect.cd.ks.KSDrift(p_val=0.05, X_ref=None, preprocess_X_ref=True, update_X_ref=None, preprocess_fn=None, preprocess_kwargs=None, correction='bonferroni', alternative='two-sided', n_features=None, n_infer=2, input_shape=None, data_type=None)

Bases: alibi_detect.base.BaseDetector

__init__(p_val=0.05, X_ref=None, preprocess_X_ref=True, update_X_ref=None, preprocess_fn=None, preprocess_kwargs=None, correction='bonferroni', alternative='two-sided', n_features=None, n_infer=2, input_shape=None, data_type=None)

Kolmogorov-Smirnov (K-S) data drift detector with Bonferroni or False Discovery Rate (FDR) correction for multivariate data.

Parameters
  • p_val (float) – Significance threshold (p-value) for the K-S test on each feature. If the FDR correction method is used, this corresponds to the acceptable q-value.

  • X_ref (Union[numpy.ndarray, list, None]) – Data used as reference distribution.

  • preprocess_X_ref (bool) – Whether to preprocess the reference data up front and store the preprocessed result.

  • update_X_ref (Optional[Dict[str, int]]) – Optionally update the reference data, either to the last n instances seen by the detector or via reservoir sampling with size n. Pass {'last': n} for the former and {'reservoir_sampling': n} for the latter.

  • preprocess_fn (Optional[Callable]) – Function to preprocess the data before computing the data drift metrics. Typically a dimensionality reduction technique.

  • preprocess_kwargs (Optional[dict]) – Kwargs for preprocess_fn.

  • correction (str) – Correction type for multivariate data. Either 'bonferroni' or 'fdr' (False Discovery Rate).

  • alternative (str) – Defines the alternative hypothesis. Options are 'two-sided', 'less' or 'greater'.

  • n_features (Optional[int]) – Number of features used in the K-S test. Not needed if no preprocessing takes place. With a preprocessing step, it can be inferred automatically, although this can be more expensive to compute.

  • n_infer (int) – Number of instances used to infer the number of features.

  • input_shape (Optional[tuple]) – Shape of input data.

  • data_type (Optional[str]) – Optionally specify the data type (tabular, image or time-series). Added to metadata.

Return type

None
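The two update_X_ref options described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: update_last and update_reservoir are hypothetical helper names, lists stand in for numpy arrays, the reservoir variant uses the classic "Algorithm R", and the initial reference set is assumed to hold at most n instances.

```python
import random


def update_last(x_ref, x_new, n):
    """Sketch of the {'last': n} option: keep only the n most recent instances."""
    return (list(x_ref) + list(x_new))[-n:]


def update_reservoir(x_ref, x_new, n, n_seen, rng=None):
    """Sketch of the {'reservoir_sampling': n} option (Algorithm R).

    n_seen is the number of instances observed so far, including those
    already in x_ref. Returns the updated reservoir and the updated count.
    """
    rng = rng or random.Random()
    reservoir = list(x_ref)
    for item in x_new:
        n_seen += 1
        if len(reservoir) < n:
            # Reservoir not yet full: always keep the new instance.
            reservoir.append(item)
        else:
            # Replace a random slot with probability n / n_seen.
            j = rng.randrange(n_seen)
            if j < n:
                reservoir[j] = item
    return reservoir, n_seen


# Usage: a reference set of 3 instances updated with a batch of 4 new ones.
recent = update_last([1, 2, 3], [4, 5, 6, 7], n=3)
sampled, seen = update_reservoir([1, 2, 3], [4, 5, 6, 7], n=3, n_seen=3)
```

With 'last', the reference set tracks the most recent data only; with reservoir sampling, every instance seen so far has the same chance of being in the reference set.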

feature_score(X_ref, X)

Compute the K-S p-value and test statistic for each feature.

Parameters
  • X_ref (numpy.ndarray) – Reference instances to compare distribution with.

  • X (numpy.ndarray) – Batch of instances.

Return type

Tuple[numpy.ndarray, numpy.ndarray]

Returns

Feature level p-values and K-S statistics.
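To make the per-feature statistic concrete, the following is a small stdlib-only sketch of the two-sided, two-sample K-S statistic applied column by column. It is an illustration, not the library's code: ks_statistic and feature_ks_statistics are hypothetical names, rows are plain lists rather than numpy arrays, and the p-value is not computed here (in practice a routine such as scipy.stats.ks_2samp returns both the statistic and the p-value).

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sided two-sample K-S statistic: the largest absolute gap
    between the empirical CDFs of samples a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The maximum gap is attained at one of the observed values.
    for v in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, v) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, v) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def feature_ks_statistics(x_ref, x):
    """Apply ks_statistic to each column of two row-major datasets."""
    n_features = len(x_ref[0])
    return [
        ks_statistic([row[j] for row in x_ref], [row[j] for row in x])
        for j in range(n_features)
    ]


# Usage: feature 0 is identical in both sets, feature 1 is fully shifted.
x_ref = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
x = [[0.0, 11.0], [1.0, 12.0], [2.0, 13.0], [3.0, 14.0]]
stats = feature_ks_statistics(x_ref, x)
```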

predict(X, drift_type='batch', return_p_val=True, return_distance=True)

Predict whether a batch of data has drifted from the reference data.

Parameters
  • X (Union[numpy.ndarray, list]) – Batch of instances.

  • drift_type (str) – Predict drift at the 'feature' or 'batch' level. For 'batch', the feature-level K-S test results are aggregated using the Bonferroni or False Discovery Rate correction.

  • return_p_val (bool) – Whether to return feature level p-values.

  • return_distance (bool) – Whether to return the K-S statistic between the features of the new batch and reference data.

Return type

Dict[Dict[str, str], Dict[str, Union[numpy.ndarray, int, float]]]

Returns

  • Dictionary containing 'meta' and 'data' dictionaries.

  • 'meta' has the model's metadata.

  • 'data' contains the drift prediction (at the feature or batch level) and, optionally, the feature level p-values, the threshold after multivariate correction (if needed) and the K-S statistics.
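The batch-level decision and the corrected threshold mentioned above can be sketched as follows. This is a hedged illustration rather than the library's code: bonferroni_drift and fdr_drift are hypothetical helper names, the FDR variant is assumed to be the Benjamini-Hochberg procedure, and returning None as the threshold when no feature passes is a choice made for this sketch.

```python
def bonferroni_drift(p_vals, p_val=0.05):
    """Flag drift if any feature p-value falls below p_val / n_features."""
    threshold = p_val / len(p_vals)
    drift = int(min(p_vals) < threshold)
    return drift, threshold


def fdr_drift(p_vals, q_val=0.05):
    """Benjamini-Hochberg: compare the k-th smallest p-value with q_val * k / n."""
    n = len(p_vals)
    thresholds = [q_val * rank / n for rank in range(1, n + 1)]
    passing = [t for p, t in zip(sorted(p_vals), thresholds) if p <= t]
    if not passing:
        # Assumption for this sketch: no threshold is reported without drift.
        return 0, None
    # Threshold at the largest rank whose p-value passes.
    return 1, passing[-1]


# Usage: several moderately small p-values, none extreme.
p_vals = [0.02, 0.02, 0.03, 0.6]
bonf_drift, bonf_threshold = bonferroni_drift(p_vals)
fdr_flag, fdr_threshold = fdr_drift(p_vals)
```

In this usage example Bonferroni does not flag drift, because no single p-value is below 0.05 / 4, while the FDR procedure does, because several features are jointly unlikely under the null hypothesis. Bonferroni is the more conservative of the two.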

preprocess(X)

Data preprocessing before computing the drift scores.

Parameters

X (Union[numpy.ndarray, list]) – Batch of instances.

Return type

Tuple[numpy.ndarray, numpy.ndarray]

Returns

Preprocessed reference data and new instances.
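One way to picture this step, given the preprocess_fn, preprocess_kwargs and preprocess_X_ref parameters of the constructor, is the sketch below. It is an assumption about the general shape of the logic, not the library's code: the standalone preprocess function, its ref_is_preprocessed flag and the keep_first_k example function are all hypothetical.

```python
def preprocess(x_ref, x, preprocess_fn=None, preprocess_kwargs=None,
               ref_is_preprocessed=True):
    """Return (reference data, new instances) after optional preprocessing.

    ref_is_preprocessed mirrors the idea behind preprocess_X_ref: if the
    reference data was already transformed and stored, only x is transformed.
    """
    if preprocess_fn is None:
        return x_ref, x
    kwargs = preprocess_kwargs or {}
    x_out = preprocess_fn(x, **kwargs)
    x_ref_out = x_ref if ref_is_preprocessed else preprocess_fn(x_ref, **kwargs)
    return x_ref_out, x_out


def keep_first_k(rows, k):
    """Toy stand-in for a dimensionality reduction: keep the first k features."""
    return [row[:k] for row in rows]


# Usage: reduce 3 features to 2 for both the reference data and the new batch.
x_ref = [[1, 2, 3], [4, 5, 6]]
x = [[7, 8, 9]]
x_ref_pre, x_pre = preprocess(x_ref, x, keep_first_k, {"k": 2},
                              ref_is_preprocessed=False)
```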

score(X)

Compute the feature-wise drift scores: the p-value and the test statistic of the Kolmogorov-Smirnov test for each feature.

Parameters

X (numpy.ndarray) – Batch of instances.

Return type

Tuple[numpy.ndarray, numpy.ndarray]

Returns

Feature level p-values and K-S statistics.