alibi_detect.cd.ks module

class alibi_detect.cd.ks.KSDrift(x_ref, p_val=0.05, x_ref_preprocessed=False, preprocess_at_init=True, update_x_ref=None, preprocess_fn=None, correction='bonferroni', alternative='two-sided', n_features=None, input_shape=None, data_type=None)[source]

Bases: BaseUnivariateDrift

__init__(x_ref, p_val=0.05, x_ref_preprocessed=False, preprocess_at_init=True, update_x_ref=None, preprocess_fn=None, correction='bonferroni', alternative='two-sided', n_features=None, input_shape=None, data_type=None)[source]

Kolmogorov-Smirnov (K-S) data drift detector with Bonferroni or False Discovery Rate (FDR) correction for multivariate data.

Parameters:

x_ref (Union[ndarray, list]) – Data used as reference distribution.
p_val (float) – p-value used for significance of the K-S test for each feature. If the FDR correction method is used, this corresponds to the acceptable q-value.
x_ref_preprocessed (bool) – Whether the given reference data x_ref has been preprocessed yet. If x_ref_preprocessed=True, only the test data x will be preprocessed at prediction time. If x_ref_preprocessed=False, the reference data will also be preprocessed.
preprocess_at_init (bool) – Whether to preprocess the reference data when the detector is instantiated. Otherwise, the reference data will be preprocessed at prediction time. Only applies if x_ref_preprocessed=False.
update_x_ref (Optional[Dict[str, int]]) – Reference data can optionally be updated to the last n instances seen by the detector or via reservoir sampling with size n. For the former, the parameter equals {‘last’: n} while for reservoir sampling {‘reservoir_sampling’: n} is passed.
preprocess_fn (Optional[Callable]) – Function to preprocess the data before computing the data drift metrics. Typically a dimensionality reduction technique.
correction (str) – Correction type for multivariate data. Either ‘bonferroni’ or ‘fdr’ (False Discovery Rate).
alternative (str) – Defines the alternative hypothesis. Options are ‘two-sided’, ‘less’ or ‘greater’.
n_features (Optional[int]) – Number of features used in the K-S test. No need to pass it if no preprocessing takes place. In case of a preprocessing step, this can also be inferred automatically but could be more expensive to compute.
input_shape (Optional[tuple]) – Shape of input data.
data_type (Optional[str]) – Optionally specify the data type (tabular, image or time-series). Added to metadata.

feature_score(x_ref, x)[source]

Compute K-S scores and statistics per feature.

Parameters:

x_ref (ndarray) – Reference instances to compare distribution with.
x (ndarray) – Batch of instances.

Return type:

Tuple[ndarray, ndarray]

Returns:

Feature level p-values and K-S statistics.