alibi_detect.cd.ks module
-
class alibi_detect.cd.ks.KSDrift(p_val=0.05, X_ref=None, preprocess_X_ref=True, update_X_ref=None, preprocess_fn=None, preprocess_kwargs=None, correction='bonferroni', alternative='two-sided', n_features=None, n_infer=2, input_shape=None, data_type=None)
Bases: alibi_detect.base.BaseDetector
-
__init__(p_val=0.05, X_ref=None, preprocess_X_ref=True, update_X_ref=None, preprocess_fn=None, preprocess_kwargs=None, correction='bonferroni', alternative='two-sided', n_features=None, n_infer=2, input_shape=None, data_type=None)
Kolmogorov-Smirnov (K-S) data drift detector with Bonferroni or False Discovery Rate (FDR) correction for multivariate data.
- Parameters
p_val (float) – p-value used for significance of the K-S test for each feature. If the FDR correction method is used, this corresponds to the acceptable q-value.
X_ref (Union[numpy.ndarray, list, None]) – Data used as reference distribution.
preprocess_X_ref (bool) – Whether to preprocess the reference data up front and store the result.
update_X_ref (Optional[Dict[str, int]]) – Reference data can optionally be updated to the last n instances seen by the detector or via reservoir sampling with size n. For the former, pass {‘last’: n}; for reservoir sampling, pass {‘reservoir_sampling’: n}.
preprocess_fn (Optional[Callable]) – Function to preprocess the data before computing the data drift metrics. Typically a dimensionality reduction technique.
preprocess_kwargs (Optional[dict]) – Kwargs for preprocess_fn.
correction (str) – Correction type for multivariate data. Either ‘bonferroni’ or ‘fdr’ (False Discovery Rate).
alternative (str) – Defines the alternative hypothesis. Options are ‘two-sided’, ‘less’ or ‘greater’.
n_features (Optional[int]) – Number of features used in the K-S test. Not needed if no preprocessing takes place. With a preprocessing step it can also be inferred automatically, but that could be more expensive to compute.
n_infer (int) – Number of instances used to infer the number of features.
data_type (Optional[str]) – Optionally specify the data type (tabular, image or time-series). Added to metadata.
- Return type
None
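The two update strategies named for update_X_ref can be sketched in plain Python. This is only an illustration of the ideas "keep the last n instances" and "reservoir sampling with size n", not the library's code; the function names update_last_n and update_reservoir and the n_seen counter are hypothetical, and the reservoir sketch assumes the reference set holds at most n instances.

```python
import random
from collections import deque


def update_last_n(reference, batch, n):
    """Keep only the n most recent instances (the {'last': n} idea)."""
    window = deque(reference, maxlen=n)  # older items fall off the left
    window.extend(batch)
    return list(window)


def update_reservoir(reference, batch, n, n_seen, rng=random):
    """Reservoir sampling (Algorithm R).

    n_seen is the number of instances observed so far, including those
    already in the reference set. After the update, every instance seen
    so far has the same chance of being in the reservoir.
    """
    reservoir = list(reference)
    for item in batch:
        n_seen += 1
        if len(reservoir) < n:
            reservoir.append(item)
        else:
            j = rng.randrange(n_seen)  # uniform integer in [0, n_seen)
            if j < n:
                reservoir[j] = item  # replace a uniformly chosen slot
    return reservoir, n_seen
```

The sliding window tracks the most recent data and so adapts to gradual change, whereas the reservoir stays a uniform sample of everything seen and changes more slowly.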
-
feature_score(X_ref, X)
Compute K-S scores and statistics per feature.
- Parameters
X_ref (numpy.ndarray) – Reference instances to compare distribution with.
X (numpy.ndarray) – Batch of instances.
- Return type
Tuple[numpy.ndarray, numpy.ndarray]
- Returns
Feature level p-values and K-S statistics.
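As a rough illustration of what a per-feature K-S statistic measures, the sketch below computes the two-sided two-sample statistic (the largest gap between the two empirical CDFs) for each column of row-major data. It uses only the standard library, does not compute p-values, and is not necessarily how feature_score is implemented; ks_statistic and feature_ks are hypothetical names.

```python
from bisect import bisect_right


def ks_statistic(ref, new):
    """Two-sided two-sample K-S statistic: max gap between empirical CDFs."""
    ref_sorted, new_sorted = sorted(ref), sorted(new)
    n_ref, n_new = len(ref_sorted), len(new_sorted)
    d = 0.0
    # Both CDFs are step functions, so the largest gap occurs at a data point.
    for v in ref_sorted + new_sorted:
        cdf_ref = bisect_right(ref_sorted, v) / n_ref  # share of ref <= v
        cdf_new = bisect_right(new_sorted, v) / n_new  # share of new <= v
        d = max(d, abs(cdf_ref - cdf_new))
    return d


def feature_ks(X_ref, X):
    """Apply the statistic column by column (rows are instances)."""
    return [ks_statistic(r, x) for r, x in zip(zip(*X_ref), zip(*X))]
```

A statistic of 0 means the two samples have identical empirical distributions for that feature, and 1 means their value ranges do not overlap at all.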
-
predict(X, drift_type='batch', return_p_val=True, return_distance=True)
Predict whether a batch of data has drifted from the reference data.
- Parameters
drift_type (str) – Predict drift at the ‘feature’ or ‘batch’ level. For ‘batch’, the K-S statistics for each feature are aggregated using the Bonferroni or False Discovery Rate correction.
return_p_val (bool) – Whether to return feature level p-values.
return_distance (bool) – Whether to return the K-S statistic between the features of the new batch and reference data.
- Return type
Dict[Dict[str, str], Dict[str, Union[numpy.ndarray, int, float]]]
- Returns
Dictionary containing ‘meta’ and ‘data’ dictionaries.
‘meta’ has the model’s metadata.
‘data’ contains the drift prediction and, optionally, the feature level p-values, the threshold after multivariate correction (if needed) and the K-S statistics.
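To make the batch-level aggregation concrete, here is a minimal sketch of how feature-level p-values could be turned into a single drift decision under each correction. It assumes the textbook Bonferroni rule and the Benjamini-Hochberg procedure for FDR; the detector's exact implementation may differ in details such as strict versus non-strict comparison, and bonferroni_drift and fdr_drift are hypothetical names.

```python
def bonferroni_drift(p_vals, p_val=0.05):
    """Flag drift if any feature's p-value is below p_val / n_features."""
    threshold = p_val / len(p_vals)
    return int(min(p_vals) < threshold), threshold


def fdr_drift(p_vals, q_val=0.05):
    """Benjamini-Hochberg: look for the largest rank i with p_(i) <= q * i / n.

    Returns (drift flag, threshold at that rank), or (0, None) if no
    sorted p-value falls under its rank-dependent threshold.
    """
    n = len(p_vals)
    threshold = None
    for i, p in enumerate(sorted(p_vals), start=1):
        if p <= q_val * i / n:
            threshold = q_val * i / n  # keep the largest qualifying rank
    return int(threshold is not None), threshold
```

The two corrections can disagree. For p-values [0.02, 0.03, 0.03, 0.04] and a level of 0.05, the Bonferroni threshold is 0.0125 and no feature falls below it, while the FDR sketch flags drift because the third and fourth sorted p-values sit under their rank-dependent thresholds.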
-