alibi_detect.cd.chisquare module

class alibi_detect.cd.chisquare.ChiSquareDrift(x_ref, p_val=0.05, categories_per_feature=None, preprocess_x_ref=True, update_x_ref=None, preprocess_fn=None, correction='bonferroni', n_features=None, input_shape=None, data_type=None)

Bases: alibi_detect.cd.base.BaseUnivariateDrift

__init__(x_ref, p_val=0.05, categories_per_feature=None, preprocess_x_ref=True, update_x_ref=None, preprocess_fn=None, correction='bonferroni', n_features=None, input_shape=None, data_type=None)

Chi-Squared data drift detector with Bonferroni or False Discovery Rate (FDR) correction for multivariate data.
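
To make the two correction options concrete, the following is a minimal sketch of the textbook Bonferroni and Benjamini-Hochberg (FDR) procedures applied to a list of per-feature p-values. The function names and the example p-values are illustrative assumptions, and the code is not taken from the library's implementation.

```python
from typing import List, Tuple


def bonferroni_drift(p_vals: List[float], p_val: float) -> Tuple[bool, float]:
    """Flag drift if any feature's p-value falls below p_val / n_features."""
    threshold = p_val / len(p_vals)
    return min(p_vals) < threshold, threshold


def fdr_drift(p_vals: List[float], q_val: float) -> Tuple[bool, float]:
    """Benjamini-Hochberg: find the largest rank i with p_(i) <= q_val * i / n."""
    n = len(p_vals)
    threshold = q_val / n  # threshold reported when nothing is rejected
    drift = False
    for rank, p in enumerate(sorted(p_vals), start=1):
        if p <= q_val * rank / n:
            drift, threshold = True, q_val * rank / n
    return drift, threshold


# Hypothetical per-feature p-values for a four-feature data set.
p_vals = [0.30, 0.02, 0.015, 0.60]
print(bonferroni_drift(p_vals, 0.05))
print(fdr_drift(p_vals, 0.05))
```

With these illustrative p-values the Bonferroni rule does not flag drift (the smallest p-value, 0.015, is above 0.05 / 4), while the FDR procedure does, which shows why Bonferroni is the more conservative choice.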

Parameters
  • x_ref (ndarray) – Data used as reference distribution.

  • p_val (float) – p-value used for significance of the Chi-Squared test for each feature. If the FDR correction method is used, this corresponds to the acceptable q-value.

  • categories_per_feature (Optional[Dict[int, int]]) – Optional dictionary mapping each feature column index to either the number of possible categorical values for that feature or a list of the possible values. If you know how many categories a feature has, pass the count in the Dict[int, int] format, e.g. {0: 3, 3: 2}; passing N categories assumes the possible values for the feature are [0, …, N-1]. Alternatively, pass the possible categories explicitly in the Dict[int, List[int]] format, e.g. {0: [0, 1, 2], 3: [0, 55]}; the categories can be arbitrary int values. If not specified, categories_per_feature is inferred from x_ref.

  • preprocess_x_ref (bool) – Whether to preprocess the reference data and infer its categories and frequencies at initialisation.

  • update_x_ref (Optional[Dict[str, int]]) – Reference data can optionally be updated to the last n instances seen by the detector or via reservoir sampling with size n. For the former, pass {'last': n}; for reservoir sampling, pass {'reservoir_sampling': n}.

  • preprocess_fn (Optional[Callable]) – Function to preprocess the data before computing the data drift metrics. Typically a dimensionality reduction technique.

  • correction (str) – Correction type for multivariate data. Either 'bonferroni' or 'fdr' (False Discovery Rate).

  • n_features (Optional[int]) – Number of features used in the Chi-Squared test. No need to pass it if no preprocessing takes place. If a preprocessing step is used, it can also be inferred automatically, though this may be more expensive to compute.

  • input_shape (Optional[tuple]) – Shape of input data.

  • data_type (Optional[str]) – Optionally specify the data type (tabular, image or time-series). Added to metadata.
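
The two accepted formats of categories_per_feature can be illustrated with a small sketch. The helper expand_categories below is hypothetical (it is not part of alibi_detect); it only shows how a count N is read as the values [0, …, N-1], while an explicit list is taken as given.

```python
from typing import Dict, List, Union

# Either a category count or an explicit list of category values per feature.
CategorySpec = Dict[int, Union[int, List[int]]]


def expand_categories(categories_per_feature: CategorySpec) -> Dict[int, List[int]]:
    """Hypothetical helper: normalise both formats to explicit value lists."""
    expanded = {}
    for feature, spec in categories_per_feature.items():
        if isinstance(spec, int):
            # N categories are assumed to be the values 0, ..., N-1.
            expanded[feature] = list(range(spec))
        else:
            # Explicit categories may be arbitrary int values.
            expanded[feature] = list(spec)
    return expanded


print(expand_categories({0: 3, 3: 2}))
print(expand_categories({0: [0, 1, 2], 3: [0, 55]}))
```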

Return type

None
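
The two update_x_ref strategies described above can be sketched as follows. This is a generic illustration of "keep the last n" and of classic reservoir sampling (Algorithm R), assuming the reference set initially holds at most n instances; the function names are hypothetical and the code does not reproduce the library's internals.

```python
import random
from typing import List, Sequence, Tuple


def update_last(x_ref: Sequence, x: Sequence, n: int) -> List:
    """Keep only the n most recent instances."""
    return (list(x_ref) + list(x))[-n:]


def update_reservoir(x_ref: Sequence, x: Sequence, n: int, n_seen: int,
                     rng: random.Random) -> Tuple[List, int]:
    """Reservoir sampling: maintain a uniform sample of size n over all instances seen.

    n_seen is the number of instances observed before this batch.
    """
    reservoir = list(x_ref)
    for item in x:
        n_seen += 1
        if len(reservoir) < n:
            reservoir.append(item)
        else:
            j = rng.randrange(n_seen)  # uniform integer in [0, n_seen)
            if j < n:
                reservoir[j] = item
    return reservoir, n_seen


print(update_last([1, 2, 3], [4, 5], n=3))
print(update_reservoir([1, 2, 3], [4, 5, 6, 7], n=3, n_seen=3, rng=random.Random(0)))
```

The 'last' strategy tracks the most recent data and so follows gradual change, whereas reservoir sampling keeps a sample that stays representative of everything seen so far.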

feature_score(x_ref, x)

Compute Chi-Squared test statistic and p-values per feature.

Parameters
  • x_ref (ndarray) – Reference instances to compare distribution with.

  • x (ndarray) – Batch of instances.

Return type

Tuple[ndarray, ndarray]

Returns

Feature-level p-values and Chi-Squared statistics.
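
As an illustration of what a per-feature Chi-Squared score involves, the sketch below builds a 2 x n_categories contingency table of reference and batch counts for each feature and passes it to scipy.stats.chi2_contingency. The function chi2_feature_score and the synthetic data are assumptions made for this example; this is a standalone sketch and not necessarily how the method is implemented.

```python
import numpy as np
from scipy.stats import chi2_contingency


def chi2_feature_score(x_ref, x, categories_per_feature):
    """Sketch: Chi-Squared test per feature on reference vs. batch category counts."""
    n_features = x_ref.shape[1]
    p_vals = np.zeros(n_features)
    stats = np.zeros(n_features)
    for f in range(n_features):
        cats = categories_per_feature[f]
        ref_counts = [(x_ref[:, f] == c).sum() for c in cats]
        x_counts = [(x[:, f] == c).sum() for c in cats]
        table = np.array([ref_counts, x_counts])
        # chi2_contingency returns (statistic, p-value, dof, expected frequencies).
        stats[f], p_vals[f], _, _ = chi2_contingency(table)
    return p_vals, stats


# Synthetic data: feature 0 is unchanged, feature 1 is shifted in the batch.
rng = np.random.default_rng(0)
x_ref = rng.integers(0, 3, size=(500, 2))
x = x_ref.copy()
x[:, 1] = rng.choice(3, size=500, p=[0.8, 0.1, 0.1])
p_vals, stats = chi2_feature_score(x_ref, x, {0: [0, 1, 2], 1: [0, 1, 2]})
print(p_vals, stats)
```

Note that chi2_contingency raises an error if a category has zero counts in both rows, so every listed category should occur at least once in the reference data or the batch.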