alibi_detect.cd.tensorflow.mmd module

class alibi_detect.cd.tensorflow.mmd.MMDDriftTF(x_ref, p_val=0.05, x_ref_preprocessed=False, preprocess_at_init=True, update_x_ref=None, preprocess_fn=None, kernel=<class 'alibi_detect.utils.tensorflow.kernels.GaussianRBF'>, sigma=None, configure_kernel_from_x_ref=True, n_permutations=100, input_shape=None, data_type=None)

Bases: BaseMMDDrift

__init__(x_ref, p_val=0.05, x_ref_preprocessed=False, preprocess_at_init=True, update_x_ref=None, preprocess_fn=None, kernel=<class 'alibi_detect.utils.tensorflow.kernels.GaussianRBF'>, sigma=None, configure_kernel_from_x_ref=True, n_permutations=100, input_shape=None, data_type=None)

Maximum Mean Discrepancy (MMD) data drift detector using a permutation test.
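For reference, the squared MMD between a reference sample x_1, ..., x_n and a test sample y_1, ..., y_m under a kernel k is commonly estimated with the unbiased statistic below. This is the textbook estimator, given here as an assumption about what the MMD^2 values reported by score represent, not as a description of the exact implementation.

```latex
\widehat{\mathrm{MMD}}^2_u
  = \frac{1}{n(n-1)} \sum_{i \neq j} k(x_i, x_j)
  + \frac{1}{m(m-1)} \sum_{i \neq j} k(y_i, y_j)
  - \frac{2}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} k(x_i, y_j)
```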

Parameters:
  • x_ref (Union[ndarray, list]) – Data used as reference distribution.

  • p_val (float) – Significance level of the permutation test: drift is flagged when the p-value of the test falls below this threshold.

  • x_ref_preprocessed (bool) – Whether the given reference data x_ref has already been preprocessed. If x_ref_preprocessed=True, only the test data x will be preprocessed at prediction time. If x_ref_preprocessed=False, the reference data will also be preprocessed.

  • preprocess_at_init (bool) – Whether to preprocess the reference data when the detector is instantiated. Otherwise, the reference data will be preprocessed at prediction time. Only applies if x_ref_preprocessed=False.

  • update_x_ref (Optional[Dict[str, int]]) – Reference data can optionally be updated to the last n instances seen by the detector or via reservoir sampling with size n. For the former, the parameter equals {'last': n}, while for reservoir sampling {'reservoir_sampling': n} is passed.

  • preprocess_fn (Optional[Callable]) – Function to preprocess the data before computing the data drift metrics.

  • kernel (Callable) – Kernel used for the MMD computation; defaults to the Gaussian RBF kernel.

  • sigma (Optional[ndarray]) – Optionally set the GaussianRBF kernel bandwidth. Multiple bandwidth values can also be passed as an array, in which case the kernel evaluation is averaged over those bandwidths.

  • configure_kernel_from_x_ref (bool) – Whether to configure the kernel bandwidth from the reference data when the detector is instantiated.

  • n_permutations (int) – Number of permutations used in the permutation test.

  • input_shape (Optional[tuple]) – Shape of input data.

  • data_type (Optional[str]) – Optionally specify the data type (tabular, image or time-series). Added to metadata.
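The two update_x_ref strategies can be illustrated with a minimal pure-Python sketch. The helpers update_last and update_reservoir are hypothetical and not part of alibi_detect, and the reservoir variant is the textbook Algorithm R, assumed here rather than taken from the library's code.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def update_last(x_ref: Sequence[T], x_new: Sequence[T], n: int) -> List[T]:
    """Hypothetical {'last': n} update: keep only the n most recent instances."""
    combined = list(x_ref) + list(x_new)
    return combined[-n:] if n > 0 else []


def update_reservoir(x_ref: Sequence[T], x_new: Sequence[T], n: int, n_seen: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Hypothetical {'reservoir_sampling': n} update (textbook Algorithm R).

    Assumes x_ref is the current reservoir holding min(n, n_seen) instances,
    where n_seen is the number of instances observed before x_new arrived.
    """
    rng = rng or random.Random()
    reservoir = list(x_ref)
    for offset, x in enumerate(x_new):
        t = n_seen + offset  # 0-based position of x in the full stream
        if t < n:
            reservoir.append(x)  # reservoir not full yet
        else:
            j = rng.randint(0, t)  # uniform over 0..t inclusive
            if j < n:
                reservoir[j] = x  # replace with probability n / (t + 1)
    return reservoir


rng = random.Random(0)
ref = update_reservoir([], list(range(8)), n=5, n_seen=0, rng=rng)
ref = update_reservoir(ref, list(range(8, 20)), n=5, n_seen=8, rng=rng)
print(len(ref))  # 5
```

Under these assumptions, every instance seen so far ends up in the reservoir with equal probability, whereas the 'last' strategy keeps only the most recent window.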

kernel_matrix(x, y)

Compute and return the full kernel matrix between arrays x and y.

Return type:

Tensor
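As an illustration of what a full kernel matrix with multiple bandwidths looks like, here is a minimal pure-Python sketch. It assumes the common parameterization exp(-||x - y||^2 / (2 sigma^2)) and averages over the given bandwidths as described for sigma; the functions gaussian_rbf and rbf_kernel_matrix are hypothetical and are not the TensorFlow implementation behind kernel_matrix.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def gaussian_rbf(a: Vector, b: Vector, sigma: float) -> float:
    """exp(-||a - b||^2 / (2 sigma^2)) for a single bandwidth sigma."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def rbf_kernel_matrix(x: Sequence[Vector], y: Sequence[Vector],
                      sigmas: Sequence[float]) -> List[List[float]]:
    """Full len(x) x len(y) kernel matrix, averaged over all bandwidths in sigmas."""
    return [
        [sum(gaussian_rbf(xi, yj, s) for s in sigmas) / len(sigmas) for yj in y]
        for xi in x
    ]


K = rbf_kernel_matrix([[0.0], [1.0]], [[0.0], [1.0], [2.0]], sigmas=[1.0, 2.0])
print(len(K), len(K[0]))  # 2 3
```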

score(x)

Compute the p-value resulting from a permutation test using the maximum mean discrepancy as a distance measure between the reference data and the data to be tested.

Parameters:

x (Union[ndarray, list]) – Batch of instances.

Return type:

Tuple[float, float, float]

Returns:

p-value obtained from the permutation test, the MMD^2 between the reference and test set, and the MMD^2 threshold above which drift is flagged.
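The three return values of score can be illustrated with a small, self-contained permutation test. This is a sketch under stated assumptions, not the library code: it uses the unbiased MMD^2 estimator with a single-bandwidth Gaussian RBF kernel, forms the p-value as the fraction of permuted statistics at least as large as the observed one, and takes as threshold the permuted statistic exceeded by roughly a fraction p_val of the permutations. The function mmd_permutation_test and its sigma and seed arguments are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def rbf(a: Vector, b: Vector, sigma: float) -> float:
    """Gaussian RBF kernel exp(-||a - b||^2 / (2 sigma^2))."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2_unbiased(k: List[List[float]], ix: Sequence[int], iy: Sequence[int]) -> float:
    """Unbiased MMD^2 from a pooled kernel matrix k and two disjoint index sets."""
    n, m = len(ix), len(iy)
    k_xx = sum(k[i][j] for i in ix for j in ix if i != j) / (n * (n - 1))
    k_yy = sum(k[i][j] for i in iy for j in iy if i != j) / (m * (m - 1))
    k_xy = sum(k[i][j] for i in ix for j in iy) / (n * m)
    return k_xx + k_yy - 2.0 * k_xy


def mmd_permutation_test(x_ref: Sequence[Vector], x: Sequence[Vector],
                         p_val: float = 0.05, n_permutations: int = 100,
                         sigma: float = 1.0, seed: int = 0) -> Tuple[float, float, float]:
    """Hypothetical helper returning (p-value, observed MMD^2, MMD^2 threshold)."""
    z = list(x_ref) + list(x)
    k = [[rbf(a, b, sigma) for b in z] for a in z]  # pooled kernel matrix, computed once
    n = len(x_ref)
    idx = list(range(len(z)))
    observed = mmd2_unbiased(k, idx[:n], idx[n:])  # original split: reference vs test

    rng = random.Random(seed)
    permuted = []
    for _ in range(n_permutations):
        rng.shuffle(idx)  # randomly reassign pooled instances to the two groups
        permuted.append(mmd2_unbiased(k, idx[:n], idx[n:]))

    p = sum(s >= observed for s in permuted) / n_permutations
    # Permuted statistic exceeded by about a fraction p_val of the permutations.
    rank = min(int(p_val * n_permutations), n_permutations - 1)
    threshold = sorted(permuted, reverse=True)[rank]
    return p, observed, threshold


gen = random.Random(1)
ref = [[gen.gauss(0.0, 1.0)] for _ in range(30)]
shifted = [[gen.gauss(3.0, 1.0)] for _ in range(30)]
p, mmd2, thr = mmd_permutation_test(ref, shifted)
print(p < 0.05, mmd2 > thr)  # True True
```

Pooling the data and computing the kernel matrix once means each permutation only re-indexes precomputed kernel values instead of re-evaluating the kernel.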