alibi.utils.distance module

alibi.utils.distance.abdm(X, cat_vars, cat_vars_bin={})[source]

Calculate the pairwise distances between the categories of a categorical variable using the Association-Based Distance Metric (ABDM), based on Le et al. (2005): http://www.jaist.ac.jp/~bao/papers/N26.pdf

Parameters:
  • X (ndarray) – Batch of arrays.

  • cat_vars (dict) – Dict whose keys are the categorical columns and whose values, if provided, are the number of categories per categorical variable.

  • cat_vars_bin (dict) – Dict whose keys are the binned numerical columns and whose values, if provided, are the number of bins per variable.

Returns:

Dict whose keys are the categorical columns and whose values are the pairwise distance matrices between the categories of each variable.
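The idea behind an association-based distance can be sketched for a single column as follows. This is a minimal illustration, not the library's implementation: the function name abdm_sketch, its arguments, the symmetrized Kullback-Leibler divergence, and the eps smoothing are all assumptions made for the example. Two categories are treated as close when the other categorical columns are distributed similarly given each of them.

```python
import numpy as np

def abdm_sketch(X, col, other_cols, eps=1e-12):
    """Pairwise distances between the categories of column `col` (hypothetical helper)."""
    cats = np.unique(X[:, col])
    d = np.zeros((len(cats), len(cats)))
    for other in other_cols:
        vals = np.unique(X[:, other])
        # p[i, t] = P(X[other] == vals[t] | X[col] == cats[i]), smoothed by eps (assumption)
        p = np.array([[np.mean(X[X[:, col] == c, other] == v) for v in vals]
                      for c in cats]) + eps
        p /= p.sum(axis=1, keepdims=True)
        for i in range(len(cats)):
            for j in range(i):
                # Symmetrized KL divergence between the two conditional distributions (assumption)
                kl_ij = np.sum(p[i] * np.log(p[i] / p[j]))
                kl_ji = np.sum(p[j] * np.log(p[j] / p[i]))
                d[i, j] = d[j, i] = d[i, j] + kl_ij + kl_ji
    return d

# Column 0 is the variable of interest; column 1 supplies the context.
X = np.array([[0, 0], [0, 0], [1, 0], [1, 1], [2, 1], [2, 1]])
d = abdm_sketch(X, col=0, other_cols=[1])
```

In this toy data, categories 0 and 2 never share a value in column 1, so they end up farther apart than either is from category 1.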

alibi.utils.distance.batch_compute_kernel_matrix(x, y, kernel, batch_size=10000000000, preprocess_fn=None)[source]

Compute the kernel matrix between x and y by filling in blocks of size batch_size x batch_size at a time.

Parameters:
  • x (Union[list, ndarray]) – The first list/numpy array of data instances.

  • y (Union[list, ndarray]) – The second list/numpy array of data instances.

  • kernel (Callable[[ndarray, ndarray], ndarray]) – Kernel function to be used for kernel matrix computation.

  • batch_size (int) – Block size; the kernel matrix is filled in blocks of batch_size x batch_size at a time.

  • preprocess_fn (Optional[Callable[[Union[list, ndarray]], ndarray]]) – Optional preprocessing function for each batch.

Return type:

ndarray

Returns:

Kernel matrix in the form of a numpy array.
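The block-wise filling described above can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: blockwise_kernel_matrix and rbf_kernel are hypothetical names, and the Gaussian RBF kernel is only one example of a callable with the required (ndarray, ndarray) -> ndarray shape.

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel between rows of x and rows of y (illustrative choice)."""
    sq = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def blockwise_kernel_matrix(x, y, kernel, batch_size, preprocess_fn=None):
    """Fill the len(x) x len(y) kernel matrix one block at a time (hypothetical helper)."""
    k = np.empty((len(x), len(y)))
    for i in range(0, len(x), batch_size):
        xb = x[i:i + batch_size]
        if preprocess_fn is not None:
            xb = preprocess_fn(xb)
        for j in range(0, len(y), batch_size):
            yb = y[j:j + batch_size]
            if preprocess_fn is not None:
                yb = preprocess_fn(yb)
            # Each block has at most batch_size rows and batch_size columns.
            k[i:i + batch_size, j:j + batch_size] = kernel(xb, yb)
    return k

rng = np.random.default_rng(0)
x = rng.normal(size=(7, 3))
y = rng.normal(size=(5, 3))
k_blocks = blockwise_kernel_matrix(x, y, rbf_kernel, batch_size=3)
k_full = rbf_kernel(x, y)
```

Only one block of kernel evaluations is held in intermediate memory at a time, which is the reason to batch when x and y are large.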

alibi.utils.distance.cityblock_batch(X, y)[source]

Calculate the L1 distances between a batch of arrays X and an array of the same shape y.

Parameters:
  • X (ndarray) – Batch of arrays to calculate the distances from.

  • y (ndarray) – Array to calculate the distance to.

Return type:

ndarray

Returns:

Array of distances from each array in X to y.
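As a rough NumPy sketch of this computation: the helper name l1_to_reference and the flat output shape are assumptions, and the library's output shape may differ. The absolute differences are summed over every axis except the batch axis.

```python
import numpy as np

def l1_to_reference(X, y):
    """L1 distance from each array in the batch X to y (hypothetical helper)."""
    # Sum absolute differences over every axis except the batch axis.
    axes = tuple(range(1, X.ndim))
    return np.abs(X - y).sum(axis=axes)

X = np.array([[1.0, 2.0], [3.0, 5.0]])
y = np.array([1.0, 1.0])
dists = l1_to_reference(X, y)
```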

alibi.utils.distance.multidim_scaling(d_pair, feature_range, n_components=2, use_metric=True, standardize_cat_vars=True, smooth=1.0, center=True, update_feature_range=True)[source]

Apply multidimensional scaling to pairwise distance matrices.

Parameters:
  • d_pair (dict) – Dict whose keys are the column indices of the categorical variables and whose values are the pairwise distance matrices for the categories of each variable.

  • feature_range (Tuple[ndarray, ndarray]) – Tuple with min and max ranges to allow for perturbed instances. Min and max ranges are numpy arrays with shape (1, number of features).

  • n_components (int) – Number of dimensions in which to immerse the dissimilarities.

  • use_metric (bool) – If True, perform metric MDS; otherwise, perform nonmetric MDS.

  • standardize_cat_vars (bool) – Standardize numerical values of categorical variables if True.

  • smooth (float) – Smoothing exponent between 0 and 1 for the distances. Values lower than 1 will smooth the difference in distance metric between different features.

  • center (bool) – Whether to center the scaled distance measures. If False, the min distance for each feature except for the feature with the highest raw max distance will be the lower bound of the feature range, but the upper bound will be below the max feature range.

  • update_feature_range (bool) – Update feature range with scaled values.

Return type:

Tuple[dict, tuple]

Returns:

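Tuple whose first element is a dict with the multidimensional scaled version of the pairwise distance matrices. Given the Tuple[dict, tuple] return type and the update_feature_range parameter, the second element is presumably the feature range, updated with the scaled values when update_feature_range is True.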
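To illustrate what embedding a pairwise distance matrix means, the sketch below uses classical (Torgerson) MDS via an eigendecomposition. This is a swapped-in technique chosen because it fits in a few lines of NumPy; it is not the metric or nonmetric MDS selected by use_metric, and it ignores feature_range, smooth, center, and standardize_cat_vars. The name classical_mds is hypothetical.

```python
import numpy as np

def classical_mds(d, n_components=2):
    """Embed points so Euclidean distances approximate the matrix d (hypothetical helper)."""
    n = d.shape[0]
    # Double-center the squared distances to obtain a Gram matrix.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # Keep the directions with the largest eigenvalues.
    w, v = np.linalg.eigh(b)
    idx = np.argsort(w)[::-1][:n_components]
    return v[:, idx] * np.sqrt(np.clip(w[idx], 0.0, None))

# One categorical column (index 0) with three categories lying on a line.
d_pair = {0: np.array([[0.0, 1.0, 2.0],
                       [1.0, 0.0, 1.0],
                       [2.0, 1.0, 0.0]])}
embedded = {col: classical_mds(d, n_components=1) for col, d in d_pair.items()}
coords = embedded[0][:, 0]
```

Each category is replaced by a numerical coordinate, and the distances between coordinates reproduce the input matrix when an exact embedding exists, as it does here.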

alibi.utils.distance.mvdm(X, y, cat_vars, alpha=1)[source]

Calculate the pairwise distances between the categories of a categorical variable using the Modified Value Difference Measure (MVDM), based on Cost et al. (1993): https://link.springer.com/article/10.1023/A:1022664626993

Parameters:
  • X (ndarray) – Batch of arrays.

  • y (ndarray) – Batch of labels or predictions.

  • cat_vars (dict) – Dict whose keys are the categorical columns and whose values, if provided, are the number of categories per categorical variable.

  • alpha (int) – Power of absolute difference between conditional probabilities.

Return type:

Dict[int, ndarray]

Returns:

Dict whose keys are the categorical columns and whose values are the pairwise distance matrices between the categories of each variable.
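A minimal sketch of the value-difference idea for a single categorical column is shown below. It assumes integer class labels in y and compares two categories by summing the absolute differences of their conditional class probabilities, each raised to the power alpha. The name mvdm_sketch is hypothetical, and the library may differ in details such as handling predicted probabilities or normalization.

```python
import numpy as np

def mvdm_sketch(x_col, y, alpha=1):
    """Pairwise category distances for one categorical column (hypothetical helper)."""
    cats, classes = np.unique(x_col), np.unique(y)
    # p[i, k] = P(y == classes[k] | x_col == cats[i])
    p = np.array([[np.mean(y[x_col == c] == k) for k in classes] for c in cats])
    # d[i, j] = sum_k |p[i, k] - p[j, k]| ** alpha
    return (np.abs(p[:, None, :] - p[None, :, :]) ** alpha).sum(axis=-1)

x_col = np.array([0, 0, 1, 1, 2, 2])
y = np.array([0, 0, 0, 1, 1, 1])
d = mvdm_sketch(x_col, y)
```

Category 0 always maps to class 0 and category 2 always maps to class 1, so they are the most distant pair; category 1 is split evenly and sits halfway between them.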

alibi.utils.distance.squared_pairwise_distance(x, y, a_min=1e-07, a_max=1e+30)[source]

Compute the pairwise squared Euclidean distance between samples x and y using NumPy.

Parameters:
  • x (ndarray) – A batch of instances of shape (Nx, features).

  • y (ndarray) – A batch of instances of shape (Ny, features).

  • a_min (float) – Lower bound to clip distance values.

  • a_max (float) – Upper bound to clip distance values.

Return type:

ndarray

Returns:

Pairwise squared Euclidean distance matrix of shape (Nx, Ny).
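One common way to compute such a matrix, sketched here as an assumption about the approach rather than a copy of the library's code, expands the squared norm as ||x||^2 + ||y||^2 - 2 x·y and then clips the result. The helper name squared_distances is hypothetical.

```python
import numpy as np

def squared_distances(x, y, a_min=1e-7, a_max=1e30):
    """Pairwise squared Euclidean distances, clipped to [a_min, a_max] (hypothetical helper)."""
    x2 = (x ** 2).sum(axis=1)[:, None]   # shape (Nx, 1)
    y2 = (y ** 2).sum(axis=1)[None, :]   # shape (1, Ny)
    dist = x2 + y2 - 2.0 * x @ y.T       # shape (Nx, Ny)
    return np.clip(dist, a_min, a_max)

x = np.array([[0.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]])
d = squared_distances(x, y)
```

Because of the lower clip, identical points yield a_min rather than exactly zero, which also guards against small negative values caused by floating-point cancellation.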