alibi.explainers.permutation_importance module

class alibi.explainers.permutation_importance.Kind(value)[source]

Bases: str, Enum

Enumeration of supported kinds.

DIFFERENCE = 'difference'
RATIO = 'ratio'
alibi.explainers.permutation_importance.LOSS_FNS = {'log_loss': sklearn.metrics.log_loss, 'mean_absolute_error': sklearn.metrics.mean_absolute_error, 'mean_absolute_percentage_error': sklearn.metrics.mean_absolute_percentage_error, 'mean_squared_error': sklearn.metrics.mean_squared_error, 'mean_squared_log_error': sklearn.metrics.mean_squared_log_error}

Dictionary of supported string-specified loss functions.

class alibi.explainers.permutation_importance.Method(value)[source]

Bases: str, Enum

Enumeration of supported methods.

ESTIMATE = 'estimate'
EXACT = 'exact'
class alibi.explainers.permutation_importance.PermutationImportance(predictor, loss_fns=None, score_fns=None, feature_names=None, verbose=False)[source]

Bases: Explainer

Implementation of the permutation feature importance for tabular datasets. The method measures the importance of a feature as the relative increase/decrease in the loss/score function when the feature values are permuted. Supports black-box models.

For details of the method, see the original papers.

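As a rough illustration of the idea (a minimal sketch using scikit-learn, not alibi's implementation), permuting a feature column breaks its association with the target, and the resulting change in the loss measures that feature's importance:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = LogisticRegression().fit(X, y)

# loss on the intact dataset
base_loss = log_loss(y, model.predict_proba(X)[:, 1])

importances = []
for j in range(X.shape[1]):
    X_perm = X.copy()
    # permuting column j breaks its association with y
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    perm_loss = log_loss(y, model.predict_proba(X_perm)[:, 1])
    # kind='ratio' semantics; use perm_loss - base_loss for kind='difference'
    importances.append(perm_loss / base_loss)
```

Informative features yield ratios well above 1, while uninformative ones stay close to 1.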
__init__(predictor, loss_fns=None, score_fns=None, feature_names=None, verbose=False)[source]

Initialize the permutation feature importance.

Parameters:
  • predictor (Callable[[ndarray], ndarray]) – A prediction function which receives as input a numpy array of size N x F, and outputs a numpy array of size N (i.e. (N, )) or N x T, where N is the number of input instances, F is the number of features, and T is the number of targets. Note that the output shape must be compatible with the loss and score functions provided in loss_fns and score_fns.

  • loss_fns (Union[Literal['mean_absolute_error', 'mean_squared_error', 'mean_squared_log_error', 'mean_absolute_percentage_error', 'log_loss'], List[Literal['mean_absolute_error', 'mean_squared_error', 'mean_squared_log_error', 'mean_absolute_percentage_error', 'log_loss']], Callable[[ndarray, ndarray, Optional[ndarray]], float], Dict[str, Callable[[ndarray, ndarray, Optional[ndarray]], float]], None]) –

    A literal, or a list of literals, or a loss function, or a dictionary of loss functions having as keys the names of the loss functions and as values the loss functions (i.e., lower values are better). The available literal values are described in alibi.explainers.permutation_importance.LOSS_FNS. Note that the predictor output must be compatible with every loss function. Every loss function is expected to receive the following arguments:

    • y_true : np.ndarray - a numpy array of ground-truth labels.

    • y_pred | y_score : np.ndarray - a numpy array of model predictions (i.e., the output of the predictor).

    • sample_weight: Optional[np.ndarray] - a numpy array of sample weights.

  • score_fns (Union[Literal['accuracy', 'precision', 'recall', 'f1', 'roc_auc', 'r2'], List[Literal['accuracy', 'precision', 'recall', 'f1', 'roc_auc', 'r2']], Callable[[ndarray, ndarray, Optional[ndarray]], float], Dict[str, Callable[[ndarray, ndarray, Optional[ndarray]], float]], None]) – A literal, or a list of literals, or a score function, or a dictionary of score functions having as keys the names of the score functions and as values the score functions (i.e., higher values are better). The available literal values are described in alibi.explainers.permutation_importance.SCORE_FNS. As with loss_fns, the predictor output must be compatible with every score function, and each score function must have the same signature presented in the loss_fns parameter description.

  • feature_names (Optional[List[str]]) – A list of feature names used for displaying results.

  • verbose (bool) – Whether to print the progress of the explainer.
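Custom loss and score functions passed as callables must follow the three-argument signature described above. A minimal sketch of a conforming custom loss (weighted_mae is a hypothetical name, not part of alibi):

```python
from typing import Optional

import numpy as np

def weighted_mae(y_true: np.ndarray, y_pred: np.ndarray,
                 sample_weight: Optional[np.ndarray] = None) -> float:
    # conforms to the (y_true, y_pred, sample_weight) -> float signature
    # expected by loss_fns; lower values are better, so it acts as a loss
    err = np.abs(y_true - y_pred)
    if sample_weight is None:
        return float(err.mean())
    return float(np.average(err, weights=sample_weight))

# such a callable could then be passed to the constructor, e.g.
# PermutationImportance(predictor, loss_fns={'weighted_mae': weighted_mae})
```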

explain(X, y, features=None, method='estimate', kind='ratio', n_repeats=50, sample_weight=None)[source]

Computes the permutation feature importance for each feature with respect to the given loss or score functions and the dataset (X, y).

Parameters:
  • X (ndarray) – A N x F input feature dataset used to calculate the permutation feature importance. This is typically the test dataset.

  • y (ndarray) – Ground-truth labels array of size N (i.e. (N, )) corresponding to the input feature X.

  • features (Optional[List[Union[int, Tuple[int, ...]]]]) – An optional list of features or tuples of features for which to compute the permutation feature importance. If not provided, the permutation feature importance will be computed for every single feature in the dataset. Some examples of features are: [0, 2], [0, 2, (0, 2)], [(0, 2)], where 0 and 2 correspond to columns 0 and 2 in X, respectively.

  • method (Literal['estimate', 'exact']) – The method used to compute the feature importance. If set to 'exact', a “switch” operation is performed across all pairs of instances, excluding the pairings actually observed in the original dataset. This operation is quadratic in the number of samples (N x (N - 1) pairs) and can thus be computationally intensive. If set to 'estimate', the dataset is divided in half. The values of the first half, containing the ground-truth labels and the rest of the features (i.e., the features left intact), are matched with the permuted feature values from the second half, and the other way around. This method is computationally lighter and provides error bars for the estimate given by the standard deviation. Note that for some loss and score functions, the estimate does not converge to the exact metric value.

  • kind (Literal['ratio', 'difference']) – Whether to report the importance as the loss/score ratio or the loss/score difference. Available values are: 'ratio' | 'difference'.

  • n_repeats (int) – Number of times to permute the feature values. Considered only when method='estimate'.

  • sample_weight (Optional[ndarray]) – Optional weight for each sample instance.

Return type:

Explanation

Returns:

explanation – An Explanation object containing the data and the metadata of the permutation feature importance. See the Permutation feature importance examples for usage details.
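The 'estimate' strategy described above can be sketched in plain numpy (a simplified illustration under assumed names; estimate_permuted_loss is hypothetical and not part of alibi):

```python
import numpy as np

def estimate_permuted_loss(loss_fn, predict, X, y, feature, n_repeats=50, seed=0):
    """Sketch of method='estimate': split the data in half and pair each
    half's intact features/labels with the other half's values for the
    permuted feature, in both directions."""
    rng = np.random.default_rng(seed)
    losses = []
    for _ in range(n_repeats):
        idx = rng.permutation(len(X))
        half = len(X) // 2
        fst, snd = idx[:half], idx[half:2 * half]
        for keep, perm in ((fst, snd), (snd, fst)):
            X_switch = X[keep].copy()
            # "switch" the feature values across the two halves
            X_switch[:, feature] = X[perm, feature]
            losses.append(loss_fn(y[keep], predict(X_switch)))
    # mean permuted loss plus an error bar from the standard deviation
    return float(np.mean(losses)), float(np.std(losses))

# toy usage: feature 0 fully determines y, so switching it inflates the loss
X = np.arange(20, dtype=float).reshape(10, 2)
y = X[:, 0]
mse = lambda y_true, y_pred: float(np.mean((y_true - y_pred) ** 2))
mean_loss, std_loss = estimate_permuted_loss(mse, lambda Z: Z[:, 0], X, y,
                                             feature=0, n_repeats=5)
```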

reset_predictor(predictor)[source]

Resets the predictor function.

Parameters:

predictor (Callable) – New predictor function.

Return type:

None

alibi.explainers.permutation_importance.SCORE_FNS = {'accuracy': sklearn.metrics.accuracy_score, 'f1': sklearn.metrics.f1_score, 'precision': sklearn.metrics.precision_score, 'r2': sklearn.metrics.r2_score, 'recall': sklearn.metrics.recall_score, 'roc_auc': sklearn.metrics.roc_auc_score}

Dictionary of supported string-specified score functions.

alibi.explainers.permutation_importance.plot_permutation_importance(exp, features='all', metric_names='all', n_cols=3, sort=True, top_k=None, ax=None, bar_kw=None, fig_kw=None)[source]

Plot permutation feature importance on matplotlib axes.

Parameters:
  • exp – An Explanation object produced by a call to the alibi.explainers.permutation_importance.PermutationImportance.explain() method.

  • features – A list of feature entries provided in the feature_names argument to the alibi.explainers.permutation_importance.PermutationImportance.explain() method, or 'all' to plot all the explained features. For example, consider feature_names = ['temp', 'hum', 'windspeed', 'season']. If we set features=None in the explain method, meaning that all the features were explained, and we want to plot only the values for 'temp' and 'windspeed', then we would set features=[0, 2]. Otherwise, if we set features=[1, 2, 3] in the explain method, meaning that we explained ['hum', 'windspeed', 'season'], and we want to plot the values only for ['windspeed', 'season'], then we would set features=[1, 2] (i.e., their index in the features list passed to the explain method). Defaults to 'all'.

  • metric_names – A list of metric entries in exp.data['metrics'] to plot the permutation feature importance for, or 'all' to plot the permutation feature importance for all metrics (i.e., loss and score functions). The ordering is given by the concatenation of the loss metrics followed by the score metrics.

  • n_cols – Number of columns to organize the resulting plot into.

  • sort – Boolean flag indicating whether to sort the values in descending order.

  • top_k – Number of top k values to be displayed if sort=True. If not provided, all values will be displayed.

  • ax – A matplotlib axes object or a numpy array of matplotlib axes to plot on.

  • bar_kw – Keyword arguments passed to the matplotlib.pyplot.barh function.

  • fig_kw – Keyword arguments passed to the matplotlib.figure.set function.

Returns:

plt.Axes with the feature importance plot.