alibi.explainers.permutation_importance module
- class alibi.explainers.permutation_importance.Kind(value)[source]
Enumeration of supported kinds.
- DIFFERENCE = 'difference'
- RATIO = 'ratio'
- alibi.explainers.permutation_importance.LOSS_FNS = {'log_loss': sklearn.metrics.log_loss, 'mean_absolute_error': sklearn.metrics.mean_absolute_error, 'mean_absolute_percentage_error': sklearn.metrics.mean_absolute_percentage_error, 'mean_squared_error': sklearn.metrics.mean_squared_error, 'mean_squared_log_error': sklearn.metrics.mean_squared_log_error}
Dictionary of supported string-specified loss functions:
- 'mean_absolute_error' - Mean absolute error regression loss. See sklearn.metrics.mean_absolute_error for documentation.
- 'mean_squared_error' - Mean squared error regression loss. See sklearn.metrics.mean_squared_error for documentation.
- 'mean_squared_log_error' - Mean squared logarithmic error regression loss. See sklearn.metrics.mean_squared_log_error for documentation.
- 'mean_absolute_percentage_error' - Mean absolute percentage error (MAPE) regression loss. See sklearn.metrics.mean_absolute_percentage_error for documentation.
- 'log_loss' - Log loss, aka logistic loss or cross-entropy loss. See sklearn.metrics.log_loss for documentation.
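Every entry in LOSS_FNS resolves to the corresponding sklearn function, so all of them share the sklearn convention loss(y_true, y_pred, sample_weight=...) -> float, with lower values being better. A minimal sketch of that shared signature, using illustrative values:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Illustrative data; every loss in LOSS_FNS accepts the same
# (y_true, y_pred, sample_weight) argument pattern.
y_true = np.array([3.0, 1.0, 2.0])
y_pred = np.array([2.0, 1.0, 4.0])
weights = np.array([1.0, 1.0, 2.0])

# Weighted MAE: (1*1 + 1*0 + 2*2) / (1 + 1 + 2) = 1.25
mae = mean_absolute_error(y_true, y_pred, sample_weight=weights)
# Weighted MSE: (1*1 + 1*0 + 2*4) / (1 + 1 + 2) = 2.25
mse = mean_squared_error(y_true, y_pred, sample_weight=weights)
```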
- class alibi.explainers.permutation_importance.Method(value)[source]
Enumeration of supported methods.
- ESTIMATE = 'estimate'
- EXACT = 'exact'
- class alibi.explainers.permutation_importance.PermutationImportance(predictor, loss_fns=None, score_fns=None, feature_names=None, verbose=False)[source]
Bases:
Explainer
Implementation of permutation feature importance for tabular datasets. The method measures the importance of a feature as the relative increase or decrease in the loss/score function when the feature values are permuted. Supports black-box models.
For details of the method, see the original papers.
- __init__(predictor, loss_fns=None, score_fns=None, feature_names=None, verbose=False)[source]
Initialize the permutation feature importance.
- Parameters:
  - predictor (Callable[[np.ndarray], np.ndarray]) - A prediction function which receives as input a numpy array of size N x F and outputs a numpy array of size N (i.e. (N, )) or N x T, where N is the number of input instances, F is the number of features, and T is the number of targets. Note that the output shape must be compatible with the loss and score functions provided in loss_fns and score_fns.
  - loss_fns (Union[Literal['mean_absolute_error', 'mean_squared_error', 'mean_squared_log_error', 'mean_absolute_percentage_error', 'log_loss'], List[Literal[...]], Callable[[np.ndarray, np.ndarray, Optional[np.ndarray]], float], Dict[str, Callable[[np.ndarray, np.ndarray, Optional[np.ndarray]], float]], None]) - A literal, a list of literals, a loss function, or a dictionary of loss functions having as keys the names of the loss functions and as values the loss functions (i.e., lower values are better). The available literal values are described in alibi.explainers.permutation_importance.LOSS_FNS. Note that the predictor output must be compatible with every loss function. Every loss function is expected to receive the following arguments:
    - y_true : np.ndarray - a numpy array of ground-truth labels.
    - y_pred | y_score : np.ndarray - a numpy array of model predictions. This corresponds to the output of the model.
    - sample_weight : Optional[np.ndarray] - a numpy array of sample weights.
  - score_fns (Union[Literal['accuracy', 'precision', 'recall', 'f1', 'roc_auc', 'r2'], List[Literal[...]], Callable[[np.ndarray, np.ndarray, Optional[np.ndarray]], float], Dict[str, Callable[[np.ndarray, np.ndarray, Optional[np.ndarray]], float]], None]) - A literal, a list of literals, a score function, or a dictionary of score functions having as keys the names of the score functions and as values the score functions (i.e., higher values are better). The available literal values are described in alibi.explainers.permutation_importance.SCORE_FNS. As with loss_fns, the predictor output must be compatible with every score function, and each score function must have the same signature described for loss_fns.
  - feature_names (Optional[List[str]]) - A list of feature names used for displaying results.
  - verbose (bool) - Whether to print the progress of the explainer.
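Any custom loss passed via loss_fns has to follow the (y_true, y_pred, sample_weight) signature described above. A minimal sketch of such a callable, with a hypothetical construction of the explainer shown in comments (model and feature_names are assumed to exist elsewhere):

```python
import numpy as np
from typing import Optional

def weighted_mae(y_true: np.ndarray,
                 y_pred: np.ndarray,
                 sample_weight: Optional[np.ndarray] = None) -> float:
    """Custom loss matching the expected signature; lower values are better."""
    err = np.abs(y_true - y_pred)
    if sample_weight is None:
        return float(err.mean())
    return float(np.average(err, weights=sample_weight))

# The explainer would then be constructed along these lines
# (hypothetical model and feature_names):
# from alibi.explainers import PermutationImportance
# explainer = PermutationImportance(
#     predictor=model.predict,
#     loss_fns={'weighted_mae': weighted_mae},
#     feature_names=feature_names,
# )
```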
- explain(X, y, features=None, method='estimate', kind='ratio', n_repeats=50, sample_weight=None)[source]
Computes the permutation feature importance for each feature with respect to the given loss or score functions and the dataset (X, y).
- Parameters:
  - X (np.ndarray) - A N x F input feature dataset used to calculate the permutation feature importance. This is typically the test dataset.
  - y (np.ndarray) - Ground-truth labels array of size N (i.e. (N, )) corresponding to the input feature dataset X.
  - features (Optional[List[Union[int, Tuple[int, ...]]]]) - An optional list of features or tuples of features for which to compute the permutation feature importance. If not provided, the permutation feature importance is computed for every single feature in the dataset. Some examples of features would be: [0, 2], [0, 2, (0, 2)], [(0, 2)], where 0 and 2 correspond to columns 0 and 2 in X, respectively.
  - method (Literal['estimate', 'exact']) - The method used to compute the feature importance. If set to 'exact', a "switch" operation is performed across all observed pairs, excluding pairings that are actually observed in the original dataset. This operation is quadratic in the number of samples (N x (N - 1) samples) and thus can be computationally intensive. If set to 'estimate', the dataset is divided in half. The values of the first half, containing the ground-truth labels and the rest of the features (i.e., the features that are left intact), are matched with the values of the second half of the permuted features, and vice versa. This method is computationally lighter and provides error-bar estimates given by the standard deviation. Note that for some specific loss and score functions, the estimate does not converge to the exact metric value.
  - kind (Literal['ratio', 'difference']) - Whether to report the importance as the loss/score ratio or the loss/score difference. Available values are: 'ratio' | 'difference'.
  - n_repeats (int) - Number of times to permute the feature values. Considered only when method='estimate'.
  - sample_weight (Optional[np.ndarray]) - Optional weight for each sample instance.
- Return type: Explanation
- Returns:
  explanation - An Explanation object containing the data and the metadata of the permutation feature importance. See usage at Permutation feature importance examples for details.
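The importance computation itself can be illustrated without alibi. The sketch below uses the classic repeated-permutation estimate with kind='ratio' semantics (mean loss over permutations divided by the unpermuted loss); note this plain loop is a simplification and is not alibi's half-splitting 'estimate' scheme:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Toy regression data: the target depends only on feature 0.
X = rng.normal(size=(500, 2))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=500)

def predictor(X: np.ndarray) -> np.ndarray:
    # Stand-in for a fitted model that recovered the true relationship.
    return 3.0 * X[:, 0]

def importance_ratio(feature: int, n_repeats: int = 50) -> float:
    """Mean loss over n_repeats permutations of `feature`,
    divided by the unpermuted loss (kind='ratio')."""
    base = mean_squared_error(y, predictor(X))
    losses = []
    for _ in range(n_repeats):
        Xp = X.copy()
        Xp[:, feature] = rng.permutation(Xp[:, feature])
        losses.append(mean_squared_error(y, predictor(Xp)))
    return float(np.mean(losses) / base)

# Feature 0 is important (ratio >> 1); feature 1 is not (ratio == 1,
# since the predictor ignores it entirely).
```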
- alibi.explainers.permutation_importance.SCORE_FNS = {'accuracy': sklearn.metrics.accuracy_score, 'f1': sklearn.metrics.f1_score, 'precision': sklearn.metrics.precision_score, 'r2': sklearn.metrics.r2_score, 'recall': sklearn.metrics.recall_score, 'roc_auc': sklearn.metrics.roc_auc_score}
Dictionary of supported string-specified score functions:
- 'accuracy' - Accuracy classification score. See sklearn.metrics.accuracy_score for documentation.
- 'precision' - Precision score. See sklearn.metrics.precision_score for documentation.
- 'recall' - Recall score. See sklearn.metrics.recall_score for documentation.
- 'f1' - F1 score. See sklearn.metrics.f1_score for documentation.
- 'roc_auc' - Area Under the Receiver Operating Characteristic Curve (ROC AUC) score. See sklearn.metrics.roc_auc_score for documentation.
- 'r2' - \(R^2\) (coefficient of determination) regression score. See sklearn.metrics.r2_score for documentation.
- alibi.explainers.permutation_importance.plot_permutation_importance(exp, features='all', metric_names='all', n_cols=3, sort=True, top_k=None, ax=None, bar_kw=None, fig_kw=None)[source]
Plot permutation feature importance on matplotlib axes.
- Parameters:
  - exp - An Explanation object produced by a call to the alibi.explainers.permutation_importance.PermutationImportance.explain() method.
  - features - A list of feature entries provided in the feature_names argument to the alibi.explainers.permutation_importance.PermutationImportance.explain() method, or 'all' to plot all the explained features. For example, consider feature_names = ['temp', 'hum', 'windspeed', 'season']. If we set features=None in the explain method, meaning that all the features were explained, and we want to plot only the values for 'temp' and 'windspeed', then we would set features=[0, 2]. Otherwise, if we set features=[1, 2, 3] in the explain method, meaning that we explained ['hum', 'windspeed', 'season'], and we want to plot the values only for ['windspeed', 'season'], then we would set features=[1, 2] (i.e., their index in the features list passed to the explain method). Defaults to 'all'.
  - metric_names - A list of metric entries in exp.data['metrics'] to plot the permutation feature importance for, or 'all' to plot the permutation feature importance for all metrics (i.e., loss and score functions). The ordering is given by the concatenation of the loss metrics followed by the score metrics.
  - n_cols - Number of columns to organize the resulting plot into.
  - sort - Boolean flag indicating whether to sort the values in descending order.
  - top_k - Number of top k values to be displayed if sort=True. If not provided, all values are displayed.
  - ax - A matplotlib axes object or a numpy array of matplotlib axes to plot on.
  - bar_kw - Keyword arguments passed to the matplotlib.pyplot.barh function.
  - fig_kw - Keyword arguments passed to the matplotlib.figure.set function.
- Returns:
  plt.Axes with the feature importance plot.
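The plotting helper is essentially a wrapper around matplotlib.pyplot.barh. A hand-rolled equivalent for a single metric, using made-up importance values in place of an actual Explanation object:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical importances, standing in for values extracted from exp.data.
names = ['temp', 'hum', 'windspeed', 'season']
importances = np.array([3.2, 1.1, 2.4, 0.7])

order = np.argsort(importances)  # ascending, so the largest bar ends up on top
fig, ax = plt.subplots(figsize=(6, 3))              # fig_kw would be passed here
ax.barh(np.arange(len(names)), importances[order])  # bar_kw would be passed here
ax.set_yticks(np.arange(len(names)))
ax.set_yticklabels([names[i] for i in order])
ax.set_xlabel('permutation feature importance')
```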