alibi.explainers.partial_dependence module

class alibi.explainers.partial_dependence.Kind(value)[source]

Bases: str, Enum

Enumeration of supported kinds.

AVERAGE = 'average'
BOTH = 'both'
INDIVIDUAL = 'individual'
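
As a rough illustration of what the `str, Enum` bases imply, the sketch below re-declares the enumeration locally. The class here is a stand-in that mirrors the members listed above; it is not imported from alibi.

```python
from enum import Enum


class Kind(str, Enum):
    """Local stand-in mirroring the members listed above (not alibi's class)."""
    AVERAGE = 'average'
    BOTH = 'both'
    INDIVIDUAL = 'individual'


# Because of the str mixin, a member compares equal to its plain string value.
is_equal = Kind.AVERAGE == 'average'

# Looking a member up by value returns the member itself.
member = Kind('both')

# All accepted string values, in declaration order.
valid_values = [k.value for k in Kind]
```

This is presumably why plain strings such as `'average'` can be passed wherever a kind is expected.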
class alibi.explainers.partial_dependence.PartialDependence(predictor, feature_names=None, categorical_names=None, target_names=None, verbose=False)[source]

Bases: PartialDependenceBase

Black-box implementation of partial dependence for tabular datasets. Supports multiple feature interactions.

__init__(predictor, feature_names=None, categorical_names=None, target_names=None, verbose=False)[source]

Initialize black-box model implementation of partial dependence.

Parameters:
  • predictor (Callable[[ndarray], ndarray]) – A prediction function which receives as input a numpy array of size N x F and outputs a numpy array of size N (i.e. (N, )) or N x T, where N is the number of input instances, F is the number of features and T is the number of targets.

  • feature_names (Optional[List[str]]) – A list of feature names used for displaying results.

  • categorical_names (Optional[Dict[int, List[str]]]) –

    Dictionary where keys are feature columns and values are the categories for the feature. Necessary to identify the categorical features in the dataset. An example for categorical_names would be:

    category_map = {0: ["married", "divorced"], 3: ["high school diploma", "master's degree"]}
    

  • target_names (Optional[List[str]]) – A list of target/output names used for displaying results.

  • verbose (bool) – Whether to print the progress of the explainer.

Notes

The length of the target_names should match the number of columns returned by a call to the predictor. For example, in the case of a binary classifier, if the predictor outputs a decision score (i.e. uses the decision_function method) which returns one column, then the length of the target_names should be one. On the other hand, if the predictor outputs a prediction probability (i.e. uses the predict_proba method) which returns two columns (one for the negative class and one for the positive class), then the length of the target_names should be two.
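
The note above can be checked before constructing the explainer. The following is a minimal sketch using only numpy; `n_output_columns`, `score_predictor` and `proba_predictor` are hypothetical names introduced for illustration, standing in for a decision-score output and a two-column probability output.

```python
import numpy as np


def n_output_columns(predictor, X):
    """Return 1 for an output of shape (N,), or T for an output of shape (N, T)."""
    out = np.asarray(predictor(X))
    return 1 if out.ndim == 1 else out.shape[1]


def score_predictor(X):
    # Hypothetical decision-score style output: shape (N,).
    return X[:, 0] - X[:, 1]


def proba_predictor(X):
    # Hypothetical probability style output: shape (N, 2),
    # one column for the negative class and one for the positive class.
    p = 1.0 / (1.0 + np.exp(-(X[:, 0] - X[:, 1])))
    return np.column_stack([1.0 - p, p])


X = np.array([[0.0, 1.0], [2.0, 0.5], [1.0, 1.0]])
score_cols = n_output_columns(score_predictor, X)  # target_names needs 1 entry
proba_cols = n_output_columns(proba_predictor, X)  # target_names needs 2 entries
```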

explain(X, features=None, kind='average', percentiles=(0.0, 1.0), grid_resolution=100, grid_points=None)[source]

Calculates the partial dependence for each feature and/or tuple of features with respect to all targets and the reference dataset X.

Parameters:
  • X (ndarray) – A N x F tabular dataset used to calculate partial dependence curves. This is typically the training dataset or a representative sample.

  • features (Optional[List[Union[int, Tuple[int, int]]]]) – An optional list of features or tuples of features for which to calculate the partial dependence. If not provided, the partial dependence will be computed for every single feature in the dataset. Some examples for features would be: [0, 2], [0, 2, (0, 2)], [(0, 2)], where 0 and 2 correspond to columns 0 and 2 in X, respectively.

  • kind (Literal['average', 'individual', 'both']) – If set to 'average', then only the partial dependence (PD) averaged across all samples from the dataset is returned. If set to 'individual', then only the individual conditional expectation (ICE) is returned for each data point from the dataset. Otherwise, if set to 'both', then both the PD and the ICE are returned.

  • percentiles (Tuple[float, float]) – Lower and upper percentiles used to limit the feature values to potentially remove outliers from low-density regions. Note that for features with few data points at very large or very small values, the PD estimates are less reliable in those extreme regions. The values must be in [0, 1]. Only used with grid_resolution.

  • grid_resolution (int) – Number of equidistant points to split the range of each target feature. Only applies if the number of unique values of a target feature in the reference dataset X is greater than the grid_resolution value. For example, consider a case where a feature can take the following values: [0.1, 0.3, 0.35, 0.351, 0.4, 0.41, 0.44, ..., 0.5, 0.54, 0.56, 0.6, 0.65, 0.7, 0.9], and we are not interested in evaluating the marginal effect at every single point as it can become computationally costly (assume hundreds/thousands of points) without providing any additional information for nearby points (e.g., 0.35 and 0.351). By setting grid_resolution=5, the marginal effect is computed for the values [0.1, 0.3, 0.5, 0.7, 0.9] instead, which is less computationally demanding and can provide similar insights regarding the model’s behaviour. Note that the extreme values of the grid can be controlled using the percentiles argument.

  • grid_points (Optional[Dict[int, Union[List, ndarray]]]) – Custom grid points. Must be a dict where the keys are the target feature indices and the values are monotonically increasing arrays defining the grid points for a numerical feature, and a subset of categorical feature values for a categorical feature. If the grid_points are not specified, then the grid will be constructed based on the unique target feature values available in the dataset X, or based on the grid_resolution and percentiles (check grid_resolution to see when it applies). For categorical features, the corresponding value in the grid_points can be specified either as an array of strings or as an array of integers corresponding to the label encodings. Note that the label encoding must match the ordering of the values provided in the categorical_names.

Return type:

Explanation

Returns:

explanation – An Explanation object containing the data and the metadata of the calculated partial dependence curves. See the Partial dependence examples for usage details.
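
To make the relationship between the 'average' (PD) and 'individual' (ICE) kinds concrete, here is a brute-force sketch for a single feature and a single target. It is an illustration under simplifying assumptions, not alibi's implementation; `brute_force_pd` and the toy `predictor` are hypothetical names.

```python
import numpy as np


def brute_force_pd(predictor, X, feature, grid):
    """Return (pd_values, ice_values) for one feature.

    ice_values has shape (len(grid), N): one curve per instance in X.
    pd_values has shape (len(grid),): the ICE curves averaged over instances.
    """
    ice_values = np.empty((len(grid), X.shape[0]))
    for i, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value          # fix the feature at the grid value
        ice_values[i] = predictor(X_mod)   # predictions for every instance
    return ice_values.mean(axis=1), ice_values


def predictor(X):
    # Hypothetical single-target model with an interaction term.
    return 2.0 * X[:, 0] + X[:, 0] * X[:, 1]


X = np.array([[0.0, 1.0], [1.0, -1.0], [2.0, 3.0]])
grid = np.array([0.0, 1.0, 2.0])
pd_values, ice_values = brute_force_pd(predictor, X, feature=0, grid=grid)
```

In this sketch, the PD curve corresponds to kind='average', the ICE matrix to kind='individual', and returning both to kind='both'.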

class alibi.explainers.partial_dependence.PartialDependenceBase(predictor, feature_names=None, categorical_names=None, target_names=None, verbose=False)[source]

Bases: Explainer, ABC

__init__(predictor, feature_names=None, categorical_names=None, target_names=None, verbose=False)[source]

Base class of the partial dependence for tabular datasets. Supports multiple feature interactions.

Parameters:
  • predictor (Union[BaseEstimator, Callable[[ndarray], ndarray]]) – A sklearn estimator or a prediction function which receives as input a numpy array of size N x F and outputs a numpy array of size N (i.e. (N, )) or N x T, where N is the number of input instances, F is the number of features and T is the number of targets.

  • feature_names (Optional[List[str]]) – A list of feature names used for displaying results.

  • categorical_names (Optional[Dict[int, List[str]]]) –

    Dictionary where keys are feature columns and values are the categories for the feature. Necessary to identify the categorical features in the dataset. An example for categorical_names would be:

    category_map = {0: ["married", "divorced"], 3: ["high school diploma", "master's degree"]}
    

  • target_names (Optional[List[str]]) – A list of target/output names used for displaying results.

  • verbose (bool) – Whether to print the progress of the explainer.

explain(X, features=None, kind='average', percentiles=(0.0, 1.0), grid_resolution=100, grid_points=None)[source]

Calculates the partial dependence for each feature and/or tuple of features with respect to all targets and the reference dataset X.

Parameters:
  • X (ndarray) – A N x F tabular dataset used to calculate partial dependence curves. This is typically the training dataset or a representative sample.

  • features (Optional[List[Union[int, Tuple[int, int]]]]) – An optional list of features or tuples of features for which to calculate the partial dependence. If not provided, the partial dependence will be computed for every single feature in the dataset. Some examples for features would be: [0, 2], [0, 2, (0, 2)], [(0, 2)], where 0 and 2 correspond to columns 0 and 2 in X, respectively.

  • kind (Literal['average', 'individual', 'both']) – If set to 'average', then only the partial dependence (PD) averaged across all samples from the dataset is returned. If set to 'individual', then only the individual conditional expectation (ICE) is returned for each data point from the dataset. Otherwise, if set to 'both', then both the PD and the ICE are returned.

  • percentiles (Tuple[float, float]) – Lower and upper percentiles used to limit the feature values to potentially remove outliers from low-density regions. Note that for features with few data points at very large or very small values, the PD estimates are less reliable in those extreme regions. The values must be in [0, 1]. Only used with grid_resolution.

  • grid_resolution (int) – Number of equidistant points to split the range of each target feature. Only applies if the number of unique values of a target feature in the reference dataset X is greater than the grid_resolution value. For example, consider a case where a feature can take the following values: [0.1, 0.3, 0.35, 0.351, 0.4, 0.41, 0.44, ..., 0.5, 0.54, 0.56, 0.6, 0.65, 0.7, 0.9], and we are not interested in evaluating the marginal effect at every single point as it can become computationally costly (assume hundreds/thousands of points) without providing any additional information for nearby points (e.g., 0.35 and 0.351). By setting grid_resolution=5, the marginal effect is computed for the values [0.1, 0.3, 0.5, 0.7, 0.9] instead, which is less computationally demanding and can provide similar insights regarding the model’s behaviour. Note that the extreme values of the grid can be controlled using the percentiles argument.

  • grid_points (Optional[Dict[int, Union[List, ndarray]]]) – Custom grid points. Must be a dict where the keys are the target feature indices and the values are monotonically increasing arrays defining the grid points for a numerical feature, and a subset of categorical feature values for a categorical feature. If the grid_points are not specified, then the grid will be constructed based on the unique target feature values available in the dataset X, or based on the grid_resolution and percentiles (check grid_resolution to see when it applies). For categorical features, the corresponding value in the grid_points can be specified either as an array of strings or as an array of integers corresponding to the label encodings. Note that the label encoding must match the ordering of the values provided in the categorical_names.

Return type:

Explanation

Returns:

explanation – An Explanation object containing the data and the metadata of the calculated partial dependence curves. See the Partial dependence examples for usage details.
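
The interplay between percentiles and grid_resolution described above can be sketched as follows. This is one plausible reading of the documented behaviour, written with numpy only; `build_grid` is a hypothetical helper and the library's actual grid construction may differ in details.

```python
import numpy as np


def build_grid(values, percentiles=(0.0, 1.0), grid_resolution=100):
    """Sketch of a grid for one numerical feature.

    If the feature has no more unique values than grid_resolution, the unique
    values are used directly. Otherwise, grid_resolution equidistant points are
    placed between the lower and upper percentiles (given as fractions in [0, 1]).
    """
    unique = np.unique(values)
    if unique.size <= grid_resolution:
        return unique
    low, high = np.quantile(values, percentiles)
    return np.linspace(low, high, grid_resolution)


# Many unique values: the grid is reduced to 5 equidistant points.
dense_values = np.linspace(0.1, 0.9, 81)
coarse_grid = build_grid(dense_values, grid_resolution=5)

# Few unique values: grid_resolution does not apply and the unique values are used.
sparse_grid = build_grid(np.array([1.0, 2.0, 2.0, 3.0]), grid_resolution=5)
```

Narrowing the percentiles, for example to (0.05, 0.95), would move the two end points of the equidistant grid inward.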

reset_predictor(predictor)[source]

Resets the predictor function or tree-based sklearn estimator.

Parameters:

predictor (Union[Callable[[ndarray], ndarray], BaseEstimator]) – New predictor function or tree-based sklearn estimator.

Return type:

None

class alibi.explainers.partial_dependence.TreePartialDependence(predictor, feature_names=None, categorical_names=None, target_names=None, verbose=False)[source]

Bases: PartialDependenceBase

Tree-based model sklearn implementation of the partial dependence for tabular datasets. Supports multiple feature interactions. This method is faster than the general black-box implementation but is only supported by some tree-based estimators. The computation is based on a weighted tree traversal. For more details on the computation, check the sklearn documentation page. The supported sklearn models are: GradientBoostingClassifier, GradientBoostingRegressor, HistGradientBoostingClassifier, HistGradientBoostingRegressor, DecisionTreeRegressor, RandomForestRegressor.

__init__(predictor, feature_names=None, categorical_names=None, target_names=None, verbose=False)[source]

Initialize tree-based model sklearn implementation of partial dependence.

Parameters:
  • predictor (BaseEstimator) – A tree-based sklearn estimator.

  • feature_names (Optional[List[str]]) – A list of feature names used for displaying results.

  • categorical_names (Optional[Dict[int, List[str]]]) –

    Dictionary where keys are feature columns and values are the categories for the feature. Necessary to identify the categorical features in the dataset. An example for categorical_names would be:

    category_map = {0: ["married", "divorced"], 3: ["high school diploma", "master's degree"]}
    

  • target_names (Optional[List[str]]) – A list of target/output names used for displaying results.

  • verbose (bool) – Whether to print the progress of the explainer.

Notes

The length of the target_names should match the number of columns returned by a call to the predictor.decision_function. In the case of a binary classifier, the decision score consists of a single column. Thus, the length of the target_names should be one.

explain(X, features=None, percentiles=(0.0, 1.0), grid_resolution=100, grid_points=None)[source]

Calculates the partial dependence for each feature and/or tuple of features with respect to all targets and the reference dataset X.

Parameters:
  • X (ndarray) – A N x F tabular dataset used to calculate partial dependence curves. This is typically the training dataset or a representative sample.

  • features (Optional[List[Union[int, Tuple[int, int]]]]) – An optional list of features or tuples of features for which to calculate the partial dependence. If not provided, the partial dependence will be computed for every single feature in the dataset. Some examples for features would be: [0, 2], [0, 2, (0, 2)], [(0, 2)], where 0 and 2 correspond to columns 0 and 2 in X, respectively.

  • percentiles (Tuple[float, float]) – Lower and upper percentiles used to limit the feature values to potentially remove outliers from low-density regions. Note that for features with few data points at very large or very small values, the PD estimates are less reliable in those extreme regions. The values must be in [0, 1]. Only used with grid_resolution.

  • grid_resolution (int) – Number of equidistant points to split the range of each target feature. Only applies if the number of unique values of a target feature in the reference dataset X is greater than the grid_resolution value. For example, consider a case where a feature can take the following values: [0.1, 0.3, 0.35, 0.351, 0.4, 0.41, 0.44, ..., 0.5, 0.54, 0.56, 0.6, 0.65, 0.7, 0.9], and we are not interested in evaluating the marginal effect at every single point as it can become computationally costly (assume hundreds/thousands of points) without providing any additional information for nearby points (e.g., 0.35 and 0.351). By setting grid_resolution=5, the marginal effect is computed for the values [0.1, 0.3, 0.5, 0.7, 0.9] instead, which is less computationally demanding and can provide similar insights regarding the model’s behaviour. Note that the extreme values of the grid can be controlled using the percentiles argument.

  • grid_points (Optional[Dict[int, Union[List, ndarray]]]) – Custom grid points. Must be a dict where the keys are the target feature indices and the values are monotonically increasing arrays defining the grid points for a numerical feature, and a subset of categorical feature values for a categorical feature. If the grid_points are not specified, then the grid will be constructed based on the unique target feature values available in the dataset X, or based on the grid_resolution and percentiles (check grid_resolution to see when it applies). For categorical features, the corresponding value in the grid_points can be specified either as an array of strings or as an array of integers corresponding to the label encodings. Note that the label encoding must match the ordering of the values provided in the categorical_names.

Return type:

Explanation
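
The requirement that label encodings in grid_points follow the ordering in categorical_names can be illustrated with a small sketch. `encode_grid_points` is a hypothetical helper, not part of alibi; it maps string categories to their positions in categorical_names and checks that numerical grids are strictly increasing.

```python
import numpy as np

# Same mapping as in the categorical_names example above.
categorical_names = {
    0: ["married", "divorced"],
    3: ["high school diploma", "master's degree"],
}


def encode_grid_points(grid_points, categorical_names):
    """Convert string categories to label encodings; validate numerical grids."""
    encoded = {}
    for feature, values in grid_points.items():
        if feature in categorical_names:
            categories = categorical_names[feature]
            # A string is replaced by its position in categorical_names[feature];
            # integers are assumed to be label encodings already.
            encoded[feature] = np.array(
                [categories.index(v) if isinstance(v, str) else int(v) for v in values]
            )
        else:
            arr = np.asarray(values, dtype=float)
            if np.any(np.diff(arr) <= 0):
                raise ValueError(f"grid for feature {feature} must be increasing")
            encoded[feature] = arr
    return encoded


grid_points = {
    0: ["divorced"],                                 # subset of categories
    1: [0.1, 0.5, 0.9],                              # numerical feature
    3: ["high school diploma", "master's degree"],
}
encoded = encode_grid_points(grid_points, categorical_names)
```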

alibi.explainers.partial_dependence.plot_pd(exp, features='all', target=0, n_cols=3, n_ice=100, center=False, pd_limits=None, levels=8, ax=None, sharey='all', pd_num_kw=None, ice_num_kw=None, pd_cat_kw=None, ice_cat_kw=None, pd_num_num_kw=None, pd_num_cat_kw=None, pd_cat_cat_kw=None, fig_kw=None)[source]

Plot partial dependence curves on matplotlib axes.

Parameters:
  • exp – An Explanation object produced by a call to the alibi.explainers.partial_dependence.PartialDependence.explain() method.

  • features – A list of indices of the entries in exp.data['feature_names'] for which to plot the partial dependence curves, or 'all' to plot all the explained features and tuples of features. For example, if exp.data['feature_names'] = ['temp', 'hum', ('temp', 'windspeed')] and we want to plot the partial dependence only for the 'temp' and ('temp', 'windspeed'), then we would set features=[0, 2]. Defaults to 'all'.

  • target – The target name or index for which to plot the partial dependence (PD) curves. Can be either an integer denoting the target index or a string denoting an entry in exp.meta['params']['target_names'].

  • n_cols – Number of columns to organize the resulting plot into.

  • n_ice

    Number of ICE plots to be displayed. Can be

    • a string taking the value 'all' to display the ICE curves for every instance in the reference dataset.

    • an integer for which n_ice instances from the reference dataset will be sampled uniformly at random to display their ICE curves.

    • a list of integers, where each integer represents an index of an instance in the reference dataset to display their ICE curves.

  • center

    Boolean flag to center the individual conditional expectation (ICE) curves. As mentioned in Goldstein et al. (2014), the heterogeneity in the model can be difficult to discern when the intercepts of the ICE curves cover a wide range. Centering the ICE curves removes the level effects and helps to visualise the heterogeneous effect.

  • pd_limits – Minimum and maximum y-limits for all the one-way PD plots. If None, the limits are inferred automatically.

  • levels – Number of levels in the contour plot.

  • ax – A matplotlib axes object or a numpy array of matplotlib axes to plot on.

  • sharey – A parameter specifying whether the y-axis of the PD and ICE curves should be on the same scale for several features. Possible values are: 'all' | 'row' | None.

  • pd_num_kw – Keyword arguments passed to the matplotlib.pyplot.plot function when plotting the PD for a numerical feature.

  • ice_num_kw – Keyword arguments passed to the matplotlib.pyplot.plot function when plotting the ICE for a numerical feature.

  • pd_cat_kw – Keyword arguments passed to the matplotlib.pyplot.plot function when plotting the PD for a categorical feature.

  • ice_cat_kw – Keyword arguments passed to the matplotlib.pyplot.plot function when plotting the ICE for a categorical feature.

  • pd_num_num_kw – Keyword arguments passed to the matplotlib.pyplot.contourf function when plotting the PD for two numerical features.

  • pd_num_cat_kw – Keyword arguments passed to the matplotlib.pyplot.plot function when plotting the PD for a numerical and a categorical feature.

  • pd_cat_cat_kw – Keyword arguments passed to the alibi.utils.visualization.heatmap() function when plotting the PD for two categorical features.

  • fig_kw

    Keyword arguments passed to the matplotlib.figure.set function.

Returns:

An array of plt.Axes with the resulting partial dependence plots.
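
The n_ice and center options above can be sketched with numpy alone. The helpers below are hypothetical and only illustrate the idea: centering is assumed here to anchor each ICE curve at its value at the first grid point, which is one common choice and may not match the library exactly.

```python
import numpy as np


def center_curves(ice_values):
    """Subtract from each ICE curve its value at the first grid point.

    ice_values has shape (n_grid_points, n_instances).
    """
    return ice_values - ice_values[0:1, :]


def select_ice_indices(n_instances, n_ice, seed=0):
    """Resolve the three accepted forms of n_ice into instance indices."""
    if isinstance(n_ice, str):
        if n_ice != 'all':
            raise ValueError("the only accepted string value is 'all'")
        return np.arange(n_instances)
    if isinstance(n_ice, int):
        # Sample without replacement, uniformly at random.
        rng = np.random.default_rng(seed)
        size = min(n_ice, n_instances)
        return np.sort(rng.choice(n_instances, size=size, replace=False))
    return np.asarray(n_ice)  # explicit list of instance indices


# Three ICE curves with very different intercepts.
ice = np.array([[10.0, 0.0, -5.0],
                [12.0, 1.0, -5.0],
                [15.0, 3.0, -4.0]])
centered = center_curves(ice)

all_idx = select_ice_indices(10, 'all')
sampled_idx = select_ice_indices(10, 4)
listed_idx = select_ice_indices(10, [1, 5])
```

After centering, every curve starts at zero, so differences in shape are easier to compare than when the intercepts span a wide range.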