alibi.utils package
- class alibi.utils.BertBaseUncased(preloading=True)[source]
Bases:
LanguageModel
- SUBWORD_PREFIX = '##'
Language model subword prefix.
- __init__(preloading=True)[source]
Initialize BertBaseUncased.
- Parameters:
preloading (bool) – See alibi.utils.lang_model.LanguageModel.__init__().
- is_subword_prefix(token)[source]
Checks if the given token is part of the tail of a word. Note that a word can be split into multiple tokens (e.g., word = [head_token tail_token_1 tail_token_2 ... tail_token_k]). Each language model has a convention for marking a tail token. For example, DistilbertBaseUncased and BertBaseUncased prefix the tail tokens with the special set of characters '##'. On the other hand, RobertaBase prefixes only the head token, with the special character 'Ġ', so the absence of the prefix identifies the tail tokens. We call those special characters SUBWORD_PREFIX. Due to the differing conventions, this method has to be implemented for each language model. See the module docstring for naming details.
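For illustration, a minimal sketch of the '##' convention (this assumes the transformers package is installed; the tokenization shown is indicative, not guaranteed):

```python
from transformers import AutoTokenizer

# The tokenizer backing bert-base-uncased marks tail tokens with '##'.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
tokens = tokenizer.tokenize("word embeddings")
print(tokens)  # e.g. ['word', 'em', '##bed', '##ding', '##s']

# A tail token is any token carrying SUBWORD_PREFIX:
print([t.startswith("##") for t in tokens])  # e.g. [False, False, True, True, True]
```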
- class alibi.utils.DistilbertBaseUncased(preloading=True)[source]
Bases:
LanguageModel
- SUBWORD_PREFIX = '##'
Language model subword prefix.
- __init__(preloading=True)[source]
Initialize DistilbertBaseUncased.
- Parameters:
preloading (bool) – See alibi.utils.lang_model.LanguageModel.__init__().
- is_subword_prefix(token)[source]
Checks if the given token is part of the tail of a word. Note that a word can be split into multiple tokens (e.g., word = [head_token tail_token_1 tail_token_2 ... tail_token_k]). Each language model has a convention for marking a tail token. For example, DistilbertBaseUncased and BertBaseUncased prefix the tail tokens with the special set of characters '##'. On the other hand, RobertaBase prefixes only the head token, with the special character 'Ġ', so the absence of the prefix identifies the tail tokens. We call those special characters SUBWORD_PREFIX. Due to the differing conventions, this method has to be implemented for each language model. See the module docstring for naming details.
- class alibi.utils.DistributedExplainer(distributed_opts, explainer_type, explainer_init_args, explainer_init_kwargs, concatenate_results=True, return_generator=False)[source]
Bases:
object
A class that orchestrates the execution of a batch of explanations in parallel.
- __getattr__(item)[source]
Accesses actor attributes. Use sparingly as this involves a remote call (that is, these attributes are of an object in a different process). The intended use is for retrieving any common state across the actor at the end of the computation in order to form the response (see notes 2 & 3).
- Parameters:
item (str) – The explainer attribute to be returned.
- Return type:
Any
- Returns:
The value of the attribute specified by item.
- Raises:
ValueError – If the actor index is invalid.
Notes
1. This method assumes that the actor implements a return_attribute method.
2. Note that we are indexing the idle actors. This means that if a pool was initialised with 5 actors and 3 are busy, indexing with index 2 will raise an IndexError.
3. The order of _idle_actors constantly changes - an actor is removed from it if there is a task to execute and appended back when the task is complete. Therefore, indexing at the same position as computation proceeds will result in retrieving state from different processes.
- __init__(distributed_opts, explainer_type, explainer_init_args, explainer_init_kwargs, concatenate_results=True, return_generator=False)[source]
Creates a pool of actors (i.e., replicas of an instantiated explainer_type, each in a separate process) which can explain batches of instances in parallel via calls to get_explanation.
- Parameters:
distributed_opts (Dict[str, Any]) – A dictionary with the following type (minimal signature):

    class DistributedOpts(TypedDict):
        n_cpus: Optional[int]
        batch_size: Optional[int]
The dictionary may contain two additional keys:
- 'actor_cpu_frac' : (float, <= 1.0, > 0.0) - This is used to create more than one process on one CPU/GPU. This may not speed up CPU-intensive tasks, but it is worth experimenting with when few physical cores are available. In particular, this is highly useful when the user wants to share a GPU for multiple tasks, with the caveat that the machine learning framework itself needs to support running multiple replicas on the same GPU. See the ray documentation for details.
- 'algorithm' : str - This is specified internally by the caller. It is used to register target function callbacks for the parallel pool. These should be implemented in the global scope. If not specified, its value will be 'default', which selects a default target function that expects the actor to have a get_explanation method.
explainer_type (Any) – Explainer class.
explainer_init_args (Tuple) – Positional arguments to the explainer constructor.
explainer_init_kwargs (dict) – Keyword arguments to the explainer constructor.
concatenate_results (bool) – If True, concatenates the results. See alibi.utils.distributed.concatenate_minibatches() for more details.
return_generator (bool) – If True, a generator that returns the results in the order the computation finishes is returned when get_explanation is called. Otherwise, the order of the results is the same as the order of the minibatches.
Notes
When return_generator=True, the caller has to take elements from the generator (e.g., by calling next) in order to start computing the results (because the ray pool is implemented as a generator).
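For illustration, a minimal usage sketch. This is hedged: it assumes ray is installed, and MyExplainer with its constructor arguments is a hypothetical stand-in for any explainer exposing a get_explanation method (as required by the 'default' target function):

```python
import numpy as np
from alibi.utils import DistributedExplainer

class MyExplainer:
    # Hypothetical explainer: any class exposing a `get_explanation`
    # method works with the 'default' target function.
    def __init__(self, scale):
        self.scale = scale

    def get_explanation(self, X):
        return X * self.scale  # stand-in for real explanation logic

distributed_opts = {'n_cpus': 2, 'batch_size': 16}  # minimal DistributedOpts
explainer = DistributedExplainer(
    distributed_opts,
    MyExplainer,
    explainer_init_args=(2.0,),
    explainer_init_kwargs={},
)
explanations = explainer.get_explanation(np.random.rand(64, 5))
```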
- create_parallel_pool(explainer_type, explainer_init_args, explainer_init_kwargs)[source]
Creates a pool of actors that can explain the rows of a dataset in parallel.
- Parameters:
See constructor documentation.
- get_explanation(X, **kwargs)[source]
Performs distributed explanations of instances in X.
- Parameters:
X (ndarray) – A batch of instances to be explained. Split into batches according to the settings passed to the constructor.
**kwargs – Any keyword arguments for the explainer's explain method.
- Return type:
Union[Generator[Tuple[int, Any], None, None], List[Any], Any]
- Returns:
The explanations are returned as –
a generator, if the return_generator option is specified. This is used so that the caller can access the results as they are computed. This is the only case when this method is non-blocking and the caller needs to call next on the generator to trigger the parallel computation.
a list of objects, whose type depends on the return type of the explainer. This is returned if no custom preprocessing function is specified.
an object, whose type depends on the return type of the concatenation function when called with a list of minibatch results in the same order as the minibatches.
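When return_generator=True, the parallel computation only proceeds as the caller consumes the generator. A hedged sketch, continuing the hypothetical MyExplainer example from the constructor documentation above:

```python
gen_explainer = DistributedExplainer(
    {'n_cpus': 2, 'batch_size': 16},
    MyExplainer,                 # hypothetical explainer from the sketch above
    explainer_init_args=(2.0,),
    explainer_init_kwargs={},
    return_generator=True,
)
# Each element is a (minibatch_index, result) tuple, yielded in completion order.
for batch_idx, result in gen_explainer.get_explanation(np.random.rand(64, 5)):
    print(batch_idx, result.shape)
```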
- class alibi.utils.LanguageModel(model_path, preloading=True)[source]
Bases:
ABC
- SUBWORD_PREFIX = ''
Language model subword prefix.
- head_tail_split(text)[source]
Split the text into a head and a tail. Some language models support only a limited number of tokens, so the text has to be split to meet this constraint. After the split, only the head is considered for the operation; the tail remains unchanged.
- is_stop_word(tokenized_text, start_idx, punctuation, stopwords)[source]
Checks if the given word starting at the given index is in the list of stopwords.
- Parameters:
- Return type:
bool
- Returns:
True if the token is in the stopwords list, False otherwise.
- abstract is_subword_prefix(token)[source]
Checks if the given token is part of the tail of a word. Note that a word can be split into multiple tokens (e.g., word = [head_token tail_token_1 tail_token_2 ... tail_token_k]). Each language model has a convention for marking a tail token. For example, DistilbertBaseUncased and BertBaseUncased prefix the tail tokens with the special set of characters '##'. On the other hand, RobertaBase prefixes only the head token, with the special character 'Ġ', so the absence of the prefix identifies the tail tokens. We call those special characters SUBWORD_PREFIX. Due to the differing conventions, this method has to be implemented for each language model. See the module docstring for naming details.
- predict_batch_lm(x, vocab_size, batch_size)[source]
TensorFlow language model batch predictions for AnchorText.
- select_word(tokenized_text, start_idx, punctuation)[source]
Given a tokenized text and the starting index of a word, the function selects the entire word. Note that a word can be composed of multiple tokens (e.g., word = [head_token tail_token_1 tail_token_2 ... tail_token_k]). The tail tokens can be identified based on the presence/absence of SUBWORD_PREFIX. See alibi.utils.lang_model.LanguageModel.is_subword_prefix() for more details.
- Parameters:
- Return type:
str
- Returns:
The word obtained by concatenating [head_token tail_token_1 tail_token_2 ... tail_token_k].
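A hedged sketch of how is_subword_prefix and select_word fit together (instantiating BertBaseUncased downloads model weights; the tokenization shown is indicative, not guaranteed):

```python
import string
from alibi.utils import BertBaseUncased

lm = BertBaseUncased()
tokens = lm.tokenizer.tokenize("word embeddings")
# e.g. ['word', 'em', '##bed', '##ding', '##s']

print(lm.is_subword_prefix(tokens[2]))  # True: '##bed' is a tail token
# Select the whole word starting at index 1 ('em'):
word = lm.select_word(tokens, start_idx=1, punctuation=string.punctuation)
print(word)  # expected to reconstruct the full word, e.g. 'embeddings'
```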
- class alibi.utils.RobertaBase(preloading=True)[source]
Bases:
LanguageModel
- SUBWORD_PREFIX = 'Ġ'
Language model subword prefix.
- __init__(preloading=True)[source]
Initialize RobertaBase.
- Parameters:
preloading (bool) – See the alibi.utils.lang_model.LanguageModel.__init__() constructor.
- is_subword_prefix(token)[source]
Checks if the given token is part of the tail of a word. Note that a word can be split into multiple tokens (e.g., word = [head_token tail_token_1 tail_token_2 ... tail_token_k]). Each language model has a convention for marking a tail token. For example, DistilbertBaseUncased and BertBaseUncased prefix the tail tokens with the special set of characters '##'. On the other hand, RobertaBase prefixes only the head token, with the special character 'Ġ', so the absence of the prefix identifies the tail tokens. We call those special characters SUBWORD_PREFIX. Due to the differing conventions, this method has to be implemented for each language model. See the module docstring for naming details.
- alibi.utils.gen_category_map(data, categorical_columns=None)[source]
- Parameters:
data (Union[DataFrame, ndarray]) – 2-dimensional pandas dataframe or numpy array.
categorical_columns (Union[List[int], List[str], None]) – A list of columns indicating categorical variables. Optional if passing a pandas dataframe, as inference based on dtype 'O' will be used. If passing a numpy array this is compulsory.
- Return type:
Dict[int, list]
- Returns:
category_map – A dictionary with keys being the indices of the categorical columns and values being lists of categories for that column. Implicitly each category is mapped to the index of its position in the list.
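A short usage sketch (column names and values are illustrative):

```python
import pandas as pd
from alibi.utils import gen_category_map

df = pd.DataFrame({
    'age': [25, 32, 47],                               # numerical
    'workclass': ['Private', 'State-gov', 'Private'],  # categorical, dtype 'O'
    'education': ['Bachelors', 'HS-grad', 'Masters'],  # categorical, dtype 'O'
})
# For a dataframe, categorical columns are inferred from dtype 'O'.
category_map = gen_category_map(df)
print(category_map)
# e.g. {1: ['Private', 'State-gov'], 2: ['Bachelors', 'HS-grad', 'Masters']}
```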
- alibi.utils.ohe_to_ord(X_ohe, cat_vars_ohe)[source]
Convert one-hot encoded variables to ordinal encodings.
- Parameters:
X_ohe (ndarray) – Data with mixture of one-hot encoded and numerical variables.
cat_vars_ohe (dict) – Dict with as keys the first column index for each one-hot encoded categorical variable and as values the number of categories per categorical variable.
- Return type:
Tuple[ndarray, dict]
- Returns:
Ordinal equivalent of one-hot encoded data and dict with categorical columns and number of categories.
- alibi.utils.ord_to_ohe(X_ord, cat_vars_ord)[source]
Convert ordinal to one-hot encoded variables.
- Parameters:
X_ord (ndarray) – Data with mixture of ordinal encoded and numerical variables.
cat_vars_ord (dict) – Dict with as keys the categorical columns and as values the number of categories per categorical variable.
- Return type:
Tuple[ndarray, dict]
- Returns:
One-hot equivalent of ordinal encoded data and dict with categorical columns and number of categories.
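ohe_to_ord and ord_to_ohe are inverses of each other. A small worked sketch (values illustrative): one numerical column followed by a single categorical variable one-hot encoded over three categories, starting at column index 1.

```python
import numpy as np
from alibi.utils import ohe_to_ord, ord_to_ohe

X_ohe = np.array([[0.5, 1., 0., 0.],
                  [1.3, 0., 0., 1.]])
cat_vars_ohe = {1: 3}  # OHE block starts at column 1 and spans 3 categories

X_ord, cat_vars_ord = ohe_to_ord(X_ohe, cat_vars_ohe)
print(X_ord)         # e.g. [[0.5, 0.], [1.3, 2.]]
print(cat_vars_ord)  # e.g. {1: 3} - ordinal column 1 has 3 categories

# Round-trip back to the one-hot representation:
X_back, _ = ord_to_ohe(X_ord, cat_vars_ord)
```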
- alibi.utils.visualize_image_attr(attr, original_image=None, method='heat_map', sign='absolute_value', plt_fig_axis=None, outlier_perc=2, cmap=None, alpha_overlay=0.5, show_colorbar=False, title=None, fig_size=(6, 6), use_pyplot=True)[source]
Visualizes attribution for a given image by normalizing attribution values of the desired sign ('positive' | 'negative' | 'absolute_value' | 'all') and displaying them using the desired mode in a matplotlib figure.
- Parameters:
attr (ndarray) – Numpy array corresponding to attributions to be visualized. Shape must be in the form (H, W, C), with channels as last dimension. Shape must also match that of the original image if provided.
original_image (Optional[ndarray]) – Numpy array corresponding to original image. Shape must be in the form (H, W, C), with channels as the last dimension. Image can be provided either with float values in range 0-1 or int values between 0-255. This is a necessary argument for any visualization method which utilizes the original image.
method (str) – Chosen method for visualizing attribution. Supported options are:
- 'heat_map' - Display heat map of chosen attributions.
- 'blended_heat_map' - Overlay heat map over greyscale version of original image. Parameter alpha_overlay corresponds to alpha of heat map.
- 'original_image' - Only display original image.
- 'masked_image' - Mask image (pixel-wise multiply) by normalized attribution values.
- 'alpha_scaling' - Sets alpha channel of each pixel to be equal to normalized attribution value.
Default: 'heat_map'.
sign (str) – Chosen sign of attributions to visualize. Supported options are:
- 'positive' - Displays only positive pixel attributions.
- 'absolute_value' - Displays absolute value of attributions.
- 'negative' - Displays only negative pixel attributions.
- 'all' - Displays both positive and negative attribution values. This is not supported for 'masked_image' or 'alpha_scaling' modes, since signed information cannot be represented in these modes.
plt_fig_axis (Optional[Tuple[Figure, Axes]]) – Tuple of matplotlib.pyplot.figure and axis on which to visualize. If None is provided, then a new figure and axis are created.
outlier_perc (Union[int, float]) – Top attribution values which correspond to a total of outlier_perc percentage of the total attribution are set to 1 and scaling is performed using the minimum of these values. For sign='all', outliers and scale value are computed using absolute value of attributions.
cmap (Optional[str]) – String corresponding to desired colormap for heatmap visualization. This defaults to 'Reds' for negative sign, 'Blues' for absolute value, 'Greens' for positive sign, and a spectrum from red to green for all. Note that this argument is only used for visualizations displaying heatmaps.
alpha_overlay (float) – Alpha to set for the heat map when using the 'blended_heat_map' visualization mode, which overlays the heat map over the original image.
show_colorbar (bool) – Displays colorbar for heatmap below the visualization. If given method does not use a heatmap, then a colormap axis is created and hidden. This is necessary for appropriate alignment when visualizing multiple plots, some with colorbars and some without.
title (Optional[str]) – The title for the plot. If None, no title is set.
use_pyplot (bool) – If True, uses pyplot to create and show figure and displays the figure after creating. If False, uses matplotlib object-oriented API and simply returns a figure object without showing.
- Return type:
Tuple[Figure, Axes]
- Returns:
2-element tuple consisting of –
figure : matplotlib.pyplot.Figure - Figure object on which visualization is created. If plt_fig_axis argument is given, this is the same figure provided.
axis : matplotlib.pyplot.Axes - Axes object on which visualization is created. If plt_fig_axis argument is given, this is the same axis provided.
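A minimal usage sketch with random data (shapes and options are illustrative):

```python
import numpy as np
from alibi.utils import visualize_image_attr

attr = np.random.randn(28, 28, 3)  # attributions, shape (H, W, C)
image = np.random.rand(28, 28, 3)  # original image, float values in [0, 1]

fig, ax = visualize_image_attr(
    attr,
    original_image=image,
    method='blended_heat_map',
    sign='all',
    show_colorbar=True,
    title='Attribution heat map',
    use_pyplot=False,  # return the figure without displaying it
)
```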
Submodules
- alibi.utils.approximation_methods module
- alibi.utils.data module
- alibi.utils.discretizer module
- alibi.utils.distance module
- alibi.utils.distributed module
- alibi.utils.distributions module
- alibi.utils.download module
- alibi.utils.frameworks module
- alibi.utils.gradients module
- alibi.utils.kernel module
- alibi.utils.lang_model module
BertBaseUncased
DistilbertBaseUncased
LanguageModel
LanguageModel.SUBWORD_PREFIX
LanguageModel.__init__()
LanguageModel.caller
LanguageModel.from_disk()
LanguageModel.head_tail_split()
LanguageModel.is_punctuation()
LanguageModel.is_stop_word()
LanguageModel.is_subword_prefix()
LanguageModel.mask
LanguageModel.mask_id
LanguageModel.max_num_tokens
LanguageModel.model
LanguageModel.predict_batch_lm()
LanguageModel.select_word()
LanguageModel.to_disk()
LanguageModel.tokenizer
RobertaBase
- alibi.utils.mapping module
- alibi.utils.missing_optional_dependency module
- alibi.utils.tf module
- alibi.utils.visualization module
- alibi.utils.wrappers module