alibi.explainers.anchors.text_samplers module

class alibi.explainers.anchors.text_samplers.AnchorTextSampler[source]

Bases: object

abstract set_text(text)[source]
Return type:

None

class alibi.explainers.anchors.text_samplers.Neighbors(nlp_obj, n_similar=500, w_prob=-15.0)[source]

Bases: object

__init__(nlp_obj, n_similar=500, w_prob=-15.0)[source]

Initialize class identifying neighbouring words from the embedding for a given word.

Parameters:
  • nlp_obj (Language) – spaCy model.

  • n_similar (int) – Number of similar words to return.

  • w_prob (float) – Smoothed log probability estimate of token’s type.

neighbors(word, tag, top_n)[source]

Find similar words for a certain word in the vocabulary.

Parameters:
  • word (str) – Word for which we need to find similar words.

  • tag (str) – Part of speech tag for the words.

  • top_n (int) – Return only top_n neighbors.

Return type:

dict

Returns:

A dict with two fields. The 'words' field contains a numpy array of the top_n most similar words, and the 'similarities' field contains a numpy array of the corresponding word similarities.
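The shape of the returned dict can be illustrated with a minimal sketch that ranks words by cosine similarity over a toy embedding table. This is not alibi's implementation: the helper `top_n_neighbors`, the toy `vocab` and `vectors`, and the choice to exclude the query word are assumptions made for illustration, and the part-of-speech filtering driven by `tag` is omitted.

```python
import numpy as np

def top_n_neighbors(word, vocab, vectors, top_n):
    """Hypothetical stand-in for a neighbour lookup (not alibi's code)."""
    idx = vocab.index(word)
    # Normalise rows so that dot products are cosine similarities.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit[idx]
    # Sort by decreasing similarity and drop the query word itself.
    order = [i for i in np.argsort(-sims) if i != idx][:top_n]
    return {
        "words": np.array([vocab[i] for i in order]),
        "similarities": sims[order],
    }

# Toy two-dimensional embeddings, invented for this example.
vocab = ["good", "great", "bad", "table"]
vectors = np.array([
    [1.0, 0.1],
    [0.9, 0.2],
    [-1.0, 0.1],
    [0.0, 1.0],
])
result = top_n_neighbors("good", vocab, vectors, top_n=2)
```

Both arrays have length top_n and are aligned by position, so `result["similarities"][i]` is the similarity of `result["words"][i]` to the query word.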

class alibi.explainers.anchors.text_samplers.SimilaritySampler(nlp, perturb_opts)[source]

Bases: AnchorTextSampler

__call__(anchor, num_samples)[source]

The function returns a numpy array of num_samples perturbed sentences in which randomly chosen features, except those in anchor, are replaced by similar words with the same part-of-speech tag. See alibi.explainers.anchors.text_samplers.SimilaritySampler.perturb_sentence_similarity() for details of how the replacement works.

Parameters:
  • anchor (tuple) – Indices represent the positions of the words to be kept unchanged.

  • num_samples (int) – Number of perturbed sentences to be returned.

Return type:

Tuple[ndarray, ndarray]

Returns:

See alibi.explainers.anchors.text_samplers.SimilaritySampler.perturb_sentence_similarity().

__init__(nlp, perturb_opts)[source]

Initialize similarity sampler. This sampler replaces words with similar words.

Parameters:
  • nlp (Language) – spaCy object.

  • perturb_opts (Dict) – Perturbation options.

find_similar_words()[source]

This function queries a spaCy nlp model to find n similar words with the same part of speech for each word in the instance to be explained. For each word the search procedure returns a dictionary containing a numpy array of words ('words') and a numpy array of word similarities ('similarities').

Return type:

None

perturb_sentence_similarity(present, n, sample_proba=0.5, forbidden=frozenset({}), forbidden_tags=frozenset({'PRP$'}), forbidden_words=frozenset({'be'}), temperature=1.0, pos=frozenset({'ADJ', 'ADP', 'ADV', 'DET', 'NOUN', 'VERB'}), use_proba=False, **kwargs)[source]

Perturb the text instance to be explained.

Parameters:
  • present (tuple) – Word index in the text for the words in the proposed anchor.

  • n (int) – Number of samples used when sampling from the corpus.

  • sample_proba (float) – Sample probability for a word if use_proba=False.

  • forbidden (frozenset) – Forbidden lemmas.

  • forbidden_tags (frozenset) – Forbidden POS tags.

  • forbidden_words (frozenset) – Forbidden words.

  • pos (frozenset) – POS that can be changed during perturbation.

  • use_proba (bool) – Whether to sample according to a similarity score with the corpus embeddings.

  • temperature (float) – Sample weight hyper-parameter if use_proba=True.

  • **kwargs – Other arguments. Not used.

Return type:

Tuple[ndarray, ndarray]

Returns:

  • raw – Array of perturbed text instances.

  • data – Binary matrix in which, for each sample, 1 marks a word that was left unperturbed and 0 marks a word that was replaced.
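How temperature could shape the choice of a replacement word when use_proba=True can be sketched as follows. The weighting used here (a softmax over similarity divided by temperature) is an assumption made for illustration and may differ from what the library computes; `replacement_probs` and the toy candidates are hypothetical names.

```python
import numpy as np

def replacement_probs(similarities, temperature=1.0):
    """Hypothetical weighting: softmax over similarity / temperature."""
    scaled = np.asarray(similarities, dtype=float) / temperature
    # Subtract the maximum for numerical stability; the result is unchanged.
    scaled = scaled - scaled.max()
    weights = np.exp(scaled)
    return weights / weights.sum()

# Toy candidates for one word, with invented similarity scores.
candidates = np.array(["great", "fine", "table"])
similarities = [0.9, 0.5, 0.1]

sharp = replacement_probs(similarities, temperature=1.0)
flat = replacement_probs(similarities, temperature=10.0)

# Draw one replacement according to the temperature-scaled probabilities.
rng = np.random.default_rng(0)
choice = rng.choice(candidates, p=sharp)
```

Under this assumed weighting, a larger temperature flattens the distribution, so less similar words are drawn more often; a smaller temperature concentrates the probability mass on the most similar candidates.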

set_data_type()[source]

Working with numpy arrays of strings requires setting the data type to avoid truncating examples. This function estimates the longest sentence expected during the sampling process, which is used to set the number of characters for the samples and examples arrays. This depends on the perturbation method used for sampling.

Return type:

None
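The truncation problem that motivates set_data_type() can be reproduced with plain numpy. The sentences below are invented for the example, and sizing the dtype from the longest string is a simplified stand-in for the estimate the sampler makes.

```python
import numpy as np

original = "good film"
perturbed = "excellent movie overall"

# An array created from the original sentence gets a fixed-width dtype
# sized for that sentence, so longer strings are silently truncated.
too_small = np.array([original])
too_small[0] = perturbed

# Sizing the dtype for the longest expected sentence avoids the truncation.
max_len = max(len(original), len(perturbed))
sized = np.zeros(1, dtype=f"<U{max_len}")
sized[0] = perturbed
```

Here `too_small` keeps only the first nine characters of the perturbed sentence, while `sized` stores it in full.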

set_text(text)[source]

Sets the text to be processed.

Parameters:

text (str) – Text to be processed.

Return type:

None

class alibi.explainers.anchors.text_samplers.UnknownSampler(nlp, perturb_opts)[source]

Bases: AnchorTextSampler

UNK: str = 'UNK'

Unknown token to be used.

__call__(anchor, num_samples)[source]

The function returns a numpy array of num_samples where randomly chosen features, except those in anchor, are replaced by 'UNK' token.

Parameters:
  • anchor (tuple) – Indices represent the positions of the words to be kept unchanged.

  • num_samples (int) – Number of perturbed sentences to be returned.

Return type:

Tuple[ndarray, ndarray]

Returns:

  • raw – Array containing num_samples elements. Each element is a perturbed sentence.

  • data – A (num_samples, m)-dimensional boolean array, where m is the number of tokens in the instance to be explained.
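The relationship between raw and data can be sketched without spaCy. The function below is a simplified illustration, not the library's code: `unk_perturb`, its `sample_proba` and `seed` parameters, whitespace tokenisation, and the 0/1 integer encoding of data are all assumptions.

```python
import numpy as np

def unk_perturb(tokens, anchor, num_samples, sample_proba=0.5, unk="UNK", seed=0):
    """Hypothetical UNK perturbation over a pre-tokenised sentence."""
    rng = np.random.default_rng(seed)
    n_tokens = len(tokens)
    # data[i, j] == 1 means token j is kept unchanged in sample i.
    data = np.ones((num_samples, n_tokens), dtype=int)
    # Reserve enough characters so that substituting UNK never truncates.
    max_len = sum(max(len(t), len(unk)) for t in tokens) + n_tokens
    raw = np.empty(num_samples, dtype=f"<U{max_len}")
    for i in range(num_samples):
        words = list(tokens)
        for j in range(n_tokens):
            if j in anchor:
                continue  # anchor positions are never perturbed
            if rng.random() < sample_proba:
                words[j] = unk
                data[i, j] = 0
        raw[i] = " ".join(words)
    return raw, data

tokens = ["this", "is", "good"]
raw, data = unk_perturb(tokens, anchor=(2,), num_samples=5)
```

Every row of data has a 1 in the anchor column, and each 0 in data corresponds to an 'UNK' at the same token position in the matching element of raw.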

__init__(nlp, perturb_opts)[source]

Initialize unknown sampler. This sampler replaces words with the UNK token.

Parameters:
  • nlp (Language) – spaCy object.

  • perturb_opts (Dict) – Perturbation options.

set_data_type()[source]

Working with numpy arrays of strings requires setting the data type to avoid truncating examples. This function estimates the longest sentence expected during the sampling process, which is used to set the number of characters for the samples and examples arrays. This depends on the perturbation method used for sampling.

Return type:

None

set_text(text)[source]

Sets the text to be processed.

Parameters:

text (str) – Text to be processed.

Return type:

None

alibi.explainers.anchors.text_samplers.load_spacy_lexeme_prob(nlp)[source]

This utility function loads the lexeme_prob table for a spaCy model if it is not present. This is required to enable support for different spaCy versions.

Return type:

Language