alibi.explainers.anchors.text_samplers module
- class alibi.explainers.anchors.text_samplers.Neighbors(nlp_obj, n_similar=500, w_prob=-15.0)[source]
Bases:
object
- __init__(nlp_obj, n_similar=500, w_prob=-15.0)[source]
Initialize class identifying neighbouring words from the embedding for a given word.
- neighbors(word, tag, top_n)[source]
Find similar words for a certain word in the vocabulary.
- Parameters:
- Return type:
- Returns:
A dict with two fields. The 'words' field contains a numpy array of the top_n most similar words, whereas the 'similarities' field is a numpy array with the corresponding word similarities.
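To make the shape of this return value concrete, the following sketch mimics it with a toy embedding table and cosine similarity. The vocabulary, the vectors, and the helper `toy_neighbors` are hypothetical and not part of alibi. The real method queries a spaCy vocabulary and also takes a `tag` argument, which this sketch ignores.

```python
import numpy as np

# Hypothetical toy vocabulary and 2-d embeddings (not taken from alibi or spaCy).
VOCAB = np.array(["good", "great", "fine", "bad", "table"])
VECTORS = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.7, 0.3],
    [-1.0, 0.0],
    [0.0, 1.0],
])


def toy_neighbors(word: str, top_n: int) -> dict:
    """Return the top_n vocabulary entries closest to `word` by cosine similarity."""
    idx = int(np.where(VOCAB == word)[0][0])
    unit = VECTORS / np.linalg.norm(VECTORS, axis=1, keepdims=True)
    sims = unit @ unit[idx]
    # Sort by decreasing similarity and drop the query word itself.
    order = [int(i) for i in np.argsort(-sims) if i != idx][:top_n]
    return {"words": VOCAB[order], "similarities": sims[order]}


result = toy_neighbors("good", top_n=2)
print(result["words"])
print(result["similarities"])
```

Both arrays are aligned by position, so `result["similarities"][k]` is the score for `result["words"][k]`.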
- class alibi.explainers.anchors.text_samplers.SimilaritySampler(nlp, perturb_opts)[source]
Bases:
AnchorTextSampler
- __call__(anchor, num_samples)[source]
The function returns a numpy array of num_samples samples in which randomly chosen features, except those in anchor, are replaced by similar words with the same part-of-speech tag. See alibi.explainers.anchors.text_samplers.SimilaritySampler.perturb_sentence_similarity() for details of how the replacement works.
- Parameters:
- Return type:
Tuple[ndarray, ndarray]
- Returns:
See alibi.explainers.anchors.text_samplers.SimilaritySampler.perturb_sentence_similarity().
- __init__(nlp, perturb_opts)[source]
Initialize similarity sampler. This sampler replaces words with similar words.
- Parameters:
nlp (Language) – spaCy object.
perturb_opts (Dict) – Perturbation options.
- find_similar_words()[source]
This function queries a spaCy nlp model to find n similar words with the same part of speech for each word in the instance to be explained. For each word the search procedure returns a dictionary containing a numpy array of words ('words') and a numpy array of word similarities ('similarities').
- Return type:
- perturb_sentence_similarity(present, n, sample_proba=0.5, forbidden=frozenset({}), forbidden_tags=frozenset({'PRP$'}), forbidden_words=frozenset({'be'}), temperature=1.0, pos=frozenset({'ADJ', 'ADP', 'ADV', 'DET', 'NOUN', 'VERB'}), use_proba=False, **kwargs)[source]
Perturb the text instance to be explained.
- Parameters:
present (tuple) – Word indices in the text for the words in the proposed anchor.
n (int) – Number of samples used when sampling from the corpus.
sample_proba (float) – Sample probability for a word if use_proba=False.
forbidden (frozenset) – Forbidden lemmas.
forbidden_tags (frozenset) – Forbidden POS tags.
forbidden_words (frozenset) – Forbidden words.
pos (frozenset) – POS that can be changed during perturbation.
use_proba (bool) – Whether to sample according to a similarity score with the corpus embeddings.
temperature (float) – Sample weight hyper-parameter if use_proba=True.
**kwargs – Other arguments. Not used.
- Return type:
Tuple[ndarray, ndarray]
- Returns:
raw – Array of perturbed text instances.
data – Matrix of 1s and 0s in which a 1 indicates that the word has not been perturbed in that sample.
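The sketch below illustrates the general idea behind this kind of perturbation under simplifying assumptions. It is not alibi's implementation. The helper `toy_perturb`, the token list, and the neighbour table are hypothetical, and the POS, lemma, and forbidden-word filters are left out. Words at the `present` indices are never touched. Every other word that has neighbours is replaced with probability `sample_proba`, with replacements drawn uniformly or, when `use_proba=True`, from a softmax over the similarities scaled by `temperature`.

```python
import numpy as np

rng = np.random.default_rng(0)

tokens = ["the", "movie", "was", "good"]
# Hypothetical neighbour table using the {'words', 'similarities'} layout.
neighbours = {
    "movie": {"words": np.array(["film", "show"]), "similarities": np.array([0.9, 0.6])},
    "good": {"words": np.array(["great", "fine"]), "similarities": np.array([0.8, 0.7])},
}


def toy_perturb(present, n, sample_proba=0.5, temperature=1.0, use_proba=False):
    words_grid = np.array([tokens] * n, dtype="<U20")  # wide enough for any replacement
    data = np.ones((n, len(tokens)), dtype=int)         # 1 = word kept, 0 = word replaced
    for i, tok in enumerate(tokens):
        if i in present or tok not in neighbours:
            continue
        candidates = neighbours[tok]["words"]
        if use_proba:
            # Softmax over similarities; a lower temperature sharpens the distribution.
            weights = np.exp(neighbours[tok]["similarities"] / temperature)
            p = weights / weights.sum()
        else:
            p = np.full(len(candidates), 1.0 / len(candidates))
        changed = rng.random(n) < sample_proba
        if changed.any():
            words_grid[changed, i] = rng.choice(candidates, size=int(changed.sum()), p=p)
            data[changed, i] = 0
    sentences = np.array([" ".join(row) for row in words_grid])
    return sentences, data


raw, data = toy_perturb(present=(3,), n=5)
print(raw)
print(data)
```

With `present=(3,)` the last word is part of the anchor, so the last column of `data` stays at 1 and every sample still ends in "good".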
- set_data_type()[source]
Working with numpy arrays of strings requires setting the data type to avoid truncating examples. This function estimates the longest sentence expected during the sampling process, which is used to set the number of characters for the samples and examples arrays. This depends on the perturbation method used for sampling.
- Return type:
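The truncation problem this method guards against is a general property of fixed-width numpy string arrays and can be reproduced without alibi. In the sketch below, the example sentences and the bound `max_len` are hypothetical. How alibi estimates the bound is not shown here.

```python
import numpy as np

original = "the movie was good"                   # 18 characters
samples = np.array([original])                    # fixed-width unicode dtype inferred from the data
samples[0] = "the documentary was excellent"      # silently cut to the array's width
print(samples.dtype, repr(samples[0]))

# Reserving enough characters up front avoids the truncation.
max_len = 40  # hypothetical upper bound on the length of a perturbed sentence
safe = np.zeros(1, dtype=f"<U{max_len}")
safe[0] = "the documentary was excellent"
print(safe.dtype, repr(safe[0]))
```

Because numpy raises no error on the first assignment, the dtype has to be chosen before the sample arrays are filled.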
- class alibi.explainers.anchors.text_samplers.UnknownSampler(nlp, perturb_opts)[source]
Bases:
AnchorTextSampler
- __call__(anchor, num_samples)[source]
The function returns a numpy array of num_samples samples in which randomly chosen features, except those in anchor, are replaced by the 'UNK' token.
- Parameters:
- Return type:
Tuple[ndarray, ndarray]
- Returns:
raw – Array containing num_samples elements. Each element is a perturbed sentence.
data – A (num_samples, m)-dimensional boolean array, where m is the number of tokens in the instance to be explained.
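As a rough illustration of the two returned arrays, the sketch below masks words with 'UNK' at random while leaving the anchor positions untouched. The helper `toy_unknown_sampler`, its `sample_proba` argument, and the word list are hypothetical. This is a simplified stand-in and not alibi's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
words = ["the", "movie", "was", "good"]


def toy_unknown_sampler(anchor, num_samples, sample_proba=0.5):
    # '<U5' fits both the longest word ('movie') and the 'UNK' token.
    grid = np.array([words] * num_samples, dtype="<U5")
    data = np.ones((num_samples, len(words)), dtype=bool)  # True = word kept
    for i in range(len(words)):
        if i in anchor:
            continue  # anchor words are never masked
        changed = rng.random(num_samples) < sample_proba
        grid[changed, i] = "UNK"
        data[changed, i] = False
    raw = np.array([" ".join(row) for row in grid])
    return raw, data


raw, data = toy_unknown_sampler(anchor=(1,), num_samples=6)
print(raw)
print(data)
```

Here `raw` holds one sentence per sample and `data` has shape `(num_samples, m)` with `m = 4`. The column for the anchor index 1 is `True` in every row.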
- __init__(nlp, perturb_opts)[source]
Initialize unknown sampler. This sampler replaces words with the UNK token.
- Parameters:
nlp (Language) – spaCy object.
perturb_opts (Dict) – Perturbation options.
- set_data_type()[source]
Working with numpy arrays of strings requires setting the data type to avoid truncating examples. This function estimates the longest sentence expected during the sampling process, which is used to set the number of characters for the samples and examples arrays. This depends on the perturbation method used for sampling.
- Return type: