alibi_detect.cd.model_uncertainty module
- class alibi_detect.cd.model_uncertainty.ClassifierUncertaintyDrift(x_ref, model, p_val=0.05, x_ref_preprocessed=False, backend=None, update_x_ref=None, preds_type='probs', uncertainty_type='entropy', margin_width=0.1, batch_size=32, preprocess_batch_fn=None, device=None, tokenizer=None, max_len=None, input_shape=None, data_type=None)
Bases: DriftConfigMixin
- __init__(x_ref, model, p_val=0.05, x_ref_preprocessed=False, backend=None, update_x_ref=None, preds_type='probs', uncertainty_type='entropy', margin_width=0.1, batch_size=32, preprocess_batch_fn=None, device=None, tokenizer=None, max_len=None, input_shape=None, data_type=None)
Test for a change in the number of instances falling into regions on which the model is uncertain. Performs either a K-S test on prediction entropies or a Chi-squared test on 0-1 indicators of predictions falling into a margin of uncertainty (e.g. probabilities falling into [0.45, 0.55] in the binary case).
- Parameters:
x_ref (Union[ndarray, list]) – Data used as reference distribution. Should be disjoint from the data the model was trained on for accurate p-values.
model (Callable) – Classification model outputting class probabilities (or logits).
backend (Optional[str]) – Backend to use if model requires batch prediction. Options are 'tensorflow' or 'pytorch'.
p_val (float) – p-value used for the significance of the test.
x_ref_preprocessed (bool) – Whether the given reference data x_ref has been preprocessed yet. If x_ref_preprocessed=True, only the test data x will be preprocessed at prediction time. If x_ref_preprocessed=False, the reference data will also be preprocessed.
update_x_ref (Optional[Dict[str, int]]) – Reference data can optionally be updated to the last n instances seen by the detector or via reservoir sampling with size n. For the former, the parameter equals {'last': n} while for reservoir sampling {'reservoir_sampling': n} is passed.
preds_type (str) – Type of prediction output by the model. Options are 'probs' (in [0,1]) or 'logits' (in [-inf,inf]).
uncertainty_type (str) – Method for determining the model's uncertainty for a given instance. Options are 'entropy' or 'margin'.
margin_width (float) – Width of the margin if uncertainty_type = 'margin'. The model is considered uncertain on an instance if the highest two class probabilities it assigns to the instance differ by less than margin_width.
batch_size (int) – Batch size used to evaluate model. Only relevant when backend has been specified for batch prediction.
preprocess_batch_fn (Optional[Callable]) – Optional batch preprocessing function. For example to convert a list of objects to a batch which can be processed by the model.
device (Union[Literal['cuda', 'gpu', 'cpu'], device, None]) – Device type used. The default tries to use the GPU and falls back on CPU if needed. Can be specified by passing either 'cuda', 'gpu', 'cpu' or an instance of torch.device. Only relevant for 'pytorch' backend.
tokenizer (Optional[Callable]) – Optional tokenizer for NLP models.
max_len (Optional[int]) – Optional max token length for NLP models.
data_type (Optional[str]) – Optionally specify the data type (tabular, image or time-series). Added to metadata.
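The two uncertainty_type options can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the helper names (entropy_uncertainty, in_margin, softmax) are hypothetical, and the use of the natural logarithm for the entropy is an assumption.

```python
import math

def softmax(logits):
    """Convert logits to probabilities (the step implied by preds_type='logits')."""
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_uncertainty(probs):
    """Shannon entropy of one instance's class probabilities (natural log assumed)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def in_margin(probs, margin_width=0.1):
    """Return 1 if the two highest class probabilities differ by less than margin_width."""
    top_two = sorted(probs, reverse=True)[:2]
    return int(top_two[0] - top_two[1] < margin_width)

confident = [0.95, 0.05]   # model is sure about class 0
uncertain = [0.52, 0.48]   # probabilities fall inside [0.45, 0.55]

# Continuous scores, suitable for a K-S test between reference and test data.
scores = [entropy_uncertainty(p) for p in (confident, uncertain)]

# 0-1 indicators, suitable for a Chi-squared test.
flags = [in_margin(p, margin_width=0.1) for p in (confident, uncertain)]
print(scores, flags)
```

With margin_width=0.1 in the binary case, an instance is flagged exactly when its probabilities differ by less than 0.1, which matches the [0.45, 0.55] example in the description.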
- predict(x, return_p_val=True, return_distance=True)
Predict whether a batch of data has drifted from the reference data.
- Parameters:
x – Batch of instances to test for drift.
return_p_val – Whether to include the p-value of the test in the output.
return_distance – Whether to include the test statistic in the output.
- Returns:
Dictionary containing 'meta' and 'data' dictionaries. 'meta' has the model's metadata. 'data' contains the drift prediction and optionally the p-value, threshold and test statistic.
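How a drift prediction, p-value, threshold and test statistic can arise from 0-1 margin indicators is sketched below as a Pearson Chi-squared test on a 2x2 table. This is an illustration under stated assumptions rather than the detector's code: the function name and the dictionary keys are hypothetical, and no continuity correction is applied, which a library routine might do by default.

```python
import math

def chi2_indicator_test(flags_ref, flags_test, p_val=0.05):
    """Pearson Chi-squared test comparing 0-1 uncertainty indicators of two samples."""
    # Rows: reference / test data. Columns: not uncertain (0) / uncertain (1).
    table = [
        [flags_ref.count(0), flags_ref.count(1)],
        [flags_test.count(0), flags_test.count(1)],
    ]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)

    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected

    # Survival function of the Chi-squared distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(stat / 2.0))

    # Key names are illustrative assumptions, not a documented schema.
    return {"is_drift": int(p < p_val), "p_val": p, "threshold": p_val, "distance": stat}

# 10% of reference instances are uncertain versus 40% of test instances.
result = chi2_indicator_test([0] * 90 + [1] * 10, [0] * 60 + [1] * 40)
print(result)
```

Drift is flagged when the p-value falls below the p_val threshold passed at construction.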
- class alibi_detect.cd.model_uncertainty.RegressorUncertaintyDrift(x_ref, model, p_val=0.05, x_ref_preprocessed=False, backend=None, update_x_ref=None, uncertainty_type='mc_dropout', n_evals=25, batch_size=32, preprocess_batch_fn=None, device=None, tokenizer=None, max_len=None, input_shape=None, data_type=None)
Bases: DriftConfigMixin
- __init__(x_ref, model, p_val=0.05, x_ref_preprocessed=False, backend=None, update_x_ref=None, uncertainty_type='mc_dropout', n_evals=25, batch_size=32, preprocess_batch_fn=None, device=None, tokenizer=None, max_len=None, input_shape=None, data_type=None)
Test for a change in the number of instances falling into regions on which the model is uncertain. Performs a K-S test on uncertainties estimated from a predictive ensemble, given either explicitly or implicitly as a model with dropout layers.
- Parameters:
x_ref (Union[ndarray, list]) – Data used as reference distribution. Should be disjoint from the data the model was trained on for accurate p-values.
model (Callable) – Regression model. Its expected output format depends on uncertainty_type.
backend (Optional[str]) – Backend to use if model requires batch prediction. Options are 'tensorflow' or 'pytorch'.
p_val (float) – p-value used for the significance of the test.
x_ref_preprocessed (bool) – Whether the given reference data x_ref has been preprocessed yet. If x_ref_preprocessed=True, only the test data x will be preprocessed at prediction time. If x_ref_preprocessed=False, the reference data will also be preprocessed.
update_x_ref (Optional[Dict[str, int]]) – Reference data can optionally be updated to the last n instances seen by the detector or via reservoir sampling with size n. For the former, the parameter equals {'last': n} while for reservoir sampling {'reservoir_sampling': n} is passed.
uncertainty_type (str) – Method for determining the model's uncertainty for a given instance. Options are 'mc_dropout' or 'ensemble'. For 'mc_dropout' the model should output a scalar per instance. For 'ensemble' it should output a vector of predictions per instance.
n_evals (int) – The number of times to evaluate the model under different dropout configurations. Only relevant when using the 'mc_dropout' uncertainty type.
batch_size (int) – Batch size used to evaluate model. Only relevant when backend has been specified for batch prediction.
preprocess_batch_fn (Optional[Callable]) – Optional batch preprocessing function. For example to convert a list of objects to a batch which can be processed by the model.
device (Union[Literal['cuda', 'gpu', 'cpu'], device, None]) – Device type used. The default tries to use the GPU and falls back on CPU if needed. Can be specified by passing either 'cuda', 'gpu', 'cpu' or an instance of torch.device. Only relevant for 'pytorch' backend.
tokenizer (Optional[Callable]) – Optional tokenizer for NLP models.
max_len (Optional[int]) – Optional max token length for NLP models.
data_type (Optional[str]) – Optionally specify the data type (tabular, image or time-series). Added to metadata.
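The difference between the 'mc_dropout' and 'ensemble' options can be sketched in pure Python. This is a toy illustration, not the library's code: taking the variance across predictions as the uncertainty score is an assumption, and make_dropout_model is a hypothetical stand-in for a network with dropout layers.

```python
import random
import statistics

def ensemble_uncertainty(preds_per_instance):
    """'ensemble' style: the model already returns a vector of predictions per instance."""
    return [statistics.pvariance(preds) for preds in preds_per_instance]

def mc_dropout_uncertainty(model, xs, n_evals=25):
    """'mc_dropout' style: call a stochastic scalar-output model n_evals times per instance."""
    return [statistics.pvariance([model(x) for _ in range(n_evals)]) for x in xs]

def make_dropout_model(weights, drop_prob=0.5, seed=0):
    """Hypothetical toy linear regressor that randomly drops input terms on every call."""
    rng = random.Random(seed)

    def model(x):
        kept = [w * xi if rng.random() >= drop_prob else 0.0
                for w, xi in zip(weights, x)]
        return sum(kept) / (1.0 - drop_prob)   # inverted-dropout rescaling

    return model

model = make_dropout_model([1.0, -2.0, 0.5])
xs = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [10.0, 10.0, 10.0]]

# Larger inputs make the dropped terms matter more, so the spread of predictions grows.
mc_scores = mc_dropout_uncertainty(model, xs, n_evals=25)

# Two instances, three ensemble members each.
ens_scores = ensemble_uncertainty([[2.0, 2.0, 2.0], [1.0, 2.0, 3.0]])
print(mc_scores, ens_scores)
```

Either way the result is one uncertainty score per instance, which is the quantity compared between reference and test data by the K-S test.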
- predict(x, return_p_val=True, return_distance=True)
Predict whether a batch of data has drifted from the reference data.
- Parameters:
x – Batch of instances to test for drift.
return_p_val – Whether to include the p-value of the test in the output.
return_distance – Whether to include the test statistic in the output.
- Returns:
Dictionary containing 'meta' and 'data' dictionaries. 'meta' has the model's metadata. 'data' contains the drift prediction and optionally the p-value, threshold and test statistic.
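The K-S test that turns per-instance uncertainty scores into a drift decision can be sketched as follows. This is a simplified illustration: the function names are hypothetical, and the p-value uses the asymptotic Kolmogorov series, whereas an established statistics routine may use exact methods for small samples.

```python
import bisect
import math

def ks_statistic(ref, test):
    """Two-sample K-S statistic: the largest gap between the two empirical CDFs."""
    ref_sorted, test_sorted = sorted(ref), sorted(test)
    n, m = len(ref_sorted), len(test_sorted)
    gap = 0.0
    for v in ref_sorted + test_sorted:
        cdf_ref = bisect.bisect_right(ref_sorted, v) / n
        cdf_test = bisect.bisect_right(test_sorted, v) / m
        gap = max(gap, abs(cdf_ref - cdf_test))
    return gap

def ks_p_value(stat, n, m, terms=100):
    """Asymptotic two-sided p-value from the Kolmogorov distribution."""
    lam = math.sqrt(n * m / (n + m)) * stat
    if lam < 0.2:
        # The alternating series converges slowly here and the p-value is close to 1.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))

# Toy uncertainty scores: the test scores are shifted upwards relative to the reference.
ref_scores = [i / 100 for i in range(100)]
test_scores = [(i + 50) / 100 for i in range(100)]

stat = ks_statistic(ref_scores, test_scores)
p = ks_p_value(stat, len(ref_scores), len(test_scores))
is_drift = int(p < 0.05)   # 0.05 plays the role of the p_val threshold
print(stat, p, is_drift)
```

Because the test only compares one-dimensional score distributions, it applies unchanged whether the scores are prediction entropies or variances from an ensemble.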