alibi_detect.ad package
- class alibi_detect.ad.AdversarialAE(threshold=None, ae=None, model=None, encoder_net=None, decoder_net=None, model_hl=None, hidden_layer_kld=None, w_model_hl=None, temperature=1.0, data_type=None)[source]
Bases: BaseDetector, FitMixin, ThresholdMixin
- __init__(threshold=None, ae=None, model=None, encoder_net=None, decoder_net=None, model_hl=None, hidden_layer_kld=None, w_model_hl=None, temperature=1.0, data_type=None)[source]
Autoencoder (AE) based adversarial detector.
- Parameters:
threshold (Optional[float]) – Threshold used for the adversarial score to determine adversarial instances.
ae (Optional[Model]) – A trained tf.keras autoencoder model, if available.
model (Optional[Model]) – A trained tf.keras classification model.
encoder_net (Optional[Model]) – Layers for the encoder wrapped in a tf.keras.Sequential class if no ‘ae’ is specified.
decoder_net (Optional[Model]) – Layers for the decoder wrapped in a tf.keras.Sequential class if no ‘ae’ is specified.
model_hl (Optional[List[Model]]) – List of tf.keras models for the hidden layer K-L divergence computation.
hidden_layer_kld (Optional[dict]) – Dictionary whose keys are the hidden layer(s) of the model extracted and used during training of the AE, and whose values are the output dimensions of those hidden layers.
w_model_hl (Optional[list]) – Weights assigned to the loss of each model in model_hl.
temperature (float) – Temperature used for model prediction scaling. A temperature < 1 sharpens the prediction probability distribution.
data_type (Optional[str]) – Optionally specify the data type (tabular, image or time-series). Added to metadata.
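The adversarial score behind this detector follows the idea of matching prediction distributions: the classifier’s (temperature-scaled) prediction on an instance is compared, via K-L divergence, with its prediction on the AE reconstruction of that instance. A minimal NumPy sketch of that scoring idea, with toy probability vectors standing in for real model outputs (the function names here are illustrative, not part of the API):

```python
import numpy as np

def kld(p, q, eps=1e-10):
    """K-L divergence between rows of two categorical distributions."""
    return np.sum(p * np.log((p + eps) / (q + eps)), axis=-1)

def adversarial_score(p_orig, p_recon, temperature=1.0):
    """Toy adversarial score: temperature-scale the prediction on the
    original instance, then compare it with the prediction on the AE
    reconstruction. A temperature < 1 sharpens p_orig."""
    if temperature != 1.0:
        p_orig = p_orig ** (1.0 / temperature)
        p_orig = p_orig / p_orig.sum(axis=-1, keepdims=True)
    return kld(p_orig, p_recon)

# predictions that agree -> low score; predictions that disagree -> high score
p_x = np.array([[0.9, 0.05, 0.05],
                [0.9, 0.05, 0.05]])
p_ae = np.array([[0.85, 0.10, 0.05],   # reconstruction barely changes the prediction
                 [0.10, 0.10, 0.80]])  # reconstruction flips the prediction
scores = adversarial_score(p_x, p_ae, temperature=0.5)
```

An instance whose reconstruction changes the classifier’s mind ends up with a much larger score, which is what the threshold is applied to.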
- correct(X, batch_size=10000000000, return_instance_score=True, return_all_predictions=True)[source]
Correct adversarial instances if the adversarial score is above the threshold.
- Parameters:
- Return type:
- Returns:
Dict with corrected predictions and information whether an instance is adversarial or not.
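As a sketch of what correction could look like under this approach: where the adversarial score exceeds the threshold, the class predicted on the AE reconstruction replaces the original prediction. The helper below is hypothetical, illustrating only the thresholding and the shape of the returned dict, not the library internals:

```python
import numpy as np

def correct_sketch(p_orig, p_recon, scores, threshold):
    """Hypothetical correction step: fall back to the class predicted on
    the AE reconstruction for instances flagged as adversarial."""
    is_adv = scores > threshold
    corrected = np.where(is_adv, p_recon.argmax(axis=-1), p_orig.argmax(axis=-1))
    return {'data': {'is_adversarial': is_adv.astype(int),
                     'corrected': corrected}}

p_orig = np.array([[0.9, 0.1], [0.8, 0.2]])   # classifier on the raw instances
p_recon = np.array([[0.9, 0.1], [0.3, 0.7]])  # classifier on the reconstructions
out = correct_sketch(p_orig, p_recon, scores=np.array([0.05, 2.3]), threshold=0.5)
```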
- fit(X, loss_fn=<function loss_adv_ae>, w_model=1.0, w_recon=0.0, optimizer=tensorflow.keras.optimizers.Adam, epochs=20, batch_size=128, verbose=True, log_metric=None, callbacks=None, preprocess_fn=None)[source]
Train Adversarial AE model.
- Parameters:
X – Training batch.
loss_fn – Loss function used for training.
w_model – Weight on model prediction loss term.
w_recon – Weight on MSE reconstruction error loss term.
optimizer – Optimizer used for training.
epochs – Number of training epochs.
batch_size – Batch size used for training.
verbose – Whether to print training progress.
log_metric – Additional metrics whose progress will be displayed if verbose equals True.
callbacks – Callbacks used during training.
preprocess_fn – Preprocessing function applied to each training batch.
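The roles of w_model and w_recon can be pictured as a linear combination of the two loss terms; a toy sketch under that assumption (the actual loss_adv_ae also covers the hidden-layer terms weighted by w_model_hl):

```python
import numpy as np

def weighted_loss_sketch(model_loss, recon_mse, w_model=1.0, w_recon=0.0):
    """Toy combination of the model prediction loss and the MSE
    reconstruction loss, controlled by w_model and w_recon."""
    return w_model * model_loss + w_recon * recon_mse

# with the defaults only the model prediction loss contributes
default = weighted_loss_sketch(0.25, 4.0)
mixed = weighted_loss_sketch(0.25, 4.0, w_model=1.0, w_recon=0.5)
```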
- infer_threshold(X, threshold_perc=99.0, margin=0.0, batch_size=10000000000)[source]
Update threshold by a value inferred from the percentage of instances considered to be adversarial in a sample of the dataset.
- Parameters:
X (ndarray) – Batch of instances.
threshold_perc (float) – Percentage of X considered to be normal based on the adversarial score.
margin (float) – Add margin to threshold. Useful if adversarial instances have significantly higher scores and there is no adversarial instance in X.
batch_size (int) – Batch size used when computing scores.
- Return type:
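Conceptually the threshold update reduces to a percentile of the adversarial scores plus the margin; a NumPy sketch of that logic (the actual method also computes the scores in batches of batch_size):

```python
import numpy as np

def infer_threshold_sketch(scores, threshold_perc=99.0, margin=0.0):
    """Set the threshold so that threshold_perc percent of the scores
    fall below it, then add an optional safety margin."""
    return np.percentile(scores, threshold_perc) + margin

rng = np.random.default_rng(42)
scores = rng.random(1000)  # stand-in for adversarial scores on normal data
threshold = infer_threshold_sketch(scores, threshold_perc=99.0, margin=0.1)
```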
- predict(X, batch_size=10000000000, return_instance_score=True)[source]
Predict whether instances are adversarial instances or not.
- Parameters:
- Return type:
- Returns:
Dictionary containing ‘meta’ and ‘data’ dictionaries.
‘meta’ has the model’s metadata.
‘data’ contains the adversarial predictions and instance level adversarial scores.
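Combining score and threshold, prediction amounts to flagging instances whose adversarial score exceeds the threshold; a toy sketch of the thresholding and of the ‘meta’/‘data’ structure described above (the exact keys in the payload are assumptions):

```python
import numpy as np

def predict_sketch(scores, threshold, return_instance_score=True):
    """Hypothetical prediction step: flag instances whose adversarial
    score exceeds the threshold and package the result as
    'meta' / 'data' dictionaries."""
    data = {'is_adversarial': (scores > threshold).astype(int)}
    if return_instance_score:
        data['instance_score'] = scores
    return {'meta': {'detector': 'AdversarialAE'}, 'data': data}

preds = predict_sketch(np.array([0.1, 0.9, 0.4]), threshold=0.5)
```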
- class alibi_detect.ad.ModelDistillation(threshold=None, distilled_model=None, model=None, loss_type='kld', temperature=1.0, data_type=None)[source]
Bases: BaseDetector, FitMixin, ThresholdMixin
- __init__(threshold=None, distilled_model=None, model=None, loss_type='kld', temperature=1.0, data_type=None)[source]
Model distillation concept drift and adversarial detector.
- Parameters:
threshold (Optional[float]) – Threshold used for the score to determine adversarial instances.
distilled_model (Optional[Model]) – A tf.keras model to distill.
model (Optional[Model]) – A trained tf.keras classification model.
loss_type (str) – Loss for distillation. Supported: ‘kld’, ‘xent’.
temperature (float) – Temperature used for model prediction scaling. A temperature < 1 sharpens the prediction probability distribution.
data_type (Optional[str]) – Optionally specify the data type (tabular, image or time-series). Added to metadata.
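The two supported loss types can be sketched with plain NumPy: ‘kld’ compares the temperature-scaled model prediction with the distilled model’s prediction via K-L divergence, while ‘xent’ uses cross-entropy. A toy illustration (the function name and exact scaling details are assumptions, not the library’s loss_distillation):

```python
import numpy as np

def distillation_loss_sketch(p_model, p_distilled, loss_type='kld',
                             temperature=1.0, eps=1e-10):
    """Toy distillation loss between the original model's predictions
    and the distilled model's predictions."""
    if temperature != 1.0:  # temperature < 1 sharpens the model distribution
        p_model = p_model ** (1.0 / temperature)
        p_model = p_model / p_model.sum(axis=-1, keepdims=True)
    if loss_type == 'kld':
        return np.sum(p_model * np.log((p_model + eps) / (p_distilled + eps)), axis=-1)
    if loss_type == 'xent':
        return -np.sum(p_model * np.log(p_distilled + eps), axis=-1)
    raise ValueError("loss_type must be 'kld' or 'xent'")

p_m = np.array([[0.8, 0.2]])
p_d = np.array([[0.6, 0.4]])
l_kld = distillation_loss_sketch(p_m, p_d, loss_type='kld')
l_xent = distillation_loss_sketch(p_m, p_d, loss_type='xent')
```

At detection time, a large loss value signals that the distilled model no longer matches the original model on an instance, which is used as the adversarial/drift score.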
- fit(X, loss_fn=<function loss_distillation>, optimizer=tensorflow.keras.optimizers.Adam, epochs=20, batch_size=128, verbose=True, log_metric=None, callbacks=None, preprocess_fn=None)[source]
Train ModelDistillation detector.
- Parameters:
X – Training batch.
loss_fn – Loss function used for training.
optimizer – Optimizer used for training.
epochs – Number of training epochs.
batch_size – Batch size used for training.
verbose – Whether to print training progress.
log_metric – Additional metrics whose progress will be displayed if verbose equals True.
callbacks – Callbacks used during training.
preprocess_fn – Preprocessing function applied to each training batch.
- infer_threshold(X, threshold_perc=99.0, margin=0.0, batch_size=10000000000)[source]
Update threshold by a value inferred from the percentage of instances considered to be adversarial in a sample of the dataset.
- Parameters:
X (ndarray) – Batch of instances.
threshold_perc (float) – Percentage of X considered to be normal based on the adversarial score.
margin (float) – Add margin to threshold. Useful if adversarial instances have significantly higher scores and there is no adversarial instance in X.
batch_size (int) – Batch size used when computing scores.
- Return type:
- predict(X, batch_size=10000000000, return_instance_score=True)[source]
Predict whether instances are adversarial instances or not.
- Parameters:
- Return type:
- Returns:
Dictionary containing ‘meta’ and ‘data’ dictionaries.
‘meta’ has the model’s metadata.
‘data’ contains the adversarial predictions and instance level adversarial scores.