alibi_detect.ad.model_distillation module

class alibi_detect.ad.model_distillation.ModelDistillation(threshold=None, distilled_model=None, model=None, loss_type='kld', temperature=1.0, data_type=None)

Bases: BaseDetector, FitMixin, ThresholdMixin

__init__(threshold=None, distilled_model=None, model=None, loss_type='kld', temperature=1.0, data_type=None)

Model distillation based detector for adversarial instances and concept drift.

Parameters:
  • threshold (Optional[float]) – Threshold applied to the adversarial score to determine adversarial instances.

  • distilled_model (Optional[Model]) – A tf.keras model to distill.

  • model (Optional[Model]) – A trained tf.keras classification model.

  • loss_type (str) – Loss for distillation. Supported: ‘kld’, ‘xent’.

  • temperature (float) – Temperature used for model prediction scaling. Temperature <1 sharpens the prediction probability distribution.

  • data_type (Optional[str]) – Optionally specify the data type (tabular, image or time-series). Added to metadata.
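
Example (a minimal sketch; the toy Sequential models below are illustrative placeholders, not part of the alibi-detect API):

   import tensorflow as tf
   from alibi_detect.ad import ModelDistillation

   # Trained classifier to protect (illustrative architecture).
   model = tf.keras.Sequential([
       tf.keras.layers.Input(shape=(32, 32, 3)),
       tf.keras.layers.Flatten(),
       tf.keras.layers.Dense(10, activation='softmax')
   ])

   # Model that will be distilled from the classifier; typically of
   # similar or smaller capacity than the original model.
   distilled_model = tf.keras.Sequential([
       tf.keras.layers.Input(shape=(32, 32, 3)),
       tf.keras.layers.Flatten(),
       tf.keras.layers.Dense(10, activation='softmax')
   ])

   ad = ModelDistillation(
       model=model,
       distilled_model=distilled_model,
       loss_type='kld',   # or 'xent'
       temperature=0.5    # <1 sharpens the prediction distribution
   )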

fit(X, loss_fn=<function loss_distillation>, optimizer=tensorflow.keras.optimizers.Adam, epochs=20, batch_size=128, verbose=True, log_metric=None, callbacks=None, preprocess_fn=None)

Train ModelDistillation detector.

Parameters:
  • X – Training batch.

  • loss_fn – Loss function used for training.

  • optimizer – Optimizer used for training.

  • epochs – Number of training epochs.

  • batch_size – Batch size used for training.

  • verbose – Whether to print training progress.

  • log_metric – Additional metrics whose progress will be displayed if verbose is True.

  • callbacks – Callbacks used during training.

  • preprocess_fn – Preprocessing function applied to each training batch.
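
Example (a sketch using the detector ad constructed in the __init__ example above; the random X_train is placeholder data assumed to match the classifier's input shape):

   import numpy as np

   X_train = np.random.rand(256, 32, 32, 3).astype('float32')  # placeholder

   # Train the distilled model to mimic the classifier's predictions.
   ad.fit(X_train, epochs=20, batch_size=128, verbose=True)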

infer_threshold(X, threshold_perc=99.0, margin=0.0, batch_size=10000000000)

Update threshold by a value inferred from the percentage of instances considered to be adversarial in a sample of the dataset.

Parameters:
  • X (ndarray) – Batch of instances.

  • threshold_perc (float) – Percentage of X considered to be normal based on the adversarial score.

  • margin (float) – Add margin to threshold. Useful if adversarial instances have significantly higher scores and there is no adversarial instance in X.

  • batch_size (int) – Batch size used when computing scores.

Return type:

None
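
Example (a sketch continuing from the fit example; X_train is assumed to consist mostly of normal instances):

   # Set the threshold so that 95% of the instances in X_train score
   # below it; the margin guards against the sample containing no
   # adversarial instances at all.
   ad.infer_threshold(X_train, threshold_perc=95., margin=0.1)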

predict(X, batch_size=10000000000, return_instance_score=True)

Predict whether instances are adversarial or not.

Parameters:
  • X (ndarray) – Batch of instances.

  • batch_size (int) – Batch size used when computing scores.

  • return_instance_score (bool) – Whether to return instance level adversarial scores.

Return type:

Dict[str, Union[Dict[str, str], Dict[str, ndarray]]]

Returns:

Dictionary containing 'meta' and 'data' dictionaries.

  • 'meta' has the model’s metadata.

  • 'data' contains the adversarial predictions and instance level adversarial scores.
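
Example (a sketch continuing from the examples above; the 'is_adversarial' and 'instance_score' keys are assumed to follow the convention of alibi-detect's adversarial detectors):

   preds = ad.predict(X_train, return_instance_score=True)

   print(preds['meta'])                    # detector metadata
   print(preds['data']['is_adversarial'])  # binary flag per instance
   print(preds['data']['instance_score'])  # adversarial score per instance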

score(X, batch_size=10000000000, return_predictions=False)

Compute adversarial scores.

Parameters:
  • X (ndarray) – Batch of instances to analyze.

  • batch_size (int) – Batch size used when computing scores.

  • return_predictions (bool) – Whether to also return the predictions of the classifier and the distilled model on the original instances.

Return type:

Union[ndarray, Tuple[ndarray, ndarray, ndarray]]

Returns:

Array with adversarial scores for each instance in the batch. If return_predictions equals True, the predictions of the classifier and the distilled model are also returned.
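
Example (a sketch continuing from the examples above; the ordering of the returned tuple when return_predictions equals True is assumed from the return type):

   scores = ad.score(X_train)  # adversarial score per instance, shape (n_instances,)

   # Also return the predictions of the original and distilled models.
   scores, preds, distilled_preds = ad.score(X_train, return_predictions=True)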