alibi_detect.od.vae module

class alibi_detect.od.vae.OutlierVAE(threshold=None, score_type='mse', vae=None, encoder_net=None, decoder_net=None, latent_dim=None, samples=10, beta=1.0, data_type=None)

Bases: BaseDetector, FitMixin, ThresholdMixin

__init__(threshold=None, score_type='mse', vae=None, encoder_net=None, decoder_net=None, latent_dim=None, samples=10, beta=1.0, data_type=None)

VAE-based outlier detector.

Parameters:
  • threshold (Optional[float]) – Outlier score threshold. Scores above this value are flagged as outliers.

  • score_type (str) – Metric used for outlier scores. Either ‘mse’ (mean squared error) or ‘proba’ (reconstruction probabilities) supported.

  • vae (Optional[Model]) – A trained tf.keras model if available.

  • encoder_net (Optional[Model]) – Layers for the encoder wrapped in a tf.keras.Sequential class if no ‘vae’ is specified.

  • decoder_net (Optional[Model]) – Layers for the decoder wrapped in a tf.keras.Sequential class if no ‘vae’ is specified.

  • latent_dim (Optional[int]) – Dimensionality of the latent space.

  • samples (int) – Number of samples drawn per instance when evaluating it.

  • beta (float) – Beta parameter for KL-divergence loss term.

  • data_type (Optional[str]) – Optionally specify the data type (tabular, image or time-series). Added to metadata.

feature_score(X_orig, X_recon)

Compute feature level outlier scores.

Parameters:
  • X_orig (ndarray) – Batch of original instances.

  • X_recon (ndarray) – Batch of reconstructed instances.

Return type:

ndarray

Returns:

Feature level outlier scores.
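
To make the idea of a feature level score concrete, the following is a minimal NumPy sketch for the 'mse' score type. It assumes one reconstruction per instance and treats the squared reconstruction error of each feature as its score. The function name feature_score_mse is hypothetical, and this is an illustration rather than the library's implementation.

```python
import numpy as np


def feature_score_mse(X_orig: np.ndarray, X_recon: np.ndarray) -> np.ndarray:
    """Squared reconstruction error per feature (illustrative sketch only)."""
    if X_orig.shape != X_recon.shape:
        raise ValueError("X_orig and X_recon must have the same shape")
    return np.power(X_orig - X_recon, 2)


# Two instances with three features each; the second is reconstructed exactly.
X_orig = np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
X_recon = np.array([[1.0, 2.5, 1.0], [0.0, 0.0, 0.0]])

fscore = feature_score_mse(X_orig, X_recon)
print(fscore)
```

The output has the same shape as the input batch, so each feature of each instance carries its own score.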

fit(X, loss_fn=<function elbo>, optimizer=tensorflow.keras.optimizers.Adam, cov_elbo={'sim': 0.05}, epochs=20, batch_size=64, verbose=True, log_metric=None, callbacks=None)

Train VAE model.

Parameters:
  • X – Training batch.

  • loss_fn – Loss function used for training.

  • optimizer – Optimizer used for training.

  • cov_elbo – Dictionary with covariance matrix options in case the elbo loss function is used. Either use the full covariance matrix inferred from X (dict(cov_full=None)), only the variance (dict(cov_diag=None)) or a float representing the same standard deviation for each feature (e.g. dict(sim=.05)).

  • epochs – Number of training epochs.

  • batch_size – Batch size used for training.

  • verbose – Whether to print training progress.

  • log_metric – Additional metrics whose progress will be displayed if verbose equals True.

  • callbacks – Callbacks used during training.
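
The three cov_elbo options differ in which statistics of X they rely on. The NumPy sketch below shows what each option refers to for a hypothetical training batch. It only illustrates the quantities involved: the detector derives them internally when given dict(cov_full=None) or dict(cov_diag=None), and the normalisation used here (NumPy's default for np.cov) is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))        # hypothetical batch: 200 instances, 4 features
X_flat = X.reshape(X.shape[0], -1)   # flatten to (n_instances, n_features)

# cov_full: full covariance matrix inferred from X, shape (4, 4)
cov_full = np.cov(X_flat, rowvar=False)

# cov_diag: only the per-feature variances, shape (4,)
cov_diag = np.diag(cov_full)

# sim: one float used as the standard deviation of every feature
sim = 0.05

print(cov_full.shape, cov_diag.shape, sim)
```

The options trade expressiveness for cost: the full matrix grows quadratically with the number of features, while sim needs no statistics from X at all.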

infer_threshold(X, outlier_type='instance', outlier_perc=100.0, threshold_perc=95.0, batch_size=10000000000)

Update threshold by a value inferred from the percentage of instances considered to be outliers in a sample of the dataset.

Parameters:
  • X (ndarray) – Batch of instances.

  • outlier_type (str) – Predict outliers at the ‘feature’ or ‘instance’ level.

  • outlier_perc (float) – Percentage of the sorted feature level outlier scores (the highest ones) used to predict instance level outliers.

  • threshold_perc (float) – Percentage of X considered to be normal based on the outlier score.

  • batch_size (int) – Batch size used when making predictions with the VAE.

Return type:

None
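
The relationship between threshold_perc and the resulting threshold can be sketched with a percentile over precomputed scores. The helper infer_threshold_from_scores is hypothetical, and using np.percentile is an assumption about how "percentage of X considered to be normal" maps to a threshold.

```python
import numpy as np


def infer_threshold_from_scores(scores: np.ndarray, threshold_perc: float = 95.0) -> float:
    """Return the score below which threshold_perc percent of the scores fall."""
    return float(np.percentile(scores, threshold_perc))


# Hypothetical outlier scores for 100 instances: 1.0, 2.0, ..., 100.0
scores = np.arange(1, 101, dtype=float)

threshold = infer_threshold_from_scores(scores, threshold_perc=95.0)
n_flagged = int((scores > threshold).sum())  # instances scoring above the threshold
print(threshold, n_flagged)
```

With threshold_perc=95.0, about 5% of the instances in the sample end up above the threshold.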

instance_score(fscore, outlier_perc=100.0)

Compute instance level outlier scores.

Parameters:
  • fscore (ndarray) – Feature level outlier scores.

  • outlier_perc (float) – Percentage of the sorted feature level outlier scores (the highest ones) used to predict instance level outliers.

Return type:

ndarray

Returns:

Instance level outlier scores.
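
The effect of outlier_perc is easiest to see on a small example. The sketch below sorts each instance's feature scores, keeps the top outlier_perc percent and averages them. The function name instance_score_sketch is hypothetical, and aggregating by the mean of the highest scores is an assumption, not a statement of the library's exact code.

```python
import numpy as np


def instance_score_sketch(fscore: np.ndarray, outlier_perc: float = 100.0) -> np.ndarray:
    """Average the highest outlier_perc percent of each instance's feature scores."""
    fscore_flat = fscore.reshape(fscore.shape[0], -1)
    n_keep = max(1, int(np.ceil(outlier_perc / 100.0 * fscore_flat.shape[1])))
    top_scores = np.sort(fscore_flat, axis=1)[:, -n_keep:]  # ascending sort, keep the tail
    return top_scores.mean(axis=1)


fscore = np.array([[0.0, 0.25, 4.0, 0.75],
                   [1.0, 1.0, 1.0, 1.0]])

all_features = instance_score_sketch(fscore, outlier_perc=100.0)
top_half = instance_score_sketch(fscore, outlier_perc=50.0)
print(all_features, top_half)
```

Lowering outlier_perc makes the instance score more sensitive to a few strongly deviating features, because well-reconstructed features no longer dilute the average.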

predict(X, outlier_type='instance', outlier_perc=100.0, batch_size=10000000000, return_feature_score=True, return_instance_score=True)

Predict whether instances are outliers or not.

Parameters:
  • X (ndarray) – Batch of instances.

  • outlier_type (str) – Predict outliers at the ‘feature’ or ‘instance’ level.

  • outlier_perc (float) – Percentage of the sorted feature level outlier scores (the highest ones) used to predict instance level outliers.

  • batch_size (int) – Batch size used when making predictions with the VAE.

  • return_feature_score (bool) – Whether to return feature level outlier scores.

  • return_instance_score (bool) – Whether to return instance level outlier scores.

Return type:

Dict[Dict[str, str], Dict[ndarray, ndarray]]

Returns:

Dictionary containing 'meta' and 'data' dictionaries:

  • 'meta' has the model’s metadata.

  • 'data' contains the outlier predictions and both feature and instance level outlier scores.
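
The shape of the returned dictionary can be sketched from precomputed scores. The helper predict_from_scores is hypothetical, and the key names inside 'data' ('is_outlier', 'feature_score', 'instance_score') and inside 'meta' are assumptions used for illustration. Only the top-level 'meta' and 'data' keys are stated above.

```python
import numpy as np


def predict_from_scores(fscore, iscore, threshold, outlier_type="instance"):
    """Build a prediction dictionary by comparing scores against a threshold."""
    score = iscore if outlier_type == "instance" else fscore
    return {
        "meta": {"name": "OutlierVAE", "data_type": None},  # assumed metadata fields
        "data": {
            "is_outlier": (score > threshold).astype(int),  # 1 = outlier, 0 = normal
            "feature_score": fscore,
            "instance_score": iscore,
        },
    }


fscore = np.array([[0.0, 0.25, 4.0, 0.75],
                   [1.0, 1.0, 1.0, 1.0]])
iscore = fscore.mean(axis=1)  # [1.25, 1.0]

preds_instance = predict_from_scores(fscore, iscore, threshold=1.1)
preds_feature = predict_from_scores(fscore, iscore, threshold=1.1, outlier_type="feature")
print(preds_instance["data"]["is_outlier"])
print(preds_feature["data"]["is_outlier"])
```

At the instance level the prediction holds one flag per instance. At the feature level it has the same shape as the feature scores.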

score(X, outlier_perc=100.0, batch_size=10000000000)

Compute feature and instance level outlier scores.

Parameters:
  • X (ndarray) – Batch of instances.

  • outlier_perc (float) – Percentage of the sorted feature level outlier scores (the highest ones) used to predict instance level outliers.

  • batch_size (int) – Batch size used when making predictions with the VAE.

Return type:

Tuple[ndarray, ndarray]

Returns:

Feature and instance level outlier scores.
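
How the samples setting could interact with scoring is sketched below. Each instance is repeated samples times, passed through a stochastic reconstruction, and the squared errors are averaged over the repeats. The function noisy_reconstruct is a hypothetical stand-in for the VAE forward pass, and the whole pipeline is an assumption about the general approach rather than the library's code.

```python
import numpy as np


def noisy_reconstruct(X, rng, noise_scale=0.1):
    """Hypothetical stand-in for a stochastic VAE reconstruction."""
    return X + rng.normal(scale=noise_scale, size=X.shape)


def score_sketch(X, samples=10, outlier_perc=100.0, seed=0):
    rng = np.random.default_rng(seed)
    # Repeat every instance `samples` times: rows 0..samples-1 belong to instance 0, etc.
    X_samples = np.repeat(X, samples, axis=0)
    X_recon = noisy_reconstruct(X_samples, rng)
    sq_err = (X_samples - X_recon) ** 2
    # Regroup the repeats per instance and average over them.
    fscore = sq_err.reshape((-1, samples) + X.shape[1:]).mean(axis=1)
    # Instance score: mean of the highest outlier_perc percent of feature scores.
    flat = np.sort(fscore.reshape(fscore.shape[0], -1), axis=1)
    n_keep = max(1, int(np.ceil(outlier_perc / 100.0 * flat.shape[1])))
    iscore = flat[:, -n_keep:].mean(axis=1)
    return fscore, iscore


X = np.zeros((4, 3))  # hypothetical batch: 4 instances, 3 features
fscore, iscore = score_sketch(X, samples=10)
print(fscore.shape, iscore.shape)
```

Averaging over several samples reduces the variance that a single stochastic reconstruction would introduce into the scores.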