alibi_detect.od.vaegmm module

class alibi_detect.od.vaegmm.OutlierVAEGMM(threshold=None, vaegmm=None, encoder_net=None, decoder_net=None, gmm_density_net=None, n_gmm=None, latent_dim=None, samples=10, beta=1.0, recon_features=<function eucl_cosim_features>, data_type=None)[source]

Bases: BaseDetector, FitMixin, ThresholdMixin

__init__(threshold=None, vaegmm=None, encoder_net=None, decoder_net=None, gmm_density_net=None, n_gmm=None, latent_dim=None, samples=10, beta=1.0, recon_features=<function eucl_cosim_features>, data_type=None)[source]

VAEGMM-based outlier detector.

Parameters:

threshold (Optional[float]) – Threshold on the outlier score; instances whose score exceeds it are flagged as outliers.
vaegmm (Optional[Model]) – A trained tf.keras model if available.
encoder_net (Optional[Model]) – Layers for the encoder wrapped in a tf.keras.Sequential class if no ‘vaegmm’ is specified.
decoder_net (Optional[Model]) – Layers for the decoder wrapped in a tf.keras.Sequential class if no ‘vaegmm’ is specified.
gmm_density_net (Optional[Model]) – Layers for the GMM network wrapped in a tf.keras.Sequential class.
n_gmm (Optional[int]) – Number of components in GMM.
latent_dim (Optional[int]) – Dimensionality of the latent space.
samples (int) – Number of samples drawn to evaluate each instance.
beta (float) – Beta parameter for the KL-divergence loss term.
recon_features (Callable) – Function that extracts features from the instance reconstructed by the decoder.
data_type (Optional[str]) – Optionally specify the data type (tabular, image or time-series). Added to metadata.
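The recon_features callable turns an input and its decoder reconstruction into extra per-instance features; in the VAEGMM/DAGMM design these are combined with the latent code before the GMM density network. The NumPy sketch below assumes a relative Euclidean distance plus cosine similarity pair, which is what the default's name eucl_cosim_features suggests. recon_features_sketch is a hypothetical name, not part of alibi_detect, and the library's own function works on TensorFlow tensors.

```python
import numpy as np


def recon_features_sketch(x: np.ndarray, x_recon: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Hypothetical stand-in for a `recon_features` callable (not the library function).

    Returns, per instance, the relative Euclidean distance between the input and
    its reconstruction and their cosine similarity, stacked as two columns.
    """
    # Flatten everything except the batch dimension so images work too.
    x = x.reshape(x.shape[0], -1)
    x_recon = x_recon.reshape(x_recon.shape[0], -1)
    x_norm = np.linalg.norm(x, axis=1)
    recon_norm = np.linalg.norm(x_recon, axis=1)
    # Relative Euclidean distance: ||x - x_recon|| / ||x||.
    rel_eucl = np.linalg.norm(x - x_recon, axis=1) / np.maximum(x_norm, eps)
    # Cosine similarity between x and x_recon.
    cos_sim = np.sum(x * x_recon, axis=1) / np.maximum(x_norm * recon_norm, eps)
    return np.stack([rel_eucl, cos_sim], axis=1)


x = np.array([[1.0, 0.0], [3.0, 4.0]])
x_recon = np.array([[1.0, 0.0], [0.0, 5.0]])
feats = recon_features_sketch(x, x_recon)
print(feats.shape)  # (2, 2)
```

A perfect reconstruction (first row) yields a distance of 0 and a cosine similarity of 1; poorer reconstructions move both features away from those values, which is what lets the density model separate outliers.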

fit(X, loss_fn=<function loss_vaegmm>, w_recon=1e-07, w_energy=0.1, w_cov_diag=0.005, optimizer=tensorflow.keras.optimizers.Adam, cov_elbo={'sim': 0.05}, epochs=20, batch_size=64, verbose=True, log_metric=None, callbacks=None)[source]

Train VAEGMM model.

Parameters:

X – Training batch.
loss_fn – Loss function used for training.
w_recon – Weight on the elbo loss term if the default loss_vaegmm loss function is used.
w_energy – Weight on the sample energy loss term if the default loss_vaegmm loss function is used.
w_cov_diag – Weight on the covariance regularizing loss term if the default loss_vaegmm loss function is used.
optimizer – Optimizer used for training.
cov_elbo – Dictionary with covariance matrix options in case the elbo loss function is used. Either use the full covariance matrix inferred from X (dict(cov_full=None)), only the variance (dict(cov_diag=None)) or a float representing the same standard deviation for each feature (e.g. dict(sim=.05)).
epochs – Number of training epochs.
batch_size – Batch size used for training.
verbose – Whether to print training progress.
log_metric – Additional metrics whose progress will be displayed if verbose equals True.
callbacks – Callbacks used during training.
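The three cov_elbo options can be read as three choices of covariance for the Gaussian likelihood inside the elbo term. The NumPy sketch below assumes that interpretation, and also assumes the default loss_vaegmm combines its terms as a weighted sum using w_recon, w_energy and w_cov_diag. Both helper names are hypothetical and not part of alibi_detect; the library computes these quantities with TensorFlow.

```python
import numpy as np


def elbo_covariance_sketch(X: np.ndarray, cov_elbo: dict) -> np.ndarray:
    """Hypothetical helper: map a cov_elbo option to an (n_features, n_features) covariance."""
    X = X.reshape(X.shape[0], -1)
    n_features = X.shape[1]
    if "cov_full" in cov_elbo:
        # Full covariance matrix estimated from X (biased estimator, like np.var below).
        return np.atleast_2d(np.cov(X, rowvar=False, bias=True))
    if "cov_diag" in cov_elbo:
        # Only the per-feature variances, placed on the diagonal.
        return np.diag(np.var(X, axis=0))
    if "sim" in cov_elbo:
        # The same standard deviation `sim` for every feature: sim**2 * I.
        return (cov_elbo["sim"] ** 2) * np.eye(n_features)
    raise ValueError("cov_elbo must contain 'cov_full', 'cov_diag' or 'sim'")


def weighted_loss_sketch(elbo_loss: float, sample_energy: float, cov_diag_penalty: float,
                         w_recon: float = 1e-7, w_energy: float = 0.1,
                         w_cov_diag: float = 0.005) -> float:
    """Hypothetical weighted sum of the three loss terms, using fit()'s default weights."""
    return w_recon * elbo_loss + w_energy * sample_energy + w_cov_diag * cov_diag_penalty


X = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 8.0]])
print(elbo_covariance_sketch(X, {"sim": 0.05}))
```

The sim option needs no statistics from X at all, which makes it the cheapest choice and a reasonable default when features are on comparable scales.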

infer_threshold(X, threshold_perc=95.0, batch_size=10000000000)[source]

Update threshold by a value inferred from the percentage of instances considered to be outliers in a sample of the dataset.

Parameters:

X (ndarray) – Batch of instances.
threshold_perc (float) – Percentage of X considered to be normal based on the outlier score.
batch_size (int) – Batch size used when making predictions with the VAEGMM.

Return type:

None
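Since threshold_perc is the percentage of X considered normal, the inferred threshold can be pictured as the threshold_perc-th percentile of the instance scores on X. The sketch below assumes exactly that and uses hypothetical scores in place of the detector's score method; infer_threshold_sketch is not a library function.

```python
import numpy as np


def infer_threshold_sketch(scores: np.ndarray, threshold_perc: float = 95.0) -> float:
    """Hypothetical: the score below which `threshold_perc` percent of instances fall."""
    return float(np.percentile(scores, threshold_perc))


scores = np.arange(1.0, 101.0)  # 100 hypothetical outlier scores: 1, 2, ..., 100
threshold = infer_threshold_sketch(scores, threshold_perc=95.0)
n_outliers = int(np.sum(scores > threshold))
print(threshold, n_outliers)
```

With threshold_perc=95.0, roughly 5% of the scored instances end up above the threshold, so the choice of threshold_perc directly sets the expected false positive rate on data resembling X.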

predict(X, batch_size=10000000000, return_instance_score=True)[source]

Compute outlier scores and transform into outlier predictions.

Parameters:

X (ndarray) – Batch of instances.
batch_size (int) – Batch size used when making predictions with the VAEGMM.
return_instance_score (bool) – Whether to return instance level outlier scores.

Return type:

Dict[Dict[str, str], Dict[ndarray, ndarray]]

Returns:

Dictionary containing 'meta' and 'data' dictionaries:

'meta' has the model’s metadata.
'data' contains the outlier predictions and instance level outlier scores.
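The thresholding step in predict and the shape of its return value can be sketched as below. This assumes an instance is an outlier when its score exceeds the threshold; the key names is_outlier and instance_score and the metadata contents are assumptions of this sketch, and predict_sketch is a hypothetical helper rather than the detector's method.

```python
import numpy as np


def predict_sketch(scores: np.ndarray, threshold: float, meta: dict,
                   return_instance_score: bool = True) -> dict:
    """Hypothetical: turn outlier scores into a 'meta'/'data' output dictionary."""
    # 1 marks an outlier (score above the threshold), 0 marks a normal instance.
    data = {"is_outlier": (scores > threshold).astype(int)}
    if return_instance_score:
        data["instance_score"] = scores
    return {"meta": meta, "data": data}


scores = np.array([0.5, 3.0, 1.2])
out = predict_sketch(scores, threshold=1.0, meta={"name": "OutlierVAEGMM"})
print(out["data"]["is_outlier"])  # [0 1 1]
```

Setting return_instance_score=False only drops the raw scores from 'data'; the binary predictions are always returned.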

score(X, batch_size=10000000000)[source]

Compute outlier scores.

Parameters:

X (ndarray) – Batch of instances to analyze.
batch_size (int) – Batch size used when making predictions with the VAEGMM.

Return type:

ndarray

Returns:

Array with outlier scores for each instance in the batch.
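In VAEGMM-style detectors the outlier score is commonly the GMM sample energy E(z) = -log sum_k phi_k N(z | mu_k, Sigma_k) of the latent representation, averaged over the samples stochastic draws per instance. The NumPy sketch below assumes that formulation, assumes the draws for each instance are stored contiguously (as produced by np.repeat(X, samples, axis=0)), and uses hypothetical mixture parameters; the function names are not part of alibi_detect, whose TensorFlow implementation may differ in details such as covariance regularization.

```python
import numpy as np


def gmm_energy_sketch(z: np.ndarray, phi: np.ndarray, mu: np.ndarray, cov: np.ndarray) -> np.ndarray:
    """Sample energy -log sum_k phi_k N(z | mu_k, cov_k) for each row of z.

    z: (n, d) latent points; phi: (K,) mixture weights; mu: (K, d); cov: (K, d, d).
    """
    d = z.shape[1]
    log_probs = []
    for k in range(phi.shape[0]):
        diff = z - mu[k]  # (n, d)
        cov_inv = np.linalg.inv(cov[k])
        # Squared Mahalanobis distance diff_i^T cov_inv diff_i for every row i.
        maha = np.einsum("nd,de,ne->n", diff, cov_inv, diff)
        _, log_det = np.linalg.slogdet(cov[k])
        log_norm = -0.5 * (d * np.log(2 * np.pi) + log_det)
        log_probs.append(np.log(phi[k]) + log_norm - 0.5 * maha)
    log_probs = np.stack(log_probs, axis=1)  # (n, K)
    # Numerically stable log-sum-exp over the mixture components.
    m = log_probs.max(axis=1, keepdims=True)
    log_density = m[:, 0] + np.log(np.exp(log_probs - m).sum(axis=1))
    return -log_density


def score_sketch(z_samples: np.ndarray, phi: np.ndarray, mu: np.ndarray,
                 cov: np.ndarray, samples: int) -> np.ndarray:
    """Average the energy over `samples` contiguous draws per instance."""
    energy = gmm_energy_sketch(z_samples, phi, mu, cov)
    return energy.reshape(-1, samples).mean(axis=1)


# Hypothetical 1-D latent space with a single standard normal component.
phi = np.array([1.0])
mu = np.zeros((1, 1))
cov = np.eye(1)[None]
z_samples = np.array([[0.0], [0.0], [3.0], [3.0]])  # 2 instances x 2 draws each
scores = score_sketch(z_samples, phi, mu, cov, samples=2)
print(scores)
```

Latent points in low-density regions of the mixture receive higher energy, so larger scores indicate more outlying instances, consistent with flagging scores above the threshold.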