alibi_detect.od package

class alibi_detect.od.IForest(threshold=None, n_estimators=100, max_samples='auto', max_features=1.0, bootstrap=False, n_jobs=1, data_type='tabular')[source]

Bases: BaseDetector, FitMixin, ThresholdMixin

__init__(threshold=None, n_estimators=100, max_samples='auto', max_features=1.0, bootstrap=False, n_jobs=1, data_type='tabular')[source]

Outlier detector for tabular data using isolation forests.

Parameters:
  • threshold (Optional[float]) – Threshold used for outlier score to determine outliers.

  • n_estimators (int) – Number of base estimators in the ensemble.

  • max_samples (Union[str, int, float]) – Number of samples to draw from the training data to train each base estimator. If int, draw ‘max_samples’ samples. If float, draw ‘max_samples * number of samples’ samples. If ‘auto’, max_samples = min(256, number of samples).

  • max_features (Union[int, float]) – Number of features to draw from the training data to train each base estimator. If int, draw ‘max_features’ features. If float, draw ‘max_features * number of features’ features.

  • bootstrap (bool) – Whether to fit individual trees on random subsets of the training data, sampled with replacement.

  • n_jobs (int) – Number of jobs to run in parallel for ‘fit’ and ‘predict’.

  • data_type (str) – Optionally specify the data type (tabular, image or time-series). Added to metadata.
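
As a rough illustration of the max_samples rules above, the sketch below resolves the three accepted forms to a concrete sample count. It assumes the scikit-learn IsolationForest convention, in which a float is a fraction of the number of training samples. resolve_max_samples is a hypothetical helper written for this example and is not part of alibi_detect.

```python
def resolve_max_samples(max_samples, n_samples):
    """Hypothetical helper: turn a max_samples setting into a sample count."""
    if max_samples == 'auto':
        # 'auto' caps the subsample size at 256.
        return min(256, n_samples)
    if isinstance(max_samples, int):
        # An int is used directly as the number of samples.
        return max_samples
    if isinstance(max_samples, float):
        # Assumption: a float is a fraction of the number of training samples.
        return int(max_samples * n_samples)
    raise TypeError("max_samples must be 'auto', an int or a float")


auto_count = resolve_max_samples('auto', 1000)
int_count = resolve_max_samples(64, 1000)
float_count = resolve_max_samples(0.5, 1000)
print(auto_count, int_count, float_count)
```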

fit(X, sample_weight=None)[source]

Fit isolation forest.

Parameters:
  • X (ndarray) – Training batch.

  • sample_weight (Optional[ndarray]) – Sample weights.

Return type:

None

infer_threshold(X, threshold_perc=95.0)[source]

Update threshold by a value inferred from the percentage of instances considered to be outliers in a sample of the dataset.

Parameters:
  • X (ndarray) – Batch of instances.

  • threshold_perc (float) – Percentage of X considered to be normal based on the outlier score.

Return type:

None
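
The threshold inference described above can be pictured as taking a percentile of the outlier scores. The following is a minimal sketch under that assumption, using only the standard library; the percentile helper, the example scores and the "score greater than threshold" rule are illustrative and are not taken from the library source.

```python
def percentile(values, perc):
    """Linearly interpolated percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    rank = (len(ordered) - 1) * perc / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


# Hypothetical outlier scores for a batch of 10 instances.
scores = [0.1, 0.2, 0.15, 0.3, 0.25, 0.12, 0.18, 0.22, 0.9, 0.05]

# threshold_perc=90 treats roughly 90% of the batch as normal.
threshold = percentile(scores, 90.0)

# Assumption: an instance is flagged when its score exceeds the threshold.
flags = [int(s > threshold) for s in scores]
print(threshold, flags)
```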

predict(X, return_instance_score=True)[source]

Compute outlier scores and transform into outlier predictions.

Parameters:
  • X (ndarray) – Batch of instances.

  • return_instance_score (bool) – Whether to return instance level outlier scores.

Return type:

Dict[Dict[str, str], Dict[ndarray, ndarray]]

Returns:

Dictionary containing 'meta' and 'data' dictionaries:

  • 'meta' has the model’s metadata.

  • 'data' contains the outlier predictions and instance level outlier scores.

score(X)[source]

Compute outlier scores.

Parameters:

X (ndarray) – Batch of instances to analyze.

Return type:

ndarray

Returns:

Array with outlier scores for each instance in the batch.

class alibi_detect.od.LLR(threshold=None, model=None, model_background=None, log_prob=None, sequential=False, data_type=None)[source]

Bases: BaseDetector, FitMixin, ThresholdMixin

__init__(threshold=None, model=None, model_background=None, log_prob=None, sequential=False, data_type=None)[source]

Likelihood Ratios for Out-of-Distribution Detection. Ren, J. et al. NeurIPS 2019. https://arxiv.org/abs/1906.02845

Parameters:
  • threshold (Optional[float]) – Threshold used for the likelihood ratio (LLR) to determine outliers.

  • model (Union[Model, Distribution, PixelCNN, None]) – Generative model, defaults to PixelCNN.

  • model_background (Union[Model, Distribution, PixelCNN, None]) – Optional model for the background. Only needed if it is different from model.

  • log_prob (Optional[Callable]) – Function used to evaluate log probabilities under the model if the model does not have a log_prob function.

  • sequential (bool) – Whether the data is sequential. Used to create targets during training.

  • data_type (Optional[str]) – Optionally specify the data type (tabular, image or time-series). Added to metadata.

feature_score(X, batch_size=10000000000)[source]

Feature-level negative likelihood ratios.

Return type:

ndarray

fit(X, mutate_fn=<function mutate_categorical>, mutate_fn_kwargs={'feature_range': (0, 255), 'rate': 0.2, 'seed': 0}, mutate_batch_size=10000000000, loss_fn=None, loss_fn_kwargs=None, optimizer=tensorflow.keras.optimizers.Adam, epochs=20, batch_size=64, verbose=True, log_metric=None, callbacks=None)[source]

Train semantic and background generative models.

Parameters:
  • X – Training batch.

  • mutate_fn – Mutation function used to generate the background dataset.

  • mutate_fn_kwargs – Kwargs for the mutation function used to generate the background dataset. Default values set for an image dataset.

  • mutate_batch_size – Batch size used to generate the mutations for the background dataset.

  • loss_fn – Loss function used for training.

  • loss_fn_kwargs – Kwargs for loss function.

  • optimizer – Optimizer used for training.

  • epochs – Number of training epochs.

  • batch_size – Batch size used for training.

  • verbose – Whether to print training progress.

  • log_metric – Additional metrics whose progress will be displayed if verbose equals True.

  • callbacks – Callbacks used during training.
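
To make the role of mutate_fn and mutate_fn_kwargs more concrete, here is a small sketch of a categorical mutation that perturbs a batch to build a background dataset. It assumes that each value is replaced, with probability rate, by a random integer inside feature_range. mutate_categorical_sketch is a hypothetical stand-in and may differ from the library's mutate_categorical.

```python
import random


def mutate_categorical_sketch(X, rate=0.2, seed=0, feature_range=(0, 255)):
    """Replace each value with a random integer in feature_range with probability rate."""
    rng = random.Random(seed)
    low, high = feature_range
    return [
        [rng.randint(low, high) if rng.random() < rate else value for value in row]
        for row in X
    ]


# Toy batch of two "images" with four integer pixel values each.
X = [[0, 10, 20, 30], [200, 210, 220, 230]]
X_background = mutate_categorical_sketch(X, rate=0.5, seed=0)
print(X_background)
```

A higher rate moves the background data further from the original batch; a rate of 0 leaves it unchanged.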

infer_threshold(X, outlier_type='instance', threshold_perc=95.0, batch_size=10000000000)[source]

Update LLR threshold by a value inferred from the percentage of instances considered to be outliers in a sample of the dataset.

Parameters:
  • X (ndarray) – Batch of instances.

  • outlier_type (str) – Predict outliers at the ‘feature’ or ‘instance’ level.

  • threshold_perc (float) – Percentage of X considered to be normal based on the outlier score.

  • batch_size (int) – Batch size for the generative model evaluations.

Return type:

None

instance_score(X, batch_size=10000000000)[source]

Instance-level negative likelihood ratios.

Return type:

ndarray

llr(X, return_per_feature, batch_size=10000000000)[source]

Compute likelihood ratios.

Parameters:
  • X (ndarray) – Batch of instances.

  • return_per_feature (bool) – Return likelihood ratio per feature.

  • batch_size (int) – Batch size for the generative model evaluations.

Return type:

ndarray

Returns:

Likelihood ratios.
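
A likelihood ratio compares the log probability of an instance under the semantic model with its log probability under the background model. The sketch below illustrates this with two toy categorical models over independent features. The models, the sequences and the helper names are assumptions made for the example; the detector itself uses generative models such as PixelCNN.

```python
import math


def log_prob_per_feature(model, x):
    """Log probability of each symbol in x under an independent categorical model."""
    return [math.log(model[symbol]) for symbol in x]


def llr(semantic, background, x, return_per_feature=False):
    """Log likelihood ratio: log p_semantic(x) - log p_background(x)."""
    logp_s = log_prob_per_feature(semantic, x)
    logp_b = log_prob_per_feature(background, x)
    per_feature = [s - b for s, b in zip(logp_s, logp_b)]
    return per_feature if return_per_feature else sum(per_feature)


# Toy models: the semantic model strongly prefers 'A', the background is uniform.
semantic = {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

# The outlier score is the negative likelihood ratio.
score_typical = -llr(semantic, background, "AAAA")
score_atypical = -llr(semantic, background, "CGTC")
print(score_typical, score_atypical)
```

Instances that the semantic model explains better than the background model get a low score, and the reverse gives a high score.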

logp(dist, X, return_per_feature=False, batch_size=10000000000)[source]

Compute log probability of a batch of instances under the generative model.

Parameters:
  • dist – Distribution of the model.

  • X (ndarray) – Batch of instances.

  • return_per_feature (bool) – Return log probability per feature.

  • batch_size (int) – Batch size for the generative model evaluations.

Return type:

ndarray

Returns:

Log probabilities.

logp_alt(model, X, return_per_feature=False, batch_size=10000000000)[source]

Compute log probability of a batch of instances using the log_prob function defined by the user.

Parameters:
  • model (Model) – Trained model.

  • X (ndarray) – Batch of instances.

  • return_per_feature (bool) – Return log probability per feature.

  • batch_size (int) – Batch size for the generative model evaluations.

Return type:

ndarray

Returns:

Log probabilities.

predict(X, outlier_type='instance', batch_size=10000000000, return_feature_score=True, return_instance_score=True)[source]

Predict whether instances are outliers or not.

Parameters:
  • X (ndarray) – Batch of instances.

  • outlier_type (str) – Predict outliers at the ‘feature’ or ‘instance’ level.

  • batch_size (int) – Batch size used when making predictions with the generative model.

  • return_feature_score (bool) – Whether to return feature level outlier scores.

  • return_instance_score (bool) – Whether to return instance level outlier scores.

Return type:

Dict[Dict[str, str], Dict[ndarray, ndarray]]

Returns:

Dictionary containing 'meta' and 'data' dictionaries:

  • 'meta' has the model’s metadata.

  • 'data' contains the outlier predictions and both feature and instance level outlier scores.

score(X, batch_size=10000000000)[source]

Feature-level and instance-level outlier scores. The scores are equal to the negative likelihood ratios.

Return type:

Tuple[ndarray, ndarray]

class alibi_detect.od.Mahalanobis(threshold=None, n_components=3, std_clip=3, start_clip=100, max_n=None, cat_vars=None, ohe=False, data_type='tabular')[source]

Bases: BaseDetector, FitMixin, ThresholdMixin

__init__(threshold=None, n_components=3, std_clip=3, start_clip=100, max_n=None, cat_vars=None, ohe=False, data_type='tabular')[source]

Outlier detector for tabular data using the Mahalanobis distance.

Parameters:
  • threshold (Optional[float]) – Mahalanobis distance threshold used to classify outliers.

  • n_components (int) – Number of principal components used.

  • std_clip (int) – Feature-wise stdev used to clip the observations before updating the mean and cov.

  • start_clip (int) – Number of observations before clipping is applied.

  • max_n (Optional[int]) – Algorithm behaves as if it has seen at most max_n points.

  • cat_vars (Optional[dict]) – Dict with the categorical columns as keys and the number of categories per categorical variable as values.

  • ohe (bool) – Whether the categorical variables are one-hot encoded (OHE) or not. If not OHE, they are assumed to have ordinal encodings.

  • data_type (str) – Optionally specify the data type (tabular, image or time-series). Added to metadata.
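
The interaction between std_clip and start_clip can be sketched as follows: once enough observations have been seen, each feature is clamped to a band of std_clip standard deviations around its running mean before the statistics are updated. clip_observation and its n_seen argument are hypothetical names for this example; the library's internal bookkeeping may differ.

```python
def clip_observation(x, mean, std, n_seen, std_clip=3, start_clip=100):
    """Clamp each feature to mean +/- std_clip * std once start_clip points were seen."""
    if n_seen < start_clip:
        # Too few observations so far: leave the instance untouched.
        return list(x)
    return [
        min(max(value, m - std_clip * s), m + std_clip * s)
        for value, m, s in zip(x, mean, std)
    ]


mean = [0, 0, 0]
std = [1, 2, 1]
x = [10, -10, 0.5]

early = clip_observation(x, mean, std, n_seen=10)
late = clip_observation(x, mean, std, n_seen=200)
print(early, late)
```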

cat2num(X)[source]

Convert categorical variables to numerical values.

Parameters:

X (ndarray) – Batch of instances to analyze.

Return type:

ndarray

Returns:

Batch of instances where categorical variables are converted to numerical values.

fit(X, y=None, d_type='abdm', w=None, disc_perc=[25, 50, 75], standardize_cat_vars=True, feature_range=(-10000000000.0, 10000000000.0), smooth=1.0, center=True)[source]

If categorical variables are present, then transform those to numerical values. This step is not necessary in the absence of categorical variables.

Parameters:
  • X (ndarray) – Batch of instances used to infer distances between categories from.

  • y (Optional[ndarray]) – Model class predictions or ground truth labels for X. Used for ‘mvdm’ and ‘abdm-mvdm’ pairwise distance metrics. Note that this is only compatible with classification problems. For regression problems, use the ‘abdm’ distance metric.

  • d_type (str) – Pairwise distance metric used for categorical variables. Currently, ‘abdm’, ‘mvdm’ and ‘abdm-mvdm’ are supported. ‘abdm’ infers context from the other variables while ‘mvdm’ uses the model predictions. ‘abdm-mvdm’ is a weighted combination of the two metrics.

  • w (Optional[float]) – Weight on ‘abdm’ (between 0. and 1.) distance if d_type equals ‘abdm-mvdm’.

  • disc_perc (list) – List with percentiles used in binning of numerical features used for the ‘abdm’ and ‘abdm-mvdm’ pairwise distance measures.

  • standardize_cat_vars (bool) – Standardize numerical values of categorical variables if True.

  • feature_range (tuple) – Tuple with min and max ranges to allow for perturbed instances. Min and max ranges can be floats or numpy arrays with dimension (1 x number of features) for feature-wise ranges.

  • smooth (float) – Smoothing exponent between 0 and 1 for the distances. Lower values of smooth will smooth the difference in distance metric between different features.

  • center (bool) – Whether to center the scaled distance measures. If False, the min distance for each feature except for the feature with the highest raw max distance will be the lower bound of the feature range, but the upper bound will be below the max feature range.

Return type:

None

infer_threshold(X, threshold_perc=95.0)[source]

Update threshold by a value inferred from the percentage of instances considered to be outliers in a sample of the dataset.

Parameters:
  • X (ndarray) – Batch of instances.

  • threshold_perc (float) – Percentage of X considered to be normal based on the outlier score.

Return type:

None

predict(X, return_instance_score=True)[source]

Compute outlier scores and transform into outlier predictions.

Parameters:
  • X (ndarray) – Batch of instances.

  • return_instance_score (bool) – Whether to return instance level outlier scores.

Return type:

Dict[Dict[str, str], Dict[ndarray, ndarray]]

Returns:

Dictionary containing 'meta' and 'data' dictionaries:

  • 'meta' has the model’s metadata.

  • 'data' contains the outlier predictions and instance level outlier scores.

score(X)[source]

Compute outlier scores.

Parameters:

X (ndarray) – Batch of instances to analyze.

Return type:

ndarray

Returns:

Array with outlier scores for each instance in the batch.
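
For intuition about the score, the squared Mahalanobis distance of a point x from a mean mu with covariance S is (x - mu)^T S^-1 (x - mu). The sketch below evaluates it for two features with an explicit 2x2 inverse. It is a simplified illustration only: it ignores the projection onto principal components and any updating of the mean and covariance, and the function name is hypothetical.

```python
def mahalanobis_sq_2d(x, mean, cov):
    """Squared Mahalanobis distance for two features with a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mean[0], x[1] - mean[1])
    # d^2 = dx^T * inv(cov) * dx
    return (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))


# With an identity covariance this reduces to the squared Euclidean distance.
d_identity = mahalanobis_sq_2d((3.0, 4.0), (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0)))

# A larger variance along the first feature shrinks distances in that direction.
d_scaled = mahalanobis_sq_2d((2.0, 0.0), (0.0, 0.0), ((4.0, 0.0), (0.0, 1.0)))
print(d_identity, d_scaled)
```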

class alibi_detect.od.OutlierAE(threshold=None, ae=None, encoder_net=None, decoder_net=None, data_type=None)[source]

Bases: BaseDetector, FitMixin, ThresholdMixin

__init__(threshold=None, ae=None, encoder_net=None, decoder_net=None, data_type=None)[source]

AE-based outlier detector.

Parameters:
  • threshold (Optional[float]) – Threshold used for outlier score to determine outliers.

  • ae (Optional[Model]) – A trained tf.keras model if available.

  • encoder_net (Optional[Model]) – Layers for the encoder wrapped in a tf.keras.Sequential class if no ‘ae’ is specified.

  • decoder_net (Optional[Model]) – Layers for the decoder wrapped in a tf.keras.Sequential class if no ‘ae’ is specified.

  • data_type (Optional[str]) – Optionally specify the data type (tabular, image or time-series). Added to metadata.

feature_score(X_orig, X_recon)[source]

Compute feature level outlier scores.

Parameters:
  • X_orig (ndarray) – Batch of original instances.

  • X_recon (ndarray) – Batch of reconstructed instances.

Return type:

ndarray

Returns:

Feature level outlier scores.
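
A feature level score for an autoencoder can be thought of as a per-feature reconstruction error. The sketch below assumes the squared error between the original and reconstructed values, which the text above does not state explicitly; feature_score_sketch is a hypothetical name.

```python
def feature_score_sketch(x_orig, x_recon):
    """Per-feature squared reconstruction error for a batch of instances."""
    return [
        [(o - r) ** 2 for o, r in zip(row_orig, row_recon)]
        for row_orig, row_recon in zip(x_orig, x_recon)
    ]


x_orig = [[1.0, 2.0], [0.0, 4.0]]
x_recon = [[1.0, 1.0], [0.5, 2.0]]
fscore = feature_score_sketch(x_orig, x_recon)
print(fscore)
```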

fit(X, loss_fn=tensorflow.keras.losses.MeanSquaredError, optimizer=tensorflow.keras.optimizers.Adam, epochs=20, batch_size=64, verbose=True, log_metric=None, callbacks=None)[source]

Train AE model.

Parameters:
  • X – Training batch.

  • loss_fn – Loss function used for training.

  • optimizer – Optimizer used for training.

  • epochs – Number of training epochs.

  • batch_size – Batch size used for training.

  • verbose – Whether to print training progress.

  • log_metric – Additional metrics whose progress will be displayed if verbose equals True.

  • callbacks – Callbacks used during training.

infer_threshold(X, outlier_type='instance', outlier_perc=100.0, threshold_perc=95.0, batch_size=10000000000)[source]

Update threshold by a value inferred from the percentage of instances considered to be outliers in a sample of the dataset.

Parameters:
  • X (ndarray) – Batch of instances.

  • outlier_type (str) – Predict outliers at the ‘feature’ or ‘instance’ level.

  • outlier_perc (float) – Percentage of sorted feature level outlier scores used to predict instance level outlier.

  • threshold_perc (float) – Percentage of X considered to be normal based on the outlier score.

  • batch_size (int) – Batch size used when making predictions with the autoencoder.

Return type:

None

instance_score(fscore, outlier_perc=100.0)[source]

Compute instance level outlier scores.

Parameters:
  • fscore (ndarray) – Feature level outlier scores.

  • outlier_perc (float) – Percentage of sorted feature level outlier scores used to predict instance level outlier.

Return type:

ndarray

Returns:

Instance level outlier scores.
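
The outlier_perc parameter restricts the instance level score to the highest feature level scores. A minimal sketch, assuming the instance score is the mean of the top outlier_perc percent of the sorted feature scores (the aggregation by mean is an assumption, and instance_score_sketch is a hypothetical name):

```python
import math


def instance_score_sketch(fscore, outlier_perc=100.0):
    """Mean of the highest outlier_perc percent of feature scores per instance."""
    iscore = []
    for row in fscore:
        # Number of feature scores kept, at least one.
        n_top = max(1, math.ceil(outlier_perc * len(row) / 100.0))
        top = sorted(row)[-n_top:]
        iscore.append(sum(top) / len(top))
    return iscore


fscore = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 8.0]]
all_features = instance_score_sketch(fscore, outlier_perc=100.0)
top_half = instance_score_sketch(fscore, outlier_perc=50.0)
top_quarter = instance_score_sketch(fscore, outlier_perc=25.0)
print(all_features, top_half, top_quarter)
```

Lowering outlier_perc makes the instance score more sensitive to a few badly reconstructed features, as the second instance shows.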

predict(X, outlier_type='instance', outlier_perc=100.0, batch_size=10000000000, return_feature_score=True, return_instance_score=True)[source]

Predict whether instances are outliers or not.

Parameters:
  • X (ndarray) – Batch of instances.

  • outlier_type (str) – Predict outliers at the ‘feature’ or ‘instance’ level.

  • outlier_perc (float) – Percentage of sorted feature level outlier scores used to predict instance level outlier.

  • batch_size (int) – Batch size used when making predictions with the autoencoder.

  • return_feature_score (bool) – Whether to return feature level outlier scores.

  • return_instance_score (bool) – Whether to return instance level outlier scores.

Return type:

Dict[Dict[str, str], Dict[ndarray, ndarray]]

Returns:

Dictionary containing 'meta' and 'data' dictionaries:

  • 'meta' has the model’s metadata.

  • 'data' contains the outlier predictions and both feature and instance level outlier scores.

score(X, outlier_perc=100.0, batch_size=10000000000)[source]

Compute feature and instance level outlier scores.

Parameters:
  • X (ndarray) – Batch of instances.

  • outlier_perc (float) – Percentage of sorted feature level outlier scores used to predict instance level outlier.

  • batch_size (int) – Batch size used when making predictions with the autoencoder.

Return type:

Tuple[ndarray, ndarray]

Returns:

Feature and instance level outlier scores.

class alibi_detect.od.OutlierAEGMM(threshold=None, aegmm=None, encoder_net=None, decoder_net=None, gmm_density_net=None, n_gmm=None, recon_features=<function eucl_cosim_features>, data_type=None)[source]

Bases: BaseDetector, FitMixin, ThresholdMixin

__init__(threshold=None, aegmm=None, encoder_net=None, decoder_net=None, gmm_density_net=None, n_gmm=None, recon_features=<function eucl_cosim_features>, data_type=None)[source]

AEGMM-based outlier detector.

Parameters:
  • threshold (Optional[float]) – Threshold used for outlier score to determine outliers.

  • aegmm (Optional[Model]) – A trained tf.keras model if available.

  • encoder_net (Optional[Model]) – Layers for the encoder wrapped in a tf.keras.Sequential class if no ‘aegmm’ is specified.

  • decoder_net (Optional[Model]) – Layers for the decoder wrapped in a tf.keras.Sequential class if no ‘aegmm’ is specified.

  • gmm_density_net (Optional[Model]) – Layers for the GMM network wrapped in a tf.keras.Sequential class.

  • n_gmm (Optional[int]) – Number of components in GMM.

  • recon_features (Callable) – Function to extract features from the reconstructed instance by the decoder.

  • data_type (Optional[str]) – Optionally specify the data type (tabular, image or time-series). Added to metadata.

fit(X, loss_fn=<function loss_aegmm>, w_energy=0.1, w_cov_diag=0.005, optimizer=tensorflow.keras.optimizers.Adam, epochs=20, batch_size=64, verbose=True, log_metric=None, callbacks=None)[source]

Train AEGMM model.

Parameters:
  • X – Training batch.

  • loss_fn – Loss function used for training.

  • w_energy – Weight on sample energy loss term if default loss_aegmm loss fn is used.

  • w_cov_diag – Weight on covariance regularizing loss term if default loss_aegmm loss fn is used.

  • optimizer – Optimizer used for training.

  • epochs – Number of training epochs.

  • batch_size – Batch size used for training.

  • verbose – Whether to print training progress.

  • log_metric – Additional metrics whose progress will be displayed if verbose equals True.

  • callbacks – Callbacks used during training.

infer_threshold(X, threshold_perc=95.0, batch_size=10000000000)[source]

Update threshold by a value inferred from the percentage of instances considered to be outliers in a sample of the dataset.

Parameters:
  • X (ndarray) – Batch of instances.

  • threshold_perc (float) – Percentage of X considered to be normal based on the outlier score.

  • batch_size (int) – Batch size used when making predictions with the AEGMM.

Return type:

None

predict(X, batch_size=10000000000, return_instance_score=True)[source]

Compute outlier scores and transform into outlier predictions.

Parameters:
  • X (ndarray) – Batch of instances.

  • batch_size (int) – Batch size used when making predictions with the AEGMM.

  • return_instance_score (bool) – Whether to return instance level outlier scores.

Return type:

Dict[Dict[str, str], Dict[ndarray, ndarray]]

Returns:

Dictionary containing 'meta' and 'data' dictionaries:

  • 'meta' has the model’s metadata.

  • 'data' contains the outlier predictions and instance level outlier scores.

score(X, batch_size=10000000000)[source]

Compute outlier scores.

Parameters:
  • X (ndarray) – Batch of instances to analyze.

  • batch_size (int) – Batch size used when making predictions with the AEGMM.

Return type:

ndarray

Returns:

Array with outlier scores for each instance in the batch.

class alibi_detect.od.OutlierProphet(threshold=0.8, growth='linear', cap=None, holidays=None, holidays_prior_scale=10.0, country_holidays=None, changepoint_prior_scale=0.05, changepoint_range=0.8, seasonality_mode='additive', daily_seasonality='auto', weekly_seasonality='auto', yearly_seasonality='auto', add_seasonality=None, seasonality_prior_scale=10.0, uncertainty_samples=1000, mcmc_samples=0)[source]

Bases: BaseDetector, FitMixin

__init__(threshold=0.8, growth='linear', cap=None, holidays=None, holidays_prior_scale=10.0, country_holidays=None, changepoint_prior_scale=0.05, changepoint_range=0.8, seasonality_mode='additive', daily_seasonality='auto', weekly_seasonality='auto', yearly_seasonality='auto', add_seasonality=None, seasonality_prior_scale=10.0, uncertainty_samples=1000, mcmc_samples=0)[source]

Outlier detector for time series data using fbprophet. See https://facebook.github.io/prophet/ for more details.

Parameters:
  • threshold (float) – Width of the uncertainty intervals of the forecast, used as outlier threshold. Equivalent to interval_width. If the instance lies outside of the uncertainty intervals, it is flagged as an outlier. If mcmc_samples equals 0, it is the uncertainty in the trend using the MAP estimate of the extrapolated model. If mcmc_samples >0, then uncertainty over all parameters is used.

  • growth (str) – ‘linear’ or ‘logistic’ to specify a linear or logistic trend.

  • cap (Optional[float]) – Growth cap in case growth equals ‘logistic’.

  • holidays (Optional[DataFrame]) – pandas DataFrame with columns holiday (string) and ds (dates) and optionally columns lower_window and upper_window which specify a range of days around the date to be included as holidays.

  • holidays_prior_scale (float) – Parameter controlling the strength of the holiday components model. Higher values imply a more flexible trend that is more prone to overfitting.

  • country_holidays (Optional[str]) – Include country-specific holidays via country abbreviations. The holidays for each country are provided by the holidays package in Python. A list of available countries and the country names to use is available at: https://github.com/dr-prodigy/python-holidays. Additionally, Prophet includes holidays for: Brazil (BR), Indonesia (ID), India (IN), Malaysia (MY), Vietnam (VN), Thailand (TH), Philippines (PH), Turkey (TU), Pakistan (PK), Bangladesh (BD), Egypt (EG), China (CN) and Russia (RU).

  • changepoint_prior_scale (float) – Parameter controlling the flexibility of the automatic changepoint selection. Large values will allow many changepoints, potentially leading to overfitting.

  • changepoint_range (float) – Proportion of history in which trend changepoints will be estimated. Higher values mean more changepoints, potentially leading to overfitting.

  • seasonality_mode (str) – Either ‘additive’ or ‘multiplicative’.

  • daily_seasonality (Union[str, bool, int]) – Can be ‘auto’, True, False, or a number of Fourier terms to generate.

  • weekly_seasonality (Union[str, bool, int]) – Can be ‘auto’, True, False, or a number of Fourier terms to generate.

  • yearly_seasonality (Union[str, bool, int]) – Can be ‘auto’, True, False, or a number of Fourier terms to generate.

  • add_seasonality (Optional[List]) – Manually add one or more seasonality components. Pass a list of dicts containing the keys name, period, fourier_order (obligatory), prior_scale and mode (optional).

  • seasonality_prior_scale (float) – Parameter controlling the strength of the seasonality model. Larger values allow the model to fit larger seasonal fluctuations, potentially leading to overfitting.

  • uncertainty_samples (int) – Number of simulated draws used to estimate uncertainty intervals.

  • mcmc_samples (int) – If >0, will do full Bayesian inference with the specified number of MCMC samples. If 0, will do MAP estimation.

fit(df)[source]

Fit Prophet model on normal (inlier) data.

Parameters:

df (DataFrame) – Dataframe with columns ds with timestamps and y with target values.

Return type:

None

predict(df, return_instance_score=True, return_forecast=True)[source]

Compute outlier scores and transform into outlier predictions.

Parameters:
  • df (DataFrame) – DataFrame with columns ds with timestamps and y with values which need to be flagged as outlier or not.

  • return_instance_score (bool) – Whether to return instance level outlier scores.

  • return_forecast (bool) – Whether to return the model forecast.

Return type:

Dict[Dict[str, str], Dict[DataFrame, DataFrame]]

Returns:

Dictionary containing 'meta' and 'data' dictionaries:

  • 'meta' has the model’s metadata.

  • 'data' contains the outlier predictions, instance level outlier scores and the model forecast.

score(df)[source]

Compute outlier scores.

Parameters:

df (DataFrame) – DataFrame with columns ds with timestamps and y with values which need to be flagged as outlier or not.

Return type:

DataFrame

Returns:

DataFrame with outlier scores for each instance in the batch.
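
Since an instance is flagged when it lies outside the forecast's uncertainty interval, one way to picture the score is as the signed distance from the nearest interval bound. The sketch below assumes the forecast exposes yhat, yhat_lower and yhat_upper values, as in Prophet's forecast output, and that positive scores mark outliers. interval_score and the example rows are hypothetical.

```python
def interval_score(y, yhat, yhat_lower, yhat_upper):
    """Signed distance of y from the uncertainty interval; positive means outside."""
    if y >= yhat:
        # Above the forecast: compare against the upper bound.
        return y - yhat_upper
    # Below the forecast: compare against the lower bound.
    return yhat_lower - y


# Hypothetical observations with the same forecast and interval.
rows = [
    {"y": 10.0, "yhat": 9.0, "yhat_lower": 7.0, "yhat_upper": 11.0},
    {"y": 15.0, "yhat": 9.0, "yhat_lower": 7.0, "yhat_upper": 11.0},
    {"y": 2.0, "yhat": 9.0, "yhat_lower": 7.0, "yhat_upper": 11.0},
]
scores = [interval_score(**row) for row in rows]
is_outlier = [int(s > 0) for s in scores]
print(scores, is_outlier)
```

A wider interval (a larger threshold) lowers every score and therefore flags fewer points.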

class alibi_detect.od.OutlierSeq2Seq(n_features, seq_len, threshold=None, seq2seq=None, threshold_net=None, latent_dim=None, output_activation=None, beta=1.0)[source]

Bases: BaseDetector, FitMixin, ThresholdMixin

__init__(n_features, seq_len, threshold=None, seq2seq=None, threshold_net=None, latent_dim=None, output_activation=None, beta=1.0)[source]

Seq2Seq-based outlier detector.

Parameters:
  • n_features (int) – Number of features in the time series.

  • seq_len (int) – Sequence length fed into the Seq2Seq model.

  • threshold (Union[float, ndarray, None]) – Threshold used for outlier detection. Can be a float or feature-wise array.

  • seq2seq (Optional[Model]) – A trained seq2seq model if available.

  • threshold_net (Optional[Model]) – Layers for the threshold estimation network wrapped in a tf.keras.Sequential class if no ‘seq2seq’ is specified.

  • latent_dim (Optional[int]) – Latent dimension of the encoder and decoder.

  • output_activation (Optional[str]) – Activation used in the Dense output layer of the decoder.

  • beta (float) – Weight on the threshold estimation loss term.

feature_score(X_orig, X_recon, threshold_est)[source]

Compute feature level outlier scores.

Parameters:
  • X_orig (ndarray) – Original time series.

  • X_recon (ndarray) – Reconstructed time series.

  • threshold_est (ndarray) – Estimated threshold from the decoder’s latent space.

Return type:

ndarray

Returns:

Feature level outlier scores. Scores above 0 are outliers.
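
The statement that scores above 0 are outliers suggests that the reconstruction error is shifted by the estimated threshold. A minimal sketch under that assumption: the squared error per feature, minus the estimated threshold, minus an optional fixed threshold. The exact formula is not given in the text above, and seq2seq_feature_score is a hypothetical name.

```python
def seq2seq_feature_score(x_orig, x_recon, threshold_est, threshold=0.0):
    """Squared reconstruction error shifted so that values above 0 indicate outliers."""
    return [
        (o - r) ** 2 - t - threshold
        for o, r, t in zip(x_orig, x_recon, threshold_est)
    ]


x_orig = [1.0, 2.0, 3.0]
x_recon = [1.0, 2.5, 1.0]
threshold_est = [0.1, 0.5, 0.5]

fscore = seq2seq_feature_score(x_orig, x_recon, threshold_est)
flags = [int(s > 0) for s in fscore]
print(fscore, flags)
```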

fit(X, loss_fn=tensorflow.keras.losses.mse, optimizer=tensorflow.keras.optimizers.Adam, epochs=20, batch_size=64, verbose=True, log_metric=None, callbacks=None)[source]

Train Seq2Seq model.

Parameters:
  • X – Univariate or multivariate time series. Shape equals (batch, features) or (batch, sequence length, features).

  • loss_fn – Loss function used for training.

  • optimizer – Optimizer used for training.

  • epochs – Number of training epochs.

  • batch_size – Batch size used for training.

  • verbose – Whether to print training progress.

  • log_metric – Additional metrics whose progress will be displayed if verbose equals True.

  • callbacks – Callbacks used during training.

infer_threshold(X, outlier_perc=100.0, threshold_perc=95.0, batch_size=10000000000)[source]

Update the outlier threshold using a sequence of instances from the dataset for which the fraction of outlying features is known. This fraction can be specified across all features or per feature.

Parameters:
  • X (ndarray) – Univariate or multivariate time series.

  • outlier_perc (Union[int, float]) – Percentage of sorted feature level outlier scores used to predict instance level outlier.

  • threshold_perc (Union[int, float, ndarray, list]) – Percentage of X considered to be normal based on the outlier score. Overall (float) or feature-wise (array or list).

  • batch_size (int) – Batch size used when making predictions with the seq2seq model.

Return type:

None

instance_score(fscore, outlier_perc=100.0)[source]

Compute instance level outlier scores. Here, ‘instance’ means the data along the first axis of the original time series passed to the predictor.

Parameters:
  • fscore (ndarray) – Feature level outlier scores.

  • outlier_perc (float) – Percentage of sorted feature level outlier scores used to predict instance level outlier.

Return type:

ndarray

Returns:

Instance level outlier scores.

predict(X, outlier_type='instance', outlier_perc=100.0, batch_size=10000000000, return_feature_score=True, return_instance_score=True)[source]

Compute outlier scores and transform into outlier predictions.

Parameters:
  • X (ndarray) – Univariate or multivariate time series.

  • outlier_type (str) – Predict outliers at the ‘feature’ or ‘instance’ level.

  • outlier_perc (float) – Percentage of sorted feature level outlier scores used to predict instance level outlier.

  • batch_size (int) – Batch size used when making predictions with the seq2seq model.

  • return_feature_score (bool) – Whether to return feature level outlier scores.

  • return_instance_score (bool) – Whether to return instance level outlier scores.

Return type:

Dict[Dict[str, str], Dict[ndarray, ndarray]]

Returns:

Dictionary containing 'meta' and 'data' dictionaries:

  • 'meta' has the model’s metadata.

  • 'data' contains the outlier predictions and both feature and instance level outlier scores.

score(X, outlier_perc=100.0, batch_size=10000000000)[source]

Compute feature and instance level outlier scores.

Parameters:
  • X (ndarray) – Univariate or multivariate time series.

  • outlier_perc (float) – Percentage of sorted feature level outlier scores used to predict instance level outlier.

  • batch_size (int) – Batch size used when making predictions with the seq2seq model.

Return type:

Tuple[ndarray, ndarray]

Returns:

Feature and instance level outlier scores.

class alibi_detect.od.OutlierVAE(threshold=None, score_type='mse', vae=None, encoder_net=None, decoder_net=None, latent_dim=None, samples=10, beta=1.0, data_type=None)[source]

Bases: BaseDetector, FitMixin, ThresholdMixin

__init__(threshold=None, score_type='mse', vae=None, encoder_net=None, decoder_net=None, latent_dim=None, samples=10, beta=1.0, data_type=None)[source]

VAE-based outlier detector.

Parameters:
  • threshold (Optional[float]) – Threshold used for outlier score to determine outliers.

  • score_type (str) – Metric used for outlier scores. Either ‘mse’ (mean squared error) or ‘proba’ (reconstruction probabilities) supported.

  • vae (Optional[Model]) – A trained tf.keras model if available.

  • encoder_net (Optional[Model]) – Layers for the encoder wrapped in a tf.keras.Sequential class if no ‘vae’ is specified.

  • decoder_net (Optional[Model]) – Layers for the decoder wrapped in a tf.keras.Sequential class if no ‘vae’ is specified.

  • latent_dim (Optional[int]) – Dimensionality of the latent space.

  • samples (int) – Number of samples drawn to evaluate each instance.

  • beta (float) – Beta parameter for KL-divergence loss term.

  • data_type (Optional[str]) – Optionally specify the data type (tabular, image or time-series). Added to metadata.

feature_score(X_orig, X_recon)[source]

Compute feature level outlier scores.

Parameters:
  • X_orig (ndarray) – Batch of original instances.

  • X_recon (ndarray) – Batch of reconstructed instances.

Return type:

ndarray

Returns:

Feature level outlier scores.

fit(X, loss_fn=<function elbo>, optimizer=tensorflow.keras.optimizers.Adam, cov_elbo={'sim': 0.05}, epochs=20, batch_size=64, verbose=True, log_metric=None, callbacks=None)[source]

Train VAE model.

Parameters:
  • X – Training batch.

  • loss_fn – Loss function used for training.

  • optimizer – Optimizer used for training.

  • cov_elbo – Dictionary with covariance matrix options in case the elbo loss function is used. Either use the full covariance matrix inferred from X (dict(cov_full=None)), only the variance (dict(cov_diag=None)) or a float representing the same standard deviation for each feature (e.g. dict(sim=.05)).

  • epochs – Number of training epochs.

  • batch_size – Batch size used for training.

  • verbose – Whether to print training progress.

  • log_metric – Additional metrics whose progress will be displayed if verbose equals True.

  • callbacks – Callbacks used during training.
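The three cov_elbo forms described above are alternative single-key dictionaries. The fragment below only spells them out as plain Python values; the variable names are hypothetical and nothing here calls the library.

```python
# Alternative values for the `cov_elbo` argument of `fit`.
# Variable names are illustrative only; pass exactly one of these dictionaries.
cov_full = dict(cov_full=None)  # full covariance matrix inferred from X
cov_diag = dict(cov_diag=None)  # only the per-feature variance inferred from X
cov_sim = dict(sim=0.05)        # same standard deviation (0.05) for each feature
```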

infer_threshold(X, outlier_type='instance', outlier_perc=100.0, threshold_perc=95.0, batch_size=10000000000)[source]

Update threshold by a value inferred from the percentage of instances considered to be outliers in a sample of the dataset.

Parameters:
  • X (ndarray) – Batch of instances.

  • outlier_type (str) – Predict outliers at the ‘feature’ or ‘instance’ level.

  • outlier_perc (float) – Percentage of sorted feature level outlier scores used to predict instance level outliers.

  • threshold_perc (float) – Percentage of X considered to be normal based on the outlier score.

  • batch_size (int) – Batch size used when making predictions with the VAE.

Return type:

None
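One way to read threshold_perc is as a percentile of the outlier scores computed on X: scores above that percentile are flagged. The following is a minimal NumPy sketch of that idea under this assumption, not the detector's own code; infer_threshold_sketch is a hypothetical helper.

```python
import numpy as np

def infer_threshold_sketch(scores: np.ndarray, threshold_perc: float = 95.0) -> float:
    """Return the value below which `threshold_perc` percent of the scores fall."""
    return float(np.percentile(scores, threshold_perc))

# Toy outlier scores 1.0, 2.0, ..., 100.0 standing in for the detector's scores on X.
scores = np.arange(1, 101, dtype=float)
threshold = infer_threshold_sketch(scores, threshold_perc=95.0)

# Instances scoring above the inferred threshold are treated as outliers.
is_outlier = (scores > threshold).astype(int)
```

With threshold_perc=95.0, roughly 5% of the instances in the sample end up above the threshold.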

instance_score(fscore, outlier_perc=100.0)[source]

Compute instance level outlier scores.

Parameters:
  • fscore (ndarray) – Feature level outlier scores.

  • outlier_perc (float) – Percentage of sorted feature level outlier scores used to predict instance level outliers.

Return type:

ndarray

Returns:

Instance level outlier scores.
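The role of outlier_perc can be illustrated with a small NumPy sketch: sort each instance's feature level scores and average only the largest outlier_perc percent. This is an assumed reading of the parameter description, and instance_score_sketch is a hypothetical helper rather than the library method.

```python
import numpy as np

def instance_score_sketch(fscore: np.ndarray, outlier_perc: float = 100.0) -> np.ndarray:
    """Average the largest `outlier_perc` percent of each instance's feature scores."""
    flat = fscore.reshape(fscore.shape[0], -1)  # one row of feature scores per instance
    # Number of feature scores kept per instance (at least one).
    n_keep = max(1, int(np.ceil(outlier_perc / 100.0 * flat.shape[1])))
    top = np.sort(flat, axis=1)[:, -n_keep:]    # the n_keep largest scores in each row
    return top.mean(axis=1)

fscore = np.array([[0.1, 0.2, 0.3, 4.0],
                   [0.1, 0.1, 0.1, 0.1]])
all_features = instance_score_sketch(fscore, outlier_perc=100.0)  # mean over all features
top_quarter = instance_score_sketch(fscore, outlier_perc=25.0)    # only the largest score
```

A lower outlier_perc makes the instance score more sensitive to a few strongly deviating features, since the remaining features no longer dilute the average.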

predict(X, outlier_type='instance', outlier_perc=100.0, batch_size=10000000000, return_feature_score=True, return_instance_score=True)[source]

Predict whether instances are outliers or not.

Parameters:
  • X (ndarray) – Batch of instances.

  • outlier_type (str) – Predict outliers at the ‘feature’ or ‘instance’ level.

  • outlier_perc (float) – Percentage of sorted feature level outlier scores used to predict instance level outliers.

  • batch_size (int) – Batch size used when making predictions with the VAE.

  • return_feature_score (bool) – Whether to return feature level outlier scores.

  • return_instance_score (bool) – Whether to return instance level outlier scores.

Return type:

Dict[Dict[str, str], Dict[ndarray, ndarray]]

Returns:

Dictionary containing 'meta' and 'data' dictionaries:

  • 'meta' has the model’s metadata.

  • 'data' contains the outlier predictions and both feature and instance level outlier scores.
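The shape of the returned dictionary can be sketched as follows. The key names inside 'data' ("is_outlier", "instance_score") and the helper predict_sketch are assumptions made for illustration; only the top-level 'meta' and 'data' keys are taken from the description above.

```python
import numpy as np

def predict_sketch(iscore: np.ndarray, threshold: float, meta: dict = None) -> dict:
    """Turn instance level scores into 0/1 predictions and wrap them in a result dict."""
    data = {
        "instance_score": iscore,                        # assumed key name
        "is_outlier": (iscore > threshold).astype(int),  # assumed key name; 1 marks an outlier
    }
    return {"meta": meta or {}, "data": data}

preds = predict_sketch(np.array([0.02, 0.9, 0.05]), threshold=0.1,
                       meta={"data_type": "tabular"})
flags = preds["data"]["is_outlier"]
```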

score(X, outlier_perc=100.0, batch_size=10000000000)[source]

Compute feature and instance level outlier scores.

Parameters:
  • X (ndarray) – Batch of instances.

  • outlier_perc (float) – Percentage of sorted feature level outlier scores used to predict instance level outliers.

  • batch_size (int) – Batch size used when making predictions with the VAE.

Return type:

Tuple[ndarray, ndarray]

Returns:

Feature and instance level outlier scores.

class alibi_detect.od.OutlierVAEGMM(threshold=None, vaegmm=None, encoder_net=None, decoder_net=None, gmm_density_net=None, n_gmm=None, latent_dim=None, samples=10, beta=1.0, recon_features=<function eucl_cosim_features>, data_type=None)[source]

Bases: BaseDetector, FitMixin, ThresholdMixin

__init__(threshold=None, vaegmm=None, encoder_net=None, decoder_net=None, gmm_density_net=None, n_gmm=None, latent_dim=None, samples=10, beta=1.0, recon_features=<function eucl_cosim_features>, data_type=None)[source]

VAEGMM-based outlier detector.

Parameters:
  • threshold (Optional[float]) – Threshold used for outlier score to determine outliers.

  • vaegmm (Optional[Model]) – A trained tf.keras model if available.

  • encoder_net (Optional[Model]) – Layers for the encoder wrapped in a tf.keras.Sequential class if no ‘vaegmm’ is specified.

  • decoder_net (Optional[Model]) – Layers for the decoder wrapped in a tf.keras.Sequential class if no ‘vaegmm’ is specified.

  • gmm_density_net (Optional[Model]) – Layers for the GMM network wrapped in a tf.keras.Sequential class.

  • n_gmm (Optional[int]) – Number of components in GMM.

  • latent_dim (Optional[int]) – Dimensionality of the latent space.

  • samples (int) – Number of samples drawn to evaluate each instance.

  • beta (float) – Beta parameter for KL-divergence loss term.

  • recon_features (Callable) – Function to extract features from the reconstructed instance by the decoder.

  • data_type (Optional[str]) – Optionally specify the data type (tabular, image or time-series). Added to metadata.

fit(X, loss_fn=<function loss_vaegmm>, w_recon=1e-07, w_energy=0.1, w_cov_diag=0.005, optimizer=tensorflow.keras.optimizers.Adam, cov_elbo={'sim': 0.05}, epochs=20, batch_size=64, verbose=True, log_metric=None, callbacks=None)[source]

Train VAEGMM model.

Parameters:
  • X – Training batch.

  • loss_fn – Loss function used for training.

  • w_recon – Weight on elbo loss term if default loss_vaegmm loss fn is used.

  • w_energy – Weight on sample energy loss term if default loss_vaegmm loss fn is used.

  • w_cov_diag – Weight on covariance regularizing loss term if default loss_vaegmm loss fn is used.

  • optimizer – Optimizer used for training.

  • cov_elbo – Dictionary with covariance matrix options in case the elbo loss function is used. Either use the full covariance matrix inferred from X (dict(cov_full=None)), only the variance (dict(cov_diag=None)) or a float representing the same standard deviation for each feature (e.g. dict(sim=.05)).

  • epochs – Number of training epochs.

  • batch_size – Batch size used for training.

  • verbose – Whether to print training progress.

  • log_metric – Additional metrics whose progress will be displayed if verbose equals True.

  • callbacks – Callbacks used during training.

infer_threshold(X, threshold_perc=95.0, batch_size=10000000000)[source]

Update threshold by a value inferred from the percentage of instances considered to be outliers in a sample of the dataset.

Parameters:
  • X (ndarray) – Batch of instances.

  • threshold_perc (float) – Percentage of X considered to be normal based on the outlier score.

  • batch_size (int) – Batch size used when making predictions with the VAEGMM.

Return type:

None

predict(X, batch_size=10000000000, return_instance_score=True)[source]

Compute outlier scores and transform into outlier predictions.

Parameters:
  • X (ndarray) – Batch of instances.

  • batch_size (int) – Batch size used when making predictions with the VAEGMM.

  • return_instance_score (bool) – Whether to return instance level outlier scores.

Return type:

Dict[Dict[str, str], Dict[ndarray, ndarray]]

Returns:

Dictionary containing 'meta' and 'data' dictionaries:

  • 'meta' has the model’s metadata.

  • 'data' contains the outlier predictions and instance level outlier scores.

score(X, batch_size=10000000000)[source]

Compute outlier scores.

Parameters:
  • X (ndarray) – Batch of instances to analyze.

  • batch_size (int) – Batch size used when making predictions with the VAEGMM.

Return type:

ndarray

Returns:

Array with outlier scores for each instance in the batch.

class alibi_detect.od.SpectralResidual(threshold=None, window_amp=None, window_local=None, padding_amp_method='reflect', padding_local_method='reflect', padding_amp_side='bilateral', n_est_points=None, n_grad_points=5)[source]

Bases: BaseDetector, ThresholdMixin

__init__(threshold=None, window_amp=None, window_local=None, padding_amp_method='reflect', padding_local_method='reflect', padding_amp_side='bilateral', n_est_points=None, n_grad_points=5)[source]

Outlier detector for time-series data using the spectral residual algorithm. Based on “Time-Series Anomaly Detection Service at Microsoft” (Ren et al., 2019) https://arxiv.org/abs/1906.03821.

Parameters:
  • threshold (Optional[float]) – Threshold used to classify outliers. Relative saliency map distance from the moving average.

  • window_amp (Optional[int]) – Window for the average log amplitude.

  • window_local (Optional[int]) – Window for the local average of the saliency map. Note that the averaging is performed over the previous window_local data points (i.e., is a local average of the preceding window_local points for the current index).

  • padding_amp_method (Literal[‘constant’, ‘replicate’, ‘reflect’]) –

    Padding method to be used prior to each convolution over log amplitude. Possible values: constant | replicate | reflect. Default value: reflect.

    • constant - padding with constant 0.

    • replicate - repeats the last/extreme value.

    • reflect - reflects the time series.

  • padding_local_method (Literal[‘constant’, ‘replicate’, ‘reflect’]) –

    Padding method to be used prior to each convolution over saliency map. Possible values: constant | replicate | reflect. Default value: reflect.

    • constant - padding with constant 0.

    • replicate - repeats the last/extreme value.

    • reflect - reflects the time series.

  • padding_amp_side (Literal[‘bilateral’, ‘left’, ‘right’]) – Whether to pad the amplitudes on both sides or only on one side. Possible values: bilateral | left | right.

  • n_est_points (Optional[int]) – Number of estimated points padded to the end of the sequence.

  • n_grad_points (int) – Number of points used for the gradient estimation of the additional points padded to the end of the sequence.

add_est_points(X, t)[source]

Pad the time series with additional points since the method works better if the anomaly point is towards the center of the sliding window.

Parameters:
  • X (ndarray) – Uniformly sampled time series instances.

  • t (ndarray) – Equidistant timestamps corresponding to each input instance (i.e., the array should contain numerical values in increasing order).

Return type:

ndarray

Returns:

Padded version of X.

compute_grads(X, t)[source]

Slope of the straight line between different points of the time series multiplied by the average time step size.

Parameters:
  • X (ndarray) – Uniformly sampled time series instances.

  • t (ndarray) – Equidistant timestamps corresponding to each input instance (i.e., the array should contain numerical values in increasing order).

Return type:

ndarray

Returns:

Array with slope values.

infer_threshold(X, t=None, threshold_perc=95.0)[source]

Update threshold by a value inferred from the percentage of instances considered to be outliers in a sample of the dataset.

Parameters:
  • X (ndarray) – Uniformly sampled time series instances.

  • t (Optional[ndarray]) – Equidistant timestamps corresponding to each input instance (i.e., the array should contain numerical values in increasing order). If not provided, the timestamps will be replaced by an array of integers [0, 1, … , N - 1], where N is the size of the input time series.

  • threshold_perc (float) – Percentage of X considered to be normal based on the outlier score.

Return type:

None

static pad_same(X, W, method='replicate', side='bilateral')[source]

Adds padding to the time series X such that after applying a valid convolution with a kernel/filter W, the resulting time series has the same shape as the input X.

Parameters:
  • X (ndarray) – Time series to be padded.

  • W (ndarray) – Convolution kernel/filter.

  • method (str) –

    Padding method to be used. Possible values:

    • constant - padding with constant 0.

    • replicate - repeats the last/extreme value.

    • reflect - reflects the time series.

  • side (str) –

    Whether to pad the time series on both sides or only on one side. Possible values:

    • bilateral - time series is padded on both sides.

    • left - time series is padded only on the left hand side.

    • right - time series is padded only on the right hand side.

Return type:

ndarray

Returns:

Padded time series.
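The padding rule can be sketched with numpy.pad: a valid convolution with a kernel of length len(W) shortens the series by len(W) - 1 points, so that many points are added beforehand. The mapping of the documented method names to numpy.pad modes, the split of the padding between the two sides, and the helper name pad_same_sketch are assumptions of this sketch, not the library's implementation.

```python
import numpy as np

# Assumed correspondence between the documented method names and numpy.pad modes.
_PAD_MODES = {"constant": "constant", "replicate": "edge", "reflect": "reflect"}

def pad_same_sketch(X, W, method="replicate", side="bilateral"):
    """Pad 1-D X so that np.convolve(padded, W, mode='valid') has len(X) entries."""
    total = len(W) - 1  # points lost by a valid convolution with W
    if side == "bilateral":
        left = total // 2
        right = total - left
    elif side == "left":
        left, right = total, 0
    elif side == "right":
        left, right = 0, total
    else:
        raise ValueError(f"unknown side: {side!r}")
    return np.pad(X, (left, right), mode=_PAD_MODES[method])

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
W = np.ones(3) / 3.0  # moving-average kernel of length 3
padded = pad_same_sketch(X, W, method="replicate", side="bilateral")
smoothed = np.convolve(padded, W, mode="valid")  # same length as X
```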

predict(X, t=None, return_instance_score=True)[source]

Compute outlier scores and transform into outlier predictions.

Parameters:
  • X (ndarray) – Uniformly sampled time series instances.

  • t (Optional[ndarray]) – Equidistant timestamps corresponding to each input instance (i.e., the array should contain numerical values in increasing order). If not provided, the timestamps will be replaced by an array of integers [0, 1, … , N - 1], where N is the size of the input time series.

  • return_instance_score (bool) – Whether to return instance level outlier scores.

Return type:

Dict[Dict[str, str], Dict[ndarray, ndarray]]

Returns:

Dictionary containing meta and data dictionaries:

  • meta - has the model’s metadata.

  • data - contains the outlier predictions and instance level outlier scores.

saliency_map(X)[source]

Compute saliency map.

Parameters:
  • X (ndarray) – Uniformly sampled time series instances.

Return type:

ndarray

Returns:

Array with saliency map values.
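For orientation, the general spectral residual recipe from Ren et al. (2019) can be sketched in a few lines of NumPy: take the FFT, subtract a locally averaged log amplitude from the log amplitude, and transform back using the original phase. The helper saliency_map_sketch, the edge padding used for the moving average, and the small constant added before the logarithm are assumptions of this sketch; it is not the detector's own code.

```python
import numpy as np

def saliency_map_sketch(X: np.ndarray, window_amp: int = 3) -> np.ndarray:
    """Spectral residual saliency map of a 1-D, uniformly sampled time series."""
    fft = np.fft.fft(X)
    log_amp = np.log(np.abs(fft) + 1e-8)  # small constant avoids log(0)
    phase = np.angle(fft)

    # Moving average of the log amplitude, edge-padded to keep the length.
    kernel = np.ones(window_amp) / window_amp
    left = (window_amp - 1) // 2
    right = window_amp - 1 - left
    avg_log_amp = np.convolve(np.pad(log_amp, (left, right), mode="edge"),
                              kernel, mode="valid")

    residual = log_amp - avg_log_amp      # the spectral residual
    return np.abs(np.fft.ifft(np.exp(residual + 1j * phase)))

# Toy series: a sine wave with one injected spike at index 50.
n = np.arange(128)
X = np.sin(2 * np.pi * 4 * n / 128)
X[50] += 10.0
sal = saliency_map_sketch(X)
spike_index = int(np.argmax(sal))  # position of the largest saliency value
```

In this toy series the smooth sine contributes little to the saliency map, while the injected spike stands out.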

score(X, t=None)[source]

Compute outlier scores.

Parameters:
  • X (ndarray) – Uniformly sampled time series instances.

  • t (Optional[ndarray]) – Equidistant timestamps corresponding to each input instance (i.e., the array should contain numerical values in increasing order). If not provided, the timestamps will be replaced by an array of integers [0, 1, … , N - 1], where N is the size of the input time series.

Return type:

ndarray

Returns:

Array with outlier scores for each instance in the batch.

Subpackages

Submodules