alibi_detect.od package

class alibi_detect.od.OutlierAEGMM(threshold=None, aegmm=None, encoder_net=None, decoder_net=None, gmm_density_net=None, n_gmm=None, recon_features=<function eucl_cosim_features>, data_type=None)[source]

Bases: alibi_detect.base.BaseDetector, alibi_detect.base.FitMixin, alibi_detect.base.ThresholdMixin

__init__(threshold=None, aegmm=None, encoder_net=None, decoder_net=None, gmm_density_net=None, n_gmm=None, recon_features=<function eucl_cosim_features>, data_type=None)[source]

AEGMM-based outlier detector.

Parameters
  • threshold (Optional[float]) – Threshold used for outlier score to determine outliers.

  • aegmm (Optional[tensorflow.keras.Model]) – A trained tf.keras model if available.

  • encoder_net (Optional[tensorflow.keras.Sequential]) – Layers for the encoder wrapped in a tf.keras.Sequential class if no ‘aegmm’ is specified.

  • decoder_net (Optional[tensorflow.keras.Sequential]) – Layers for the decoder wrapped in a tf.keras.Sequential class if no ‘aegmm’ is specified.

  • gmm_density_net (Optional[tensorflow.keras.Sequential]) – Layers for the GMM network wrapped in a tf.keras.Sequential class.

  • n_gmm (Optional[int]) – Number of components in GMM.

  • recon_features (Callable) – Function to extract features from the instance reconstructed by the decoder.

  • data_type (Optional[str]) – Optionally specify the data type (tabular, image or time-series). Added to metadata.

Return type

None

fit(X, loss_fn=<function loss_aegmm>, w_energy=0.1, w_cov_diag=0.005, optimizer=tensorflow.keras.optimizers.Adam, epochs=20, batch_size=64, verbose=True, log_metric=None, callbacks=None)[source]

Train AEGMM model.

Parameters
  • X (numpy.ndarray) – Training batch.

  • loss_fn (tensorflow.keras.losses) – Loss function used for training.

  • w_energy (float) – Weight on sample energy loss term if default loss_aegmm loss fn is used.

  • w_cov_diag (float) – Weight on covariance regularizing loss term if default loss_aegmm loss fn is used.

  • optimizer (tensorflow.keras.optimizers) – Optimizer used for training.

  • epochs (int) – Number of training epochs.

  • batch_size (int) – Batch size used for training.

  • verbose (bool) – Whether to print training progress.

  • log_metric (Optional[Tuple[str, tensorflow.keras.metrics]]) – Additional metrics whose progress will be displayed if verbose equals True.

  • callbacks (Optional[tensorflow.keras.callbacks]) – Callbacks used during training.

Return type

None

infer_threshold(X, threshold_perc=95.0)[source]

Update threshold by a value inferred from the percentage of instances considered to be outliers in a sample of the dataset.

Parameters
  • X (numpy.ndarray) – Batch of instances.

  • threshold_perc (float) – Percentage of X considered to be normal based on the outlier score.

Return type

None
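The threshold inference used here (and by the other detectors' infer_threshold methods) can be sketched as a simple percentile computation over the outlier scores. The scores below are randomly generated for illustration; in practice they would come from the detector's score method.

```python
import numpy as np

# Hypothetical outlier scores for a batch of 1000 instances.
rng = np.random.default_rng(0)
scores = rng.normal(size=1000)

# infer_threshold treats the lowest `threshold_perc` percent of scores
# as normal: the threshold is the score at that percentile.
threshold_perc = 95.0
threshold = np.percentile(scores, threshold_perc)

# Instances scoring above the inferred threshold are flagged as outliers.
is_outlier = (scores > threshold).astype(int)
print(is_outlier.sum())  # roughly 5% of the 1000 instances
```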

predict(X, return_instance_score=True)[source]

Compute outlier scores and transform into outlier predictions.

Parameters
  • X (numpy.ndarray) – Batch of instances.

  • return_instance_score (bool) – Whether to return instance level outlier scores.

Return type

Dict[str, Dict[str, Any]]

Returns

  • Dictionary containing ‘meta’ and ‘data’ dictionaries.

  • ’meta’ has the model’s metadata.

  • ’data’ contains the outlier predictions and instance level outlier scores.
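The returned dictionary has the following illustrative shape; the exact key names under ‘meta’ and ‘data’ are assumptions based on the description above, not guaranteed by this reference.

```python
import numpy as np

# Assumed structure of the dictionary returned by predict().
preds = {
    'meta': {'name': 'OutlierAEGMM', 'data_type': 'tabular'},
    'data': {
        'is_outlier': np.array([0, 1, 0]),            # outlier prediction per instance
        'instance_score': np.array([0.1, 3.2, 0.4]),  # present if return_instance_score=True
    },
}

# Typical downstream use: pull the binary predictions out of 'data'.
outliers = preds['data']['is_outlier']
print(outliers.sum())  # 1 instance flagged as an outlier
```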

score(X)[source]

Compute outlier scores.

Parameters

X (numpy.ndarray) – Batch of instances to analyze.

Return type

numpy.ndarray

Returns

Array with outlier scores for each instance in the batch.

class alibi_detect.od.IForest(threshold=None, n_estimators=100, max_samples='auto', max_features=1.0, bootstrap=False, n_jobs=1, data_type='tabular')[source]

Bases: alibi_detect.base.BaseDetector, alibi_detect.base.FitMixin, alibi_detect.base.ThresholdMixin

__init__(threshold=None, n_estimators=100, max_samples='auto', max_features=1.0, bootstrap=False, n_jobs=1, data_type='tabular')[source]

Outlier detector for tabular data using isolation forests.

Parameters
  • threshold (Optional[float]) – Threshold used for outlier score to determine outliers.

  • n_estimators (int) – Number of base estimators in the ensemble.

  • max_samples (Union[str, int, float]) – Number of samples to draw from the training data to train each base estimator. If int, draw ‘max_samples’ samples. If float, draw ‘max_samples * number of samples’ samples. If ‘auto’, max_samples = min(256, number of samples).

  • max_features (Union[int, float]) – Number of features to draw from the training data to train each base estimator. If int, draw ‘max_features’ features. If float, draw ‘max_features * number of features’ features.

  • bootstrap (bool) – Whether to fit individual trees on random subsets of the training data, sampled with replacement.

  • n_jobs (int) – Number of jobs to run in parallel for ‘fit’ and ‘predict’.

  • data_type (str) – Optionally specify the data type (tabular, image or time-series). Added to metadata.

Return type

None

fit(X, sample_weight=None)[source]

Fit isolation forest.

Parameters
  • X (numpy.ndarray) – Training batch.

  • sample_weight (Optional[numpy.ndarray]) – Sample weights.

Return type

None
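IForest wraps scikit-learn's IsolationForest. A minimal sketch of the underlying fit/score behaviour is below; the sign convention (negating decision_function so that higher scores mean more outlying) is an assumption about the wrapper, not taken from this reference.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 4))

# Fit the underlying isolation forest on normal (inlier) data.
clf = IsolationForest(n_estimators=100, max_samples='auto', random_state=0)
clf.fit(X_train)

# Score a test batch whose last row is far from the training distribution.
X_test = np.vstack([rng.normal(size=(5, 4)), np.full((1, 4), 8.0)])
scores = -clf.decision_function(X_test)  # higher score = more anomalous
print(scores.argmax())  # the obvious outlier gets the highest score
```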

infer_threshold(X, threshold_perc=95.0)[source]

Update threshold by a value inferred from the percentage of instances considered to be outliers in a sample of the dataset.

Parameters
  • X (numpy.ndarray) – Batch of instances.

  • threshold_perc (float) – Percentage of X considered to be normal based on the outlier score.

Return type

None

predict(X, return_instance_score=True)[source]

Compute outlier scores and transform into outlier predictions.

Parameters
  • X (numpy.ndarray) – Batch of instances.

  • return_instance_score (bool) – Whether to return instance level outlier scores.

Return type

Dict[str, Dict[str, Any]]

Returns

  • Dictionary containing ‘meta’ and ‘data’ dictionaries.

  • ’meta’ has the model’s metadata.

  • ’data’ contains the outlier predictions and instance level outlier scores.

score(X)[source]

Compute outlier scores.

Parameters

X (numpy.ndarray) – Batch of instances to analyze.

Return type

numpy.ndarray

Returns

Array with outlier scores for each instance in the batch.

class alibi_detect.od.Mahalanobis(threshold=None, n_components=3, std_clip=3, start_clip=100, max_n=None, cat_vars=None, ohe=False, data_type='tabular')[source]

Bases: alibi_detect.base.BaseDetector, alibi_detect.base.FitMixin, alibi_detect.base.ThresholdMixin

__init__(threshold=None, n_components=3, std_clip=3, start_clip=100, max_n=None, cat_vars=None, ohe=False, data_type='tabular')[source]

Outlier detector for tabular data using the Mahalanobis distance.

Parameters
  • threshold (Optional[float]) – Mahalanobis distance threshold used to classify outliers.

  • n_components (int) – Number of principal components used.

  • std_clip (int) – Feature-wise stdev used to clip the observations before updating the mean and cov.

  • start_clip (int) – Number of observations before clipping is applied.

  • max_n (Optional[int]) – Algorithm behaves as if it has seen at most max_n points.

  • cat_vars (Optional[dict]) – Dict with as keys the categorical columns and as values the number of categories per categorical variable.

  • ohe (bool) – Whether the categorical variables are one-hot encoded (OHE) or not. If not OHE, they are assumed to have ordinal encodings.

  • data_type (str) – Optionally specify the data type (tabular, image or time-series). Added to metadata.

Return type

None
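The outlier score is based on the Mahalanobis distance of each instance from the data distribution. The actual detector updates the mean and covariance online (with clipping and optional categorical handling via cat2num); the sketch below estimates both once from a batch, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))

# Estimate mean and covariance from the batch.
mu = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))

# Squared Mahalanobis distance of each instance from the mean.
diff = X - mu
scores = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)

# A point far from the distribution receives a much larger distance.
x_out = np.array([[6.0, -6.0, 6.0]])
d_out = np.einsum('ij,jk,ik->i', x_out - mu, cov_inv, x_out - mu)
print(d_out[0] > scores.max())
```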

cat2num(X)[source]

Convert categorical variables to numerical values.

Parameters

X (numpy.ndarray) – Batch of instances to analyze.

Return type

numpy.ndarray

Returns

Batch of instances where categorical variables are converted to numerical values.

fit(X, y=None, d_type='abdm', w=None, disc_perc=[25, 50, 75], standardize_cat_vars=True, feature_range=(-10000000000.0, 10000000000.0), smooth=1.0, center=True)[source]

If categorical variables are present, transform them to numerical values; this step is unnecessary when all variables are numerical.

Parameters
  • X (numpy.ndarray) – Batch of instances used to infer distances between categories.

  • y (Optional[numpy.ndarray]) – Model class predictions or ground truth labels for X. Used for ‘mvdm’ and ‘abdm-mvdm’ pairwise distance metrics. Note that this is only compatible with classification problems. For regression problems, use the ‘abdm’ distance metric.

  • d_type (str) – Pairwise distance metric used for categorical variables. Currently, ‘abdm’, ‘mvdm’ and ‘abdm-mvdm’ are supported. ‘abdm’ infers context from the other variables while ‘mvdm’ uses the model predictions. ‘abdm-mvdm’ is a weighted combination of the two metrics.

  • w (Optional[float]) – Weight on ‘abdm’ (between 0. and 1.) distance if d_type equals ‘abdm-mvdm’.

  • disc_perc (list) – List of percentiles used to bin the numerical features for the ‘abdm’ and ‘abdm-mvdm’ pairwise distance measures.

  • standardize_cat_vars (bool) – Standardize numerical values of categorical variables if True.

  • feature_range (tuple) – Tuple with min and max ranges to allow for perturbed instances. Min and max ranges can be floats or numpy arrays with dimension (1 x number of features) for feature-wise ranges.

  • smooth (float) – Smoothing exponent between 0 and 1 for the distances. Lower values will smooth the difference in distance metric between different features.

  • center (bool) – Whether to center the scaled distance measures. If False, the min distance for each feature except for the feature with the highest raw max distance will be the lower bound of the feature range, but the upper bound will be below the max feature range.

Return type

None

infer_threshold(X, threshold_perc=95.0)[source]

Update threshold by a value inferred from the percentage of instances considered to be outliers in a sample of the dataset.

Parameters
  • X (numpy.ndarray) – Batch of instances.

  • threshold_perc (float) – Percentage of X considered to be normal based on the outlier score.

Return type

None

predict(X, return_instance_score=True)[source]

Compute outlier scores and transform into outlier predictions.

Parameters
  • X (numpy.ndarray) – Batch of instances.

  • return_instance_score (bool) – Whether to return instance level outlier scores.

Return type

Dict[str, Dict[str, Any]]

Returns

  • Dictionary containing ‘meta’ and ‘data’ dictionaries.

  • ’meta’ has the model’s metadata.

  • ’data’ contains the outlier predictions and instance level outlier scores.

score(X)[source]

Compute outlier scores.

Parameters

X (numpy.ndarray) – Batch of instances to analyze.

Return type

numpy.ndarray

Returns

Array with outlier scores for each instance in the batch.

class alibi_detect.od.OutlierVAE(threshold=None, score_type='mse', vae=None, encoder_net=None, decoder_net=None, latent_dim=None, samples=10, beta=1.0, data_type=None)[source]

Bases: alibi_detect.base.BaseDetector, alibi_detect.base.FitMixin, alibi_detect.base.ThresholdMixin

__init__(threshold=None, score_type='mse', vae=None, encoder_net=None, decoder_net=None, latent_dim=None, samples=10, beta=1.0, data_type=None)[source]

VAE-based outlier detector.

Parameters
  • threshold (Optional[float]) – Threshold used for outlier score to determine outliers.

  • score_type (str) – Metric used for outlier scores. Either ‘mse’ (mean squared error) or ‘proba’ (reconstruction probabilities) supported.

  • vae (Optional[tensorflow.keras.Model]) – A trained tf.keras model if available.

  • encoder_net (Optional[tensorflow.keras.Sequential]) – Layers for the encoder wrapped in a tf.keras.Sequential class if no ‘vae’ is specified.

  • decoder_net (Optional[tensorflow.keras.Sequential]) – Layers for the decoder wrapped in a tf.keras.Sequential class if no ‘vae’ is specified.

  • latent_dim (Optional[int]) – Dimensionality of the latent space.

  • samples (int) – Number of samples drawn to evaluate each instance.

  • beta (float) – Beta parameter for KL-divergence loss term.

  • data_type (Optional[str]) – Optionally specify the data type (tabular, image or time-series). Added to metadata.

Return type

None

feature_score(X_orig, X_recon)[source]

Compute feature level outlier scores.

Parameters
  • X_orig (numpy.ndarray) – Batch of original instances.

  • X_recon (numpy.ndarray) – Batch of reconstructed instances.

Return type

numpy.ndarray

Returns

Feature level outlier scores.
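With score_type=‘mse’, the feature level score amounts to the squared reconstruction error per feature. The sketch below illustrates that scoring logic only; it is not the library implementation (with ‘proba’ the reconstruction probability is used instead).

```python
import numpy as np

# Original instances and their reconstructions from the decoder.
X_orig = np.array([[0.0, 1.0, 2.0]])
X_recon = np.array([[0.1, 1.0, 1.5]])

# Squared reconstruction error per feature (the 'mse' score type).
fscore = (X_orig - X_recon) ** 2
print(fscore)  # [[0.01 0.   0.25]]
```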

fit(X, loss_fn=<function elbo>, optimizer=tensorflow.keras.optimizers.Adam, cov_elbo={'sim': 0.05}, epochs=20, batch_size=64, verbose=True, log_metric=None, callbacks=None)[source]

Train VAE model.

Parameters
  • X (numpy.ndarray) – Training batch.

  • loss_fn (tensorflow.keras.losses) – Loss function used for training.

  • optimizer (tensorflow.keras.optimizers) – Optimizer used for training.

  • cov_elbo (dict) – Dictionary with covariance matrix options in case the elbo loss function is used. Either use the full covariance matrix inferred from X (dict(cov_full=None)), only the variance (dict(cov_diag=None)) or a float representing the same standard deviation for each feature (e.g. dict(sim=.05)).

  • epochs (int) – Number of training epochs.

  • batch_size (int) – Batch size used for training.

  • verbose (bool) – Whether to print training progress.

  • log_metric (Optional[Tuple[str, tensorflow.keras.metrics]]) – Additional metrics whose progress will be displayed if verbose equals True.

  • callbacks (Optional[tensorflow.keras.callbacks]) – Callbacks used during training.

Return type

None

infer_threshold(X, outlier_type='instance', outlier_perc=100.0, threshold_perc=95.0)[source]

Update threshold by a value inferred from the percentage of instances considered to be outliers in a sample of the dataset.

Parameters
  • X (numpy.ndarray) – Batch of instances.

  • outlier_type (str) – Predict outliers at the ‘feature’ or ‘instance’ level.

  • outlier_perc (float) – Percentage of sorted feature level outlier scores used to predict instance level outlier.

  • threshold_perc (float) – Percentage of X considered to be normal based on the outlier score.

Return type

None

instance_score(fscore, outlier_perc=100.0)[source]

Compute instance level outlier scores.

Parameters
  • fscore (numpy.ndarray) – Feature level outlier scores.

  • outlier_perc (float) – Percentage of sorted feature level outlier scores used to predict instance level outlier.

Return type

numpy.ndarray

Returns

Instance level outlier scores.
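Aggregating feature level scores into an instance level score can be sketched as follows; keeping the top outlier_perc percent of each instance's sorted feature scores and averaging them is an assumption about the aggregation (with outlier_perc=100 it reduces to the mean over all features).

```python
import numpy as np

def instance_score(fscore, outlier_perc=100.0):
    """Average the highest `outlier_perc` percent of feature scores per instance."""
    fscore_flat = fscore.reshape(fscore.shape[0], -1)
    n_keep = int(np.ceil(0.01 * outlier_perc * fscore_flat.shape[1]))
    top = np.sort(fscore_flat, axis=1)[:, -n_keep:]  # highest scores per instance
    return top.mean(axis=1)

fscore = np.array([[0.1, 0.2, 0.3, 4.0]])
print(instance_score(fscore, outlier_perc=100.0))  # mean of all features: [1.15]
print(instance_score(fscore, outlier_perc=25.0))   # only the worst feature: [4.]
```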

predict(X, outlier_type='instance', outlier_perc=100.0, return_feature_score=True, return_instance_score=True)[source]

Predict whether instances are outliers or not.

Parameters
  • X (numpy.ndarray) – Batch of instances.

  • outlier_type (str) – Predict outliers at the ‘feature’ or ‘instance’ level.

  • outlier_perc (float) – Percentage of sorted feature level outlier scores used to predict instance level outlier.

  • return_feature_score (bool) – Whether to return feature level outlier scores.

  • return_instance_score (bool) – Whether to return instance level outlier scores.

Return type

Dict[str, Dict[str, Any]]

Returns

  • Dictionary containing ‘meta’ and ‘data’ dictionaries.

  • ’meta’ has the model’s metadata.

  • ’data’ contains the outlier predictions and both feature and instance level outlier scores.

score(X, outlier_perc=100.0)[source]

Compute feature and instance level outlier scores.

Parameters
  • X (numpy.ndarray) – Batch of instances.

  • outlier_perc (float) – Percentage of sorted feature level outlier scores used to predict instance level outlier.

Return type

Tuple[numpy.ndarray, numpy.ndarray]

Returns

Feature and instance level outlier scores.

class alibi_detect.od.OutlierVAEGMM(threshold=None, vaegmm=None, encoder_net=None, decoder_net=None, gmm_density_net=None, n_gmm=None, latent_dim=None, samples=10, beta=1.0, recon_features=<function eucl_cosim_features>, data_type=None)[source]

Bases: alibi_detect.base.BaseDetector, alibi_detect.base.FitMixin, alibi_detect.base.ThresholdMixin

__init__(threshold=None, vaegmm=None, encoder_net=None, decoder_net=None, gmm_density_net=None, n_gmm=None, latent_dim=None, samples=10, beta=1.0, recon_features=<function eucl_cosim_features>, data_type=None)[source]

VAEGMM-based outlier detector.

Parameters
  • threshold (Optional[float]) – Threshold used for outlier score to determine outliers.

  • vaegmm (Optional[tensorflow.keras.Model]) – A trained tf.keras model if available.

  • encoder_net (Optional[tensorflow.keras.Sequential]) – Layers for the encoder wrapped in a tf.keras.Sequential class if no ‘vaegmm’ is specified.

  • decoder_net (Optional[tensorflow.keras.Sequential]) – Layers for the decoder wrapped in a tf.keras.Sequential class if no ‘vaegmm’ is specified.

  • gmm_density_net (Optional[tensorflow.keras.Sequential]) – Layers for the GMM network wrapped in a tf.keras.Sequential class.

  • n_gmm (Optional[int]) – Number of components in GMM.

  • latent_dim (Optional[int]) – Dimensionality of the latent space.

  • samples (int) – Number of samples drawn to evaluate each instance.

  • beta (float) – Beta parameter for KL-divergence loss term.

  • recon_features (Callable) – Function to extract features from the instance reconstructed by the decoder.

  • data_type (Optional[str]) – Optionally specify the data type (tabular, image or time-series). Added to metadata.

Return type

None

fit(X, loss_fn=<function loss_vaegmm>, w_recon=1e-07, w_energy=0.1, w_cov_diag=0.005, optimizer=tensorflow.keras.optimizers.Adam, cov_elbo={'sim': 0.05}, epochs=20, batch_size=64, verbose=True, log_metric=None, callbacks=None)[source]

Train VAEGMM model.

Parameters
  • X (numpy.ndarray) – Training batch.

  • loss_fn (tensorflow.keras.losses) – Loss function used for training.

  • w_recon (float) – Weight on elbo loss term if default loss_vaegmm loss fn is used.

  • w_energy (float) – Weight on sample energy loss term if default loss_vaegmm loss fn is used.

  • w_cov_diag (float) – Weight on covariance regularizing loss term if default loss_vaegmm loss fn is used.

  • optimizer (tensorflow.keras.optimizers) – Optimizer used for training.

  • cov_elbo (dict) – Dictionary with covariance matrix options in case the elbo loss function is used. Either use the full covariance matrix inferred from X (dict(cov_full=None)), only the variance (dict(cov_diag=None)) or a float representing the same standard deviation for each feature (e.g. dict(sim=.05)).

  • epochs (int) – Number of training epochs.

  • batch_size (int) – Batch size used for training.

  • verbose (bool) – Whether to print training progress.

  • log_metric (Optional[Tuple[str, tensorflow.keras.metrics]]) – Additional metrics whose progress will be displayed if verbose equals True.

  • callbacks (Optional[tensorflow.keras.callbacks]) – Callbacks used during training.

Return type

None

infer_threshold(X, threshold_perc=95.0)[source]

Update threshold by a value inferred from the percentage of instances considered to be outliers in a sample of the dataset.

Parameters
  • X (numpy.ndarray) – Batch of instances.

  • threshold_perc (float) – Percentage of X considered to be normal based on the outlier score.

Return type

None

predict(X, return_instance_score=True)[source]

Compute outlier scores and transform into outlier predictions.

Parameters
  • X (numpy.ndarray) – Batch of instances.

  • return_instance_score (bool) – Whether to return instance level outlier scores.

Return type

Dict[str, Dict[str, Any]]

Returns

  • Dictionary containing ‘meta’ and ‘data’ dictionaries.

  • ’meta’ has the model’s metadata.

  • ’data’ contains the outlier predictions and instance level outlier scores.

score(X)[source]

Compute outlier scores.

Parameters

X (numpy.ndarray) – Batch of instances to analyze.

Return type

numpy.ndarray

Returns

Array with outlier scores for each instance in the batch.

class alibi_detect.od.OutlierProphet(threshold=0.8, growth='linear', cap=None, holidays=None, holidays_prior_scale=10.0, country_holidays=None, changepoint_prior_scale=0.05, changepoint_range=0.8, seasonality_mode='additive', daily_seasonality='auto', weekly_seasonality='auto', yearly_seasonality='auto', add_seasonality=None, seasonality_prior_scale=10.0, uncertainty_samples=1000, mcmc_samples=0)[source]

Bases: alibi_detect.base.BaseDetector, alibi_detect.base.FitMixin

__init__(threshold=0.8, growth='linear', cap=None, holidays=None, holidays_prior_scale=10.0, country_holidays=None, changepoint_prior_scale=0.05, changepoint_range=0.8, seasonality_mode='additive', daily_seasonality='auto', weekly_seasonality='auto', yearly_seasonality='auto', add_seasonality=None, seasonality_prior_scale=10.0, uncertainty_samples=1000, mcmc_samples=0)[source]

Outlier detector for time series data using fbprophet. See https://facebook.github.io/prophet/ for more details.

Parameters
  • threshold (float) – Width of the uncertainty intervals of the forecast, used as outlier threshold. Equivalent to interval_width. If the instance lies outside of the uncertainty intervals, it is flagged as an outlier. If mcmc_samples equals 0, it is the uncertainty in the trend using the MAP estimate of the extrapolated model. If mcmc_samples >0, then uncertainty over all parameters is used.

  • growth (str) – ‘linear’ or ‘logistic’ to specify a linear or logistic trend.

  • cap (Optional[float]) – Growth cap in case growth equals ‘logistic’.

  • holidays (Optional[pandas.DataFrame]) – pandas DataFrame with columns holiday (string) and ds (dates) and optionally columns lower_window and upper_window which specify a range of days around the date to be included as holidays.

  • holidays_prior_scale (float) – Parameter controlling the strength of the holiday components model. Higher values allow the model to fit larger holiday effects, making it more prone to overfitting.

  • country_holidays (Optional[str]) – Include country-specific holidays via country abbreviations. The holidays for each country are provided by the holidays package in Python. A list of available countries, and the country codes to use, is available at: https://github.com/dr-prodigy/python-holidays. Additionally, Prophet includes holidays for: Brazil (BR), Indonesia (ID), India (IN), Malaysia (MY), Vietnam (VN), Thailand (TH), Philippines (PH), Turkey (TU), Pakistan (PK), Bangladesh (BD), Egypt (EG), China (CN) and Russia (RU).

  • changepoint_prior_scale (float) – Parameter controlling the flexibility of the automatic changepoint selection. Large values will allow many changepoints, potentially leading to overfitting.

  • changepoint_range (float) – Proportion of history in which trend changepoints will be estimated. Higher values mean more changepoints, potentially leading to overfitting.

  • seasonality_mode (str) – Either ‘additive’ or ‘multiplicative’.

  • daily_seasonality (Union[str, bool, int]) – Can be ‘auto’, True, False, or a number of Fourier terms to generate.

  • weekly_seasonality (Union[str, bool, int]) – Can be ‘auto’, True, False, or a number of Fourier terms to generate.

  • yearly_seasonality (Union[str, bool, int]) – Can be ‘auto’, True, False, or a number of Fourier terms to generate.

  • add_seasonality (Optional[List]) – Manually add one or more seasonality components. Pass a list of dicts containing the keys name, period, fourier_order (obligatory), prior_scale and mode (optional).

  • seasonality_prior_scale (float) – Parameter controlling the strength of the seasonality model. Larger values allow the model to fit larger seasonal fluctuations, potentially leading to overfitting.

  • uncertainty_samples (int) – Number of simulated draws used to estimate uncertainty intervals.

  • mcmc_samples (int) – If >0, will do full Bayesian inference with the specified number of MCMC samples. If 0, will do MAP estimation.

Return type

None

fit(df)[source]

Fit Prophet model on normal (inlier) data.

Parameters

df (pandas.DataFrame) – DataFrame with a column ds of timestamps and a column y of target values.

Return type

None
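The expected input format can be sketched with pandas; the column names ds and y follow the parameter description above.

```python
import pandas as pd

# Build the DataFrame fit() expects: a 'ds' column of timestamps
# and a 'y' column of target values (inlier data only).
df = pd.DataFrame({
    'ds': pd.date_range('2024-01-01', periods=5, freq='D'),
    'y': [1.0, 1.2, 0.9, 1.1, 1.0],
})
print(list(df.columns))  # ['ds', 'y']
# od.fit(df)  # would train the Prophet model on this inlier data
```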

predict(df, return_instance_score=True, return_forecast=True)[source]

Compute outlier scores and transform into outlier predictions.

Parameters
  • df (pandas.DataFrame) – DataFrame with a column ds of timestamps and a column y of values to be flagged as outliers or not.

  • return_instance_score (bool) – Whether to return instance level outlier scores.

  • return_forecast (bool) – Whether to return the model forecast.

Return type

Dict[str, Dict[str, Any]]

Returns

  • Dictionary containing ‘meta’ and ‘data’ dictionaries.

  • ’meta’ has the model’s metadata.

  • ’data’ contains the outlier predictions, instance level outlier scores and the model forecast.

score(df)[source]

Compute outlier scores.

Parameters

df (pandas.DataFrame) – DataFrame with a column ds of timestamps and a column y of values to be flagged as outliers or not.

Return type

pandas.DataFrame

Returns

DataFrame with outlier scores for each instance in the batch.

class alibi_detect.od.SpectralResidual(threshold=None, window_amp=None, window_local=None, n_est_points=None, n_grad_points=5)[source]

Bases: alibi_detect.base.BaseDetector, alibi_detect.base.ThresholdMixin

__init__(threshold=None, window_amp=None, window_local=None, n_est_points=None, n_grad_points=5)[source]

Outlier detector for time-series data using the spectral residual algorithm. Based on “Time-Series Anomaly Detection Service at Microsoft” (Ren et al., 2019) https://arxiv.org/abs/1906.03821

Parameters
  • threshold (Optional[float]) – Threshold used to classify outliers. Relative saliency map distance from the moving average.

  • window_amp (Optional[int]) – Window for the average log amplitude.

  • window_local (Optional[int]) – Window for the local average of the saliency map.

  • n_est_points (Optional[int]) – Number of estimated points padded to the end of the sequence.

  • n_grad_points (int) – Number of points used for the gradient estimation of the additional points padded to the end of the sequence.

Return type

None

add_est_points(X, t)[source]

Pad the time series with additional points since the method works better if the anomaly point is towards the center of the sliding window.

Parameters
  • X (numpy.ndarray) – Time series of instances.

  • t (numpy.ndarray) – Time steps.

Return type

numpy.ndarray

Returns

Padded version of X.

compute_grads(X, t)[source]

Slope of the straight line between different points of the time series multiplied by the average time step size.

Parameters
  • X (numpy.ndarray) – Time series of instances.

  • t (numpy.ndarray) – Time steps.

Return type

numpy.ndarray

Returns

Array with slope values.

infer_threshold(X, t=None, threshold_perc=95.0)[source]

Update threshold by a value inferred from the percentage of instances considered to be outliers in a sample of the dataset.

Parameters
  • X (numpy.ndarray) – Batch of instances.

  • t (Optional[numpy.ndarray]) – Time steps.

  • threshold_perc (float) – Percentage of X considered to be normal based on the outlier score.

Return type

None

predict(X, t=None, return_instance_score=True)[source]

Compute outlier scores and transform into outlier predictions.

Parameters
  • X (numpy.ndarray) – Time series of instances.

  • t (Optional[numpy.ndarray]) – Time steps.

  • return_instance_score (bool) – Whether to return instance level outlier scores.

Return type

Dict[str, Dict[str, Any]]

Returns

  • Dictionary containing ‘meta’ and ‘data’ dictionaries.

  • ’meta’ has the model’s metadata.

  • ’data’ contains the outlier predictions and instance level outlier scores.

saliency_map(X)[source]

Compute saliency map.

Parameters

X (numpy.ndarray) – Time series of instances.

Return type

numpy.ndarray

Returns

Array with saliency map values.

score(X, t=None)[source]

Compute outlier scores.

Parameters
  • X (numpy.ndarray) – Time series of instances.

  • t (Optional[numpy.ndarray]) – Time steps.

Return type

numpy.ndarray

Returns

Array with outlier scores for each instance in the batch.