alibi_detect.datasets module

alibi_detect.datasets.fetch_kdd(target=['dos', 'r2l', 'u2r', 'probe'], keep_cols=['srv_count', 'serror_rate', 'srv_serror_rate', 'rerror_rate', 'srv_rerror_rate', 'same_srv_rate', 'diff_srv_rate', 'srv_diff_host_rate', 'dst_host_count', 'dst_host_srv_count', 'dst_host_same_srv_rate', 'dst_host_diff_srv_rate', 'dst_host_same_src_port_rate', 'dst_host_srv_diff_host_rate', 'dst_host_serror_rate', 'dst_host_srv_serror_rate', 'dst_host_rerror_rate', 'dst_host_srv_rerror_rate'], percent10=True, return_X_y=False)

KDD Cup ’99 dataset for detecting computer network intrusions.

Parameters
  • target (list) – List with attack types to detect.

  • keep_cols (list) – List with columns to keep. Defaults to continuous features.

  • percent10 (bool) – If True, return only the 10% subset of the data.

  • return_X_y (bool) – If True, return only the data and target values as a tuple; otherwise return a Bunch object.

Return type

Union[Bunch, Tuple[numpy.ndarray, numpy.ndarray]]

Returns

  • Bunch – Dataset and outlier labels (0 means ‘normal’ and 1 means ‘outlier’).

  • (data, target) – Tuple if ‘return_X_y’ equals True.
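
The parameters above can be combined as in the following minimal sketch, which requests only some attack types and asks for the (data, target) tuple. The helper names load_kdd and outlier_fraction, and the fetch argument, are hypothetical and not part of alibi_detect; fetch exists only so the helper can be exercised without downloading the dataset.

```python
def load_kdd(targets=("dos",), percent10=True, fetch=None):
    """Return (X, y) from fetch_kdd for the given attack types.

    `fetch` is a hypothetical injection point (not part of alibi_detect);
    when omitted, the real fetch_kdd is imported lazily.
    """
    if fetch is None:
        # Requires the alibi-detect package; the call below downloads data.
        from alibi_detect.datasets import fetch_kdd as fetch
    return fetch(target=list(targets), percent10=percent10, return_X_y=True)


def outlier_fraction(y):
    """Fraction of labels equal to 1 (outlier); 0 means normal."""
    labels = [int(v) for v in y]
    return sum(labels) / len(labels) if labels else 0.0
```

Calling `X, y = load_kdd(("dos", "probe"))` followed by `outlier_fraction(y)` would trigger the download and report the share of instances labelled as outliers.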

alibi_detect.datasets.fetch_nab(ts, return_X_y=False)

Get time series in a DataFrame from the Numenta Anomaly Benchmark: https://github.com/numenta/NAB.

Parameters
  • ts (str) – Name of the time series to retrieve. The possible names are returned by get_list_nab.

  • return_X_y (bool) – If True, return only the data and target values as a tuple; otherwise return a Bunch object.

Return type

Union[Bunch, Tuple[pandas.DataFrame, pandas.DataFrame]]

Returns

  • Bunch – Dataset and outlier labels (0 means ‘normal’ and 1 means ‘outlier’) in DataFrames with timestamps.

  • (data, target) – Tuple if ‘return_X_y’ equals True.
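
As a rough illustration of how fetch_nab and get_list_nab fit together, the sketch below checks a requested name against the available names before fetching. The wrapper load_nab_series and its fetch and list_names arguments are hypothetical and not part of alibi_detect; they are there so the logic can be tried without network access.

```python
def load_nab_series(name, fetch=None, list_names=None):
    """Fetch one NAB time series as a (data, target) tuple after validating its name.

    `fetch` and `list_names` are hypothetical injection points; when omitted,
    fetch_nab and get_list_nab are imported lazily from alibi_detect.
    """
    if fetch is None or list_names is None:
        # Requires the alibi-detect package; both calls go to the network.
        from alibi_detect.datasets import fetch_nab, get_list_nab
        fetch = fetch or fetch_nab
        list_names = list_names or get_list_nab
    available = list_names()
    if name not in available:
        raise ValueError(
            f"Unknown NAB series {name!r}; {len(available)} names available"
        )
    return fetch(name, return_X_y=True)
```

With the defaults, `data, target = load_nab_series(some_name)` would return two DataFrames for any `some_name` taken from get_list_nab().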

alibi_detect.datasets.get_list_nab()

Get list of possible time series to retrieve from the Numenta Anomaly Benchmark: https://github.com/numenta/NAB.

Return type

list

Returns

List with time series names.
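
The returned list can be long, so it may help to group the names before picking one. The sketch below assumes the names take the form 'category/series', mirroring the directory layout of the NAB repository; this format is an assumption and is not stated in this reference. The helper group_by_category is hypothetical.

```python
from collections import defaultdict


def group_by_category(names):
    """Group time series names by the part before the last '/'.

    Names without a '/' are collected under the empty-string key.
    """
    groups = defaultdict(list)
    for name in names:
        category, _, series = name.rpartition("/")
        groups[category].append(series)
    return dict(groups)
```

For example, `group_by_category(get_list_nab())` would give a dict that maps each category to its series names, provided the assumption about the name format holds.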