iNNE Anomaly Detection Demo
This example takes you through creating a microservice that detects anomalies using the iNNE technique. You will learn how to deploy the microservice from the prepackaged docker image available in seldon-server.
If you are interested in the theory behind the iNNE technique, find out more about iNNE anomaly detection.
- You have installed Seldon on a Kubernetes cluster.
Train the model
The iNNE anomaly detector is trained on a dataset of generated samples. We have prepackaged a docker image that generates the training dataset, fits the model and saves the pipeline. To train the detector, we generate data by sampling 1000 random 4-dimensional vectors from each of 3 Gaussian distributions centered at 2.0, 4.0 and 6.0, each with a standard deviation of 0.1. In addition, we include 5 anomalous points by sampling 5 random 4-dimensional vectors from a Gaussian centered at 0.0 with a standard deviation of 0.1. This procedure results in a training dataset of 3005 samples.
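The data generation described above can be sketched with NumPy as follows. This is only an illustration of the sampling procedure, not the code shipped in the docker image; the function name `make_training_data` and the `seed` parameter are hypothetical, and we assume each coordinate of a vector is drawn independently around the stated center.

```python
import numpy as np


def make_training_data(seed=0):
    """Sketch of the training set: 3 x 1000 normal points plus 5 anomalies."""
    rng = np.random.default_rng(seed)

    # 1000 four-dimensional vectors around each of the three centers.
    centers = [2.0, 4.0, 6.0]
    normal = [rng.normal(loc=c, scale=0.1, size=(1000, 4)) for c in centers]

    # 5 anomalous four-dimensional vectors around 0.0.
    anomalies = rng.normal(loc=0.0, scale=0.1, size=(5, 4))

    # Stack everything into a single (3005, 4) array.
    return np.vstack(normal + [anomalies])


X = make_training_data()
print(X.shape)
```

With three clusters of 1000 points and 5 anomalies, the resulting array has 3005 rows and 4 columns.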
At runtime, Seldon requires you to expose your model scoring engine as a microservice API. In this example, the same image that creates and trains the model also exposes it for runtime scoring when run. We can start the microservice using the command line script start-microservice.
The script creates a Kubernetes deployment for the microservice in kubernetes/conf/microservices. If the microservice is already running, Kubernetes will roll down the previous version and roll up the new version.
To start the iNNE detector microservice on the client “test” (created by seldon-up on startup):
Serve anomaly detection
To obtain the anomaly score for any 4-dimensional vector
The response should be
Since the anomaly detector uses the Seldon prediction format for the response, the above data should be interpreted as follows:
- if “PredictedClass” : “Anomaly_score”, the keys “prediction” and “confidence” store the anomaly score for the vector, where 0 is not anomalous and 1 is maximally anomalous
- if “PredictedClass” : “Complementary_score”, the keys “prediction” and “confidence” store the non-anomaly score, which is 1 - anomaly_score
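As a minimal sketch of the two rules above, the helper below recovers the anomaly score from a single prediction entry. The function name `anomaly_score` is hypothetical, the entry is assumed to be a plain dictionary using the key names quoted above, and the numeric values in the usage lines are made up for illustration rather than taken from a real response.

```python
def anomaly_score(entry):
    """Return the anomaly score in [0, 1] from one prediction entry.

    Assumes `entry` is a dict with the keys "PredictedClass" and
    "prediction" as described in the text (an assumption about the shape).
    """
    predicted_class = entry["PredictedClass"]
    value = float(entry["prediction"])

    if predicted_class == "Anomaly_score":
        # The value is already the anomaly score.
        return value
    if predicted_class == "Complementary_score":
        # The value is the non-anomaly score, i.e. 1 - anomaly_score.
        return 1.0 - value
    raise ValueError("unexpected PredictedClass: %r" % predicted_class)


# Illustrative entries only; the numbers are not real service output.
direct = {"PredictedClass": "Anomaly_score", "prediction": 0.25, "confidence": 0.25}
complement = {"PredictedClass": "Complementary_score", "prediction": 0.75, "confidence": 0.75}

print(anomaly_score(direct), anomaly_score(complement))
```

Both illustrative entries describe the same vector, so the helper returns the same anomaly score for each.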
Since many samples (about one third) in the dataset are drawn from a Gaussian centered at 2.0, the anomaly score for the sample is low. A vector such as
should get a high anomaly score