Guide to Benchmark Prediction Services

This guide walks through the steps needed to benchmark a Seldon prediction service, comparing REST and gRPC variants. The service benchmarked is the MNIST digit classifier demo. We use the Locust load-testing framework, which is easier to extend for gRPC tests than Iago.

The steps taken are outlined below.

Create Kubernetes cluster on AWS

We used AWS for this test, but other cloud providers can be used. Seldon requires persistent storage; for this test we used GlusterFS.

Launch Seldon

Set the following configuration values:

    DATA_VOLUME='"glusterfs": {"endpoints": "glusterfs-cluster", "path": "gv0", "readOnly": false}'
    SELDON_SERVICE_TYPE=LoadBalancer
    GLUSTERFS_IP1=192.168.0.149
    GLUSTERFS_IP2=192.168.0.248

Create the conf with make clean conf. Then label a node for the Locust workers and cordon it so that no Seldon components are scheduled on it, and launch Seldon:

    kubectl label nodes ip-172-20-0-71.eu-west-1.compute.internal role=locust
    kubectl cordon ip-172-20-0-71.eu-west-1.compute.internal
    SELDON_WITH_GLUSTERFS=true SELDON_WITH_SPARK=false seldon-up

Download pretrained MNIST model

We follow the steps outlined in the MNIST digit classifier demo to set up a client and download a model for that client.

    seldon-cli client --action setup --db-name ClientDB --client-name deep_mnist_client
    cd kubernetes/conf/examples/tensorflow_deep_mnist
    kubectl create -f load-model-tensorflow-deep-mnist.json

Wait until the model has finished downloading by checking the job with kubectl get jobs.

Start a REST MNIST prediction microservice

    start-microservice --type prediction --client deep_mnist_client -p tensorflow-deep-mnist /seldon-data/seldon-models/tensorflow_deep_mnist/1/ rest 1.0
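
Before load testing, it is worth checking that predictions flow end to end. Below is a minimal smoke-test sketch in Python; the host, consumer key, and payload shape are assumptions for illustration rather than the exact Seldon API, so adjust them to match your deployment.

    # Throwaway smoke test for the REST path before load testing.
    # The host, consumer key, and payload shape are assumptions.
    import json
    import requests

    HOST = "http://seldon-server:80"      # assumed external endpoint
    payload = {"data": [0.0] * 784}       # hypothetical MNIST input: 784 pixels

    resp = requests.get(
        HOST + "/js/predict",
        params={
            "consumer_key": "deep_mnist_client_key",  # hypothetical key
            "json": json.dumps(payload),
        },
        timeout=5,
    )
    print(resp.status_code, resp.text[:200])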

Run REST load test

Uncordon the node reserved for Locust so its pods can be scheduled there, then launch the test:

    kubectl uncordon ip-172-20-0-71.eu-west-1.compute.internal
    launch-locust-load-test --seldon-client deep_mnist_client --test-type js-predict
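
For reference, the js-predict test is broadly equivalent to a Locust file along the lines sketched below. It uses the pre-1.0 Locust API; the wait times mirror the test setup described in the discussion at the end, but the query parameters are assumptions.

    # Sketch of a Locust HTTP test resembling js-predict (pre-1.0 Locust API).
    # The query parameters are assumptions for illustration.
    import json
    import random
    from locust import HttpLocust, TaskSet, task

    class PredictTasks(TaskSet):
        @task
        def js_predict(self):
            payload = {"data": [random.random() for _ in range(784)]}
            self.client.get(
                "/js/predict",
                params={"json": json.dumps(payload)},
                name="GET /js/predict",  # groups requests under one stats entry
            )

    class PredictUser(HttpLocust):
        host = "http://seldon-server:80"
        task_set = PredictTasks
        min_wait = 900    # ms between calls, matching the test configuration
        max_wait = 1000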

REST Results

[Image: Locust REST results]

Clean up the load test resources

    cd kubernetes/conf/microservices
    kubectl delete -f microservice-tensorflow-deep-mnist.json
    cd ../dev
    kubectl delete -f locust-master.json
    kubectl delete -f locust-slave.json
    kubectl cordon ip-172-20-0-71.eu-west-1.compute.internal

Start the gRPC MNIST prediction microservice

    start-microservice --type prediction --client deep_mnist_client -p tensorflow-deep-mnist /seldon-data/seldon-models/tensorflow_deep_mnist/1/ rpc 1.0
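
As with the REST variant, a quick smoke test confirms the gRPC endpoint responds. The sketch below uses hypothetical generated modules and message names (prediction_pb2, SeldonStub, PredictionRequest); substitute whatever your compiled Seldon proto definitions actually provide.

    # Hypothetical gRPC smoke test; the generated modules, stub, and
    # message names are stand-ins for the real compiled Seldon protos.
    import grpc
    import prediction_pb2
    import prediction_pb2_grpc

    channel = grpc.insecure_channel("seldon-server:80")  # assumed endpoint
    stub = prediction_pb2_grpc.SeldonStub(channel)

    request = prediction_pb2.PredictionRequest(data=[0.0] * 784)
    print(stub.Predict(request, timeout=5))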

Run gRPC load test

    kubectl uncordon ip-172-20-0-71.eu-west-1.compute.internal
    launch-locust-load-test --seldon-client deep_mnist_client --test-type grpc-predict
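
Locust has no built-in gRPC client, so extending it means timing each call yourself and reporting the result through Locust's request event hooks. The sketch below shows the general shape of such a test against the pre-1.0 Locust API, reusing the hypothetical proto stubs from the smoke test above.

    # Sketch of extending Locust for gRPC (pre-1.0 Locust API): time each
    # call manually and feed the result into Locust's statistics via events.
    import time
    import grpc
    from locust import Locust, TaskSet, task, events
    import prediction_pb2
    import prediction_pb2_grpc

    class GrpcTasks(TaskSet):
        def on_start(self):
            channel = grpc.insecure_channel("seldon-server:80")  # assumed endpoint
            self.stub = prediction_pb2_grpc.SeldonStub(channel)

        @task
        def grpc_predict(self):
            request = prediction_pb2.PredictionRequest(data=[0.0] * 784)
            start = time.time()
            try:
                self.stub.Predict(request, timeout=5)
            except grpc.RpcError as e:
                events.request_failure.fire(
                    request_type="grpc", name="predict",
                    response_time=(time.time() - start) * 1000, exception=e)
            else:
                events.request_success.fire(
                    request_type="grpc", name="predict",
                    response_time=(time.time() - start) * 1000, response_length=0)

    class GrpcUser(Locust):
        task_set = GrpcTasks
        min_wait = 900
        max_wait = 1000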

gRPC Results

[Image: Locust gRPC results]

Discussion

On average, REST response times are over 50% higher than gRPC response times.

The response-time percentiles (in milliseconds) from the two tests are as follows.

    Name                           # requests   50%   66%   75%   80%   90%   95%   98%   99%   100%
    grpc http://seldon-server:80        43521    14    16    18    19    25    36    67   110   5045
    GET /js/predict                     42462    20    25    30    36    63   110   190   240    717

As the ratios show, the advantage of gRPC broadly increases through the higher percentiles, peaking at roughly 3x around the 95th percentile. The exception is the 100th percentile, where the 5045 ms gRPC maximum suggests a startup delay or an outlier that needs further investigation.

The Locust tests were run with 50 clients, each waiting between 900 and 1000 ms between calls. Unlike Iago, Locust does not allow a fixed request rate, so differences between the REST and gRPC implementations within the load-testing framework itself could affect the response times. However, we found it harder to develop a gRPC test in Iago.