Guide to Benchmark Prediction Services

This guide will go through detailed steps to show how a Seldon prediction service can be benchmarked, comparing REST and gRPC variants. The service benchmarked will be the MNIST digit classifier demo. We will use the locust load testing framework, which we found easier to extend for gRPC tests than Iago.

The steps taken are described in the sections below.

Create Kubernetes cluster on AWS

For our test we used AWS, but other cloud providers can be used. Seldon requires persistent storage; for this test we used GlusterFS.

Launch Seldon

   DATA_VOLUME='"glusterfs": {"endpoints": "glusterfs-cluster","path": "gv0","readOnly": false}'
   SELDON_SERVICE_TYPE=LoadBalancer
   GLUSTERFS_IP1=192.168.0.149
   GLUSTERFS_IP2=192.168.0.248

Create the conf with make clean conf. We then label a node for the locust workers and cordon it so that no Seldon components are scheduled on it; it will be uncordoned when the load test is launched.

   kubectl label nodes ip-172-20-0-71.eu-west-1.compute.internal role=locust
   kubectl cordon ip-172-20-0-71.eu-west-1.compute.internal
   SELDON_WITH_GLUSTERFS=true SELDON_WITH_SPARK=false seldon-up

Download pretrained MNIST model

We follow the steps outlined in the MNIST digit classifier demo to set up a client and download a model for that client.

   seldon-cli client --action setup --db-name ClientDB --client-name deep_mnist_client
   cd kubernetes/conf/examples/tensorflow_deep_mnist
   kubectl create -f load-model-tensorflow-deep-mnist.json

Wait until the model has finished downloading by checking the job with kubectl get jobs.
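
If you prefer to poll for completion programmatically rather than re-running the command by hand, a small helper along the lines of the sketch below works. The job name is an assumption taken from the manifest filename above.

   import json
   import subprocess
   import time

   def wait_for_job(job_name, poll_seconds=10):
       # Ask kubectl for the job status as JSON and check the
       # "succeeded" count until the job reports completion.
       while True:
           out = subprocess.check_output(
               ["kubectl", "get", "job", job_name, "-o", "json"])
           status = json.loads(out).get("status", {})
           if status.get("succeeded", 0) >= 1:
               print("job %s completed" % job_name)
               return
           time.sleep(poll_seconds)

   # Assumption: the job is named after its manifest file.
   wait_for_job("load-model-tensorflow-deep-mnist")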

Start a REST MNIST prediction microservice

   start-microservice --type prediction --client deep_mnist_client -p tensorflow-deep-mnist /seldon-data/seldon-models/tensorflow_deep_mnist/1/ rest 1.0

Run REST load test

   kubectl uncordon ip-172-20-0-71.eu-west-1.compute.internal
   launch-locust-load-test --seldon-client deep_mnist_client --test-type js-predict
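
For reference, a minimal locust REST test might look like the sketch below. It uses the current locust API (HttpUser), which may differ from the locust version used for the original tests, and it assumes the /js/predict endpoint accepts the prediction request as a json query parameter; the feature names are placeholders.

   import json
   import random
   from locust import HttpUser, task, between

   class MnistRestUser(HttpUser):
       # Matches the setup described in the discussion below:
       # each client waits 900-1000ms between calls.
       wait_time = between(0.9, 1.0)

       @task
       def js_predict(self):
           # A 784-value vector standing in for a flattened 28x28 MNIST image.
           features = {"f%d" % i: random.random() for i in range(784)}
           # Assumption: the endpoint takes the request as a "json" query
           # parameter; check the Seldon API docs for the exact contract.
           self.client.get("/js/predict", params={"json": json.dumps(features)})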

REST Results

[Figure: locust REST results]

Clean up the load test resources

   cd kubernetes/conf/microservices
   kubectl delete -f microservice-tensorflow-deep-mnist.json 
   cd kubernetes/conf/dev
   kubectl delete -f locust-master.json
   kubectl delete -f locust-slave.json
   kubectl cordon ip-172-20-0-71.eu-west-1.compute.internal

Start the gRPC MNIST prediction microservice

   start-microservice --type prediction --client deep_mnist_client -p tensorflow-deep-mnist /seldon-data/seldon-models/tensorflow_deep_mnist/1/ rpc 1.0

Run gRPC load test

   kubectl uncordon ip-172-20-0-71.eu-west-1.compute.internal
   launch-locust-load-test --seldon-client deep_mnist_client --test-type grpc-predict
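
locust has no built-in gRPC client, which is why the gRPC test has to be written by hand: time the call yourself and report the result through locust's events API. A sketch of the pattern, assuming locust 2.x, is below; the stub and request names are placeholders for whatever the Seldon protobuf definitions generate.

   import time
   import grpc
   from locust import User, task, between

   class MnistGrpcUser(User):
       wait_time = between(0.9, 1.0)
       host = "seldon-server:80"  # assumption: matches the results table

       def on_start(self):
           self.channel = grpc.insecure_channel(self.host)
           # Placeholder: create the generated stub here, e.g.
           # self.stub = prediction_pb2_grpc.PredictionServiceStub(self.channel)

       @task
       def grpc_predict(self):
           start = time.perf_counter()
           exception = None
           try:
               # Placeholder for the generated call, e.g. self.stub.Predict(request)
               pass
           except grpc.RpcError as e:
               exception = e
           # Report the timing to locust so it shows up in the statistics.
           self.environment.events.request.fire(
               request_type="grpc", name="predict",
               response_time=(time.perf_counter() - start) * 1000,
               response_length=0, exception=exception, context={})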

gRPC Results

[Figure: locust gRPC results]

Discussion

On average, the response time for REST is over 50% slower than for gRPC.

The response time percentiles (in milliseconds) from the two tests are as follows.

"Name",			"# requests",	"50%",	"66%",	"75%",	"80%",	"90%",	"95%",	"98%",	"99%",	"100%"
"grpc http://seldon-server:80" ,43521	14	16	18	19	25	36	67	110	5045
"GET /js/predict"	       ,42462	20	25	30	36	63	110	190	240	717

The advantage of gRPC grows as the percentiles increase, except at the 100th percentile, which suggests there was some delay at the start or an outlier that needs further investigation.
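
For concreteness, the absolute gap at each percentile can be read straight off the table above; the values below are copied from it.

   # Response times in ms, taken from the results table.
   pcts = ["50%", "66%", "75%", "80%", "90%", "95%", "98%", "99%", "100%"]
   grpc = [14, 16, 18, 19, 25, 36, 67, 110, 5045]
   rest = [20, 25, 30, 36, 63, 110, 190, 240, 717]
   for p, g, r in zip(pcts, grpc, rest):
       # The REST-minus-gRPC gap widens with the percentile
       # until the 100% outlier.
       print("%4s  gRPC %5dms  REST %5dms  gap %6dms" % (p, g, r, r - g))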

The locust tests were run with 50 clients, each waiting between 900 and 1000 ms between calls. Unlike Iago, locust does not allow you to create a fixed request rate, so delays within the load testing framework itself could affect the REST and gRPC response times differently. However, we found it harder to develop a gRPC test in Iago.