Guide to Benchmark Prediction Services
This guide walks through the steps needed to benchmark a Seldon prediction service and compares its REST and gRPC variants. The service benchmarked is the MNIST digit classifier demo. We use the locust load testing framework, which we found easier than Iago to extend with gRPC tests.
The steps are:
- Launch a 3 node cluster on AWS
- Create the Seldon configuration and launch it onto the cluster
- For REST and gRPC variants:
  - Start the MNIST demo (in REST or gRPC variant)
  - Launch a locust load test
Create Kubernetes cluster on AWS
For our test we used AWS but other cloud providers can be used. You will need persistent storage to run Seldon. For this test we used GlusterFS.
- Create a Kubernetes cluster on AWS.
- We created one with 3 m3.large minions and an m3.large master.
- Create a GlusterFS cluster in an AWS VPC.
- Update the Seldon Kubernetes configuration in seldon-server/kubernetes/conf/Makefile, setting the GlusterFS IP addresses as appropriate for your GlusterFS configuration. Also change SELDON_SERVICE_TYPE to LoadBalancer. For example:
Create the conf with
make clean conf
- Label one node as a locust loadtest node. We will use this node to run the loadtester later. Your node name will differ.
- Cordon off the locust loadtest node so it is not used for Seldon.
- Start Seldon, with glusterfs and without Spark (not needed for this test)
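The node labelling and cordoning steps above can be sketched as a small Python helper around kubectl. This is only an illustration: the node name and the label are hypothetical placeholders (the guide notes that your node name will differ), and the helper defaults to a dry run that prints each command instead of executing it.

```python
import subprocess

# Hypothetical placeholders: substitute your own node name and label.
NODE_NAME = "my-loadtest-node"
LABEL = "role=locust"


def node_commands(node, label):
    """Return the kubectl invocations for labelling, cordoning and uncordoning a node."""
    return {
        "label": ["kubectl", "label", "nodes", node, label],
        "cordon": ["kubectl", "cordon", node],
        "uncordon": ["kubectl", "uncordon", node],
    }


def run(step, node=NODE_NAME, label=LABEL, dry_run=True):
    """Print the command for a step (dry run) or execute it against the cluster."""
    cmd = node_commands(node, label)[step]
    if dry_run:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    run("label")    # mark the node that will host the load tester
    run("cordon")   # keep Seldon pods off that node for now
```

Calling `run("uncordon")` later, once Seldon is up, makes the node schedulable again for the load test.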
Download pretrained MNIST model
We follow the steps outlined in the MNIST digit classifier demo to set up a client and download a model for that client.
- Create a Seldon client for the predictions
- Download a pretrained deep neural network model for digit recognition
Wait until the model has finished downloading by checking the job with
kubectl get jobs
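If you would rather not re-run the command by hand, the check can be automated. The sketch below is an assumption about how you might do this, not part of the Seldon tooling: it polls `kubectl get jobs -o json` and treats a job as finished once its `status.succeeded` count reaches `spec.completions`. The function names and the polling intervals are illustrative.

```python
import json
import subprocess
import time


def all_jobs_succeeded(jobs_json):
    """Return True when every job in `kubectl get jobs -o json` output has finished."""
    items = json.loads(jobs_json).get("items", [])
    if not items:
        return False  # nothing submitted yet
    for job in items:
        wanted = job.get("spec", {}).get("completions") or 1
        done = job.get("status", {}).get("succeeded") or 0
        if done < wanted:
            return False
    return True


def wait_for_jobs(poll_seconds=10, timeout_seconds=600):
    """Poll the cluster until all jobs succeed or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        result = subprocess.run(
            ["kubectl", "get", "jobs", "-o", "json"],
            check=True, capture_output=True, text=True,
        )
        if all_jobs_succeeded(result.stdout):
            return True
        time.sleep(poll_seconds)
    return False
```

This checks every job in the current namespace, so it suits a cluster that is running only this benchmark.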
Start a REST MNIST prediction microservice
Run REST load test
- Uncordon the locust node so we can start the locust loadtest services in it.
- Launch a REST based load test sending random data to the server
Clean up the load test resources
- Stop the REST microservice. A JSON Kubernetes config will have been created for the deployment, and this can be used to clean up the resources:
- Stop the locust load test pods. JSON files will have been created which can be used to clean up the resources:
- Re-cordon the load test node
Start the gRPC MNIST prediction microservice
Run gRPC load test
- Uncordon the locust node so we can start the locust loadtest service in it.
- Launch a gRPC based load test sending random data to the server
On average, the response time for REST is over 50% slower than for gRPC.
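To make the "percent slower" comparison concrete, here is a minimal sketch of the calculation, taking the gRPC time as the baseline. The numbers in it are illustrative placeholders, not the measured results of this test.

```python
def percent_slower(slow_ms, fast_ms):
    """How much longer slow_ms is than fast_ms, as a percentage of fast_ms."""
    return (slow_ms - fast_ms) / fast_ms * 100.0


def percent_faster(slow_ms, fast_ms):
    """How much shorter fast_ms is than slow_ms, as a percentage of slow_ms."""
    return (slow_ms - fast_ms) / slow_ms * 100.0


# Illustrative placeholder timings only (not measurements from this guide).
rest_ms, grpc_ms = 15.0, 10.0
print(percent_slower(rest_ms, grpc_ms))   # REST relative to gRPC
print(percent_faster(rest_ms, grpc_ms))   # gRPC relative to REST
```

The two figures differ because they use different baselines: with these placeholder timings, REST being 50% slower corresponds to gRPC being roughly a third faster.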
The percentiles from the two tests are as follows.
The advantage of gRPC grows at higher percentiles, except at the 100th percentile, which suggests a delay at the start of the test or an outlier that needs further investigation.
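The 100th percentile is simply the slowest single request, which is why one outlier can dominate it. The sketch below shows this with a nearest-rank percentile function over synthetic latencies. The data is made up for illustration, and locust may compute its reported percentiles differently.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Synthetic latencies in ms: 99 fast requests and one slow outlier.
latencies = [10] * 99 + [500]

for p in (50, 95, 99, 100):
    print(p, percentile(latencies, p))
```

Only the 100th percentile picks up the single slow request here. Every lower percentile ignores it.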
The locust tests were run with 50 clients, each waiting between 900 and 1000 ms between calls. Unlike Iago, locust does not let you set a fixed request rate, so differences in overhead within the load testing framework itself between the REST and gRPC implementations could affect the response times. However, we found it harder to develop a gRPC test in Iago.
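A rough model shows why the request rate is not fixed in this setup. Assuming each client loops through "send request, wait for response, sleep", and that the sleep is drawn uniformly between the minimum and maximum wait, the steady-state rate depends on the response time itself. The function below is a back-of-the-envelope sketch under those assumptions, and the response times passed to it are illustrative rather than measured.

```python
def closed_loop_rate(clients, min_wait_s, max_wait_s, mean_response_s):
    """Approximate requests per second for clients that wait, call, then wait again."""
    mean_wait_s = (min_wait_s + max_wait_s) / 2  # mean of a uniform wait
    return clients / (mean_wait_s + mean_response_s)


# 50 clients waiting 900-1000 ms, as in the test above.
# The response times are illustrative placeholders.
for response_s in (0.0, 0.05, 0.25):
    rate = closed_loop_rate(50, 0.9, 1.0, response_s)
    print(f"response {response_s:.2f}s -> about {rate:.1f} requests/s")
```

Under this model the slower variant also receives fewer requests per second, so the two tests do not apply exactly the same load.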