Seldon Recommendation Benchmarking Guide

This guide walks through the detailed steps to benchmark Seldon recommendation calls. In this case we will:

Create Kubernetes cluster on AWS

For our test we used AWS, but other cloud providers can be used. You will need persistent storage to run Seldon; for this test we used GlusterFS.
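The inline glusterfs volume used in the conf below refers to a Kubernetes Endpoints object named glusterfs-cluster. A minimal sketch of that object is shown here; the IP addresses are placeholders for your own GlusterFS nodes, and the port value is arbitrary (Kubernetes requires one but GlusterFS does not use it):

```json
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": { "name": "glusterfs-cluster" },
  "subsets": [
    { "addresses": [{ "ip": "10.0.0.10" }], "ports": [{ "port": 1 }] },
    { "addresses": [{ "ip": "10.0.0.11" }], "ports": [{ "port": 1 }] }
  ]
}
```

This can be created with kubectl create -f before launching Seldon so the glusterfs volume mounts resolve.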

Launch Seldon

Set the GlusterFS data volume and the number of replicas in the configuration, for example:

   DATA_VOLUME='"glusterfs": {"endpoints": "glusterfs-cluster","path": "gv0","readOnly": false}'
   "replicas": 2,

Create the conf with make clean conf

Label one node for Iago and cordon it so that Seldon pods are not scheduled there, then launch Seldon:

   kubectl label nodes <node name> role=iago
   kubectl cordon <node name>
   SELDON_WITH_GLUSTERFS=true seldon-up

Create item-similarity model for Movielens 10 Million

Run the Movielens 10 Million training and check the API is providing recommendations.
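One way to spot-check the API is to request recommendations for a known user. The sketch below builds such a request URL; the /js/recommendations path and the consumer_key, user, and limit parameters are assumptions about the Seldon JS API of this version, so adjust them to match your deployment:

```python
from urllib.parse import urlencode

def recommendations_url(host, consumer_key, user, limit=10):
    """Build a recommendation request URL (endpoint path is an assumption)."""
    query = urlencode({"consumer_key": consumer_key, "user": user, "limit": limit})
    return "http://%s/js/recommendations?%s" % (host, query)

# Hypothetical host and key; fetch the URL with curl or a browser to
# confirm that recommendations are returned for the ml10m model.
url = recommendations_url("seldon-server.example.com", "mykey", 1, limit=5)
```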

Run Iago Load Test

   cd seldon-server/docker/iago
   ./<create replay script> ml10m example.ml10m.replay.txt 10000 50000
   ssh <kubernetes master>
   apt-get install glusterfs-client
   mkdir -p /mnt/glusterfs
   mount.glusterfs <glusterfs server>:/gv0 /mnt/glusterfs

Then sftp the replay file to /mnt/glusterfs/loadtest on the master node so that Iago can find it.
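Iago replays one request per line from the replay file. The following is a sketch of how such a file could be generated for random MovieLens users; the request path and parameter names are assumptions for illustration, not the exact format produced by the replay-creation script above:

```python
import random

def write_replay_file(path, n_users, n_requests, seed=42):
    """Write one recommendation request per line, sampling
    user ids uniformly from 1..n_users."""
    rng = random.Random(seed)
    with open(path, "w") as f:
        for _ in range(n_requests):
            user = rng.randint(1, n_users)
            # One request path per line; the exact path Iago replays
            # against Seldon is an assumption here.
            f.write("/js/recommendations?consumer_key=mykey&user=%d&limit=10\n" % user)

# Matches the arguments used above: 10000 users, 50000 requests.
write_replay_file("example.ml10m.replay.txt", 10000, 50000)
```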

   kubectl uncordon <node name>
   cd seldon-server/kubernetes/conf/dev
   kubectl create -f iago.json
   kubectl exec -ti `kubectl get pods | grep iago | cut -d' ' -f1` -- /bin/bash
   ./<run loadtest script> /seldon-data/loadtest/example.ml10m.replay.txt 100


You can view Iago stats on its UI. A load balancer will have been created for it; form the URL from your load balancer's external DNS name.


After some peaks at the start, the 95th percentile settles down to around 100ms.

loadtest 95% percentile

Looking at the 99th percentile shows some outliers, which suggests that further optimization and a larger infrastructure would be necessary to reduce these spikes.

loadtest 99% percentile
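To illustrate why the 99th percentile is so much more sensitive to a few slow responses than the 95th, consider a quick sketch with hypothetical latencies (the numbers are illustrative, not measurements from this test):

```python
def percentile(samples, p):
    """Nearest-rank percentile: value below which p% of samples fall."""
    ordered = sorted(samples)
    # 1-based nearest rank for percentile p, clamped to the sample size.
    rank = max(1, int(round(p / 100.0 * len(ordered))))
    return ordered[rank - 1]

# 1000 hypothetical latencies: 98% around 100ms, 2% outlier spikes at 500ms.
latencies = [100] * 980 + [500] * 20

p95 = percentile(latencies, 95)   # unaffected by the 2% of spikes
p99 = percentile(latencies, 99)   # dominated by the outliers
```

Here p95 stays at 100ms while p99 jumps to 500ms, which matches the pattern in the graphs above: a stable 95th percentile can coexist with a spiky 99th.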

You can also view analytics on the Seldon Grafana dashboard, which should be exposed as a LoadBalancer on AWS. You will need to find its hostname, at which point you can open the dashboard URL in your browser.


An example display of the dashboard during the load test is shown below:

Grafana ml10m dashboard

Further Optimization

For this benchmark, item-similarity mainly uses the MySQL DB to fetch the similarities created by the Spark modelling job, so further optimization should focus on MySQL read performance and the front-end servers' CPU and memory. In general, to further decrease latency and handle higher loads, the optimizations shown below could be investigated: