Autoscaling Seldon Deployments


  • The cluster should have heapster and metrics-server running in the kube-system namespace

  • For Kind, install ../../testing/scripts/metrics.yaml

  • For Minikube run:

    minikube addons enable metrics-server
    minikube addons enable heapster

Setup Seldon Core

Use the setup notebook to set up the cluster with Ambassador Ingress and install Seldon Core. The instructions are also available online.

[ ]:
!kubectl create namespace seldon
[ ]:
!kubectl config set-context $(kubectl config current-context) --namespace=seldon

Create model with autoscaler

To create a model with a HorizontalPodAutoscaler there are two steps:

  1. Ensure you have a resource request for the metric you want to scale on if it is a standard metric such as cpu or memory, e.g.:

    cpu: '0.5'
  2. Add an HPA Spec referring to this Deployment, e.g.:

- hpaSpec:
    maxReplicas: 3
    metrics:
    - resource:
        name: cpu
        targetAverageUtilization: 10
      type: Resource
    minReplicas: 1
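
To make the numbers in this spec concrete, here is a minimal Python sketch of how a CPU request and a target utilization combine into a replica count. It follows the scaling rule documented for the Kubernetes HorizontalPodAutoscaler (ceil of current replicas times current utilization over target utilization, clamped to minReplicas and maxReplicas) and ignores the controller's tolerance band and stabilization windows. The function names `parse_cpu` and `desired_replicas` and the usage figures are hypothetical, chosen only for illustration.

```python
import math


def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as '0.5', '500m' or '2' to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def desired_replicas(current_replicas, usage_cores_per_pod, request,
                     target_utilization, min_replicas, max_replicas):
    """Simplified HPA rule: scale in proportion to utilization, then clamp.

    Utilization is the average per-pod CPU usage expressed as a percentage
    of the container's CPU request, which is why the request is required.
    """
    utilization = 100 * usage_cores_per_pod / parse_cpu(request)
    desired = math.ceil(current_replicas * utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))


# Assumed example: one pod using 0.2 cores against a request of '0.5'
# is at 40% utilization, four times the 10% target, so the rule asks
# for 4 replicas and maxReplicas caps the result.
print(desired_replicas(1, 0.2, "0.5", 10, min_replicas=1, max_replicas=3))  # 3
```

With a target as low as 10%, even light load pushes the deployment to maxReplicas, which is what makes the scale-up easy to observe in this example.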

The full SeldonDeployment spec is shown below.

[ ]:
!pygmentize model_with_hpa.yaml
[ ]:
!kubectl create -f model_with_hpa.yaml
[ ]:
!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=seldon-model -o jsonpath='{.items[0].metadata.name}')

Create Load

We label some nodes for the loadtester. We attempt the first two as for Kind the first node shown will be the master.

[ ]:
!kubectl label nodes $(kubectl get nodes -o jsonpath='{.items[0].metadata.name}') role=locust
!kubectl label nodes $(kubectl get nodes -o jsonpath='{.items[1].metadata.name}') role=locust
[ ]:
!helm install loadtester ../../../helm-charts/seldon-core-loadtesting  \
    --set \
    --set oauth.enabled=false \
    --set locust.hatchRate=1 \
    --set locust.clients=1 \
    --set loadtest.sendFeedback=0 \
    --set locust.minWait=0 \
    --set locust.maxWait=0 \
    --set replicaCount=1

After a few minutes you should see the deployment seldon-model-example-0-classifier scaled up to 3 replicas.

[ ]:
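import json
import time


def getNumberPods():
    # The IPython shell magic returns a list of output lines, so join and parse them.
    dp = !kubectl get deployment seldon-model-example-0-classifier -o json
    dp = json.loads("".join(dp))
    return dp["status"]["replicas"]


# Poll for up to 5 minutes (60 checks, 5 seconds apart) until the HPA scales up.
scaled = False
for i in range(60):
    pods = getNumberPods()
    print(pods)
    if pods > 1:
        scaled = True
        break
    time.sleep(5)
assert scaled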
[ ]:
!kubectl get pods,deployments,hpa

Remove Load

After 5-10 minutes you should see the deployment's replicas decrease to 1.

[ ]:
!helm delete loadtester -n seldon
[ ]:
!kubectl get pods,deployments,hpa
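
Rather than re-running the status command by hand during the scale-down window, the check can be polled. The following is a minimal sketch, not part of the original notebook: `replicas_from_json`, `wait_for_replicas` and `current_replicas` are hypothetical helper names, the 10-minute timeout simply mirrors the upper end of the estimate above, and treating a missing `status.replicas` field as 0 is an assumption.

```python
import json
import subprocess
import time


def replicas_from_json(deployment_json: str) -> int:
    """Read status.replicas from `kubectl get deployment ... -o json` output."""
    status = json.loads(deployment_json).get("status", {})
    # Assumption: a missing field means no replicas are currently reported.
    return status.get("replicas", 0)


def wait_for_replicas(get_replicas, target, timeout_s=600, interval_s=10,
                      sleep=time.sleep):
    """Poll get_replicas() until it returns target; give up after timeout_s."""
    waited = 0
    while True:
        if get_replicas() == target:
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s


def current_replicas():
    """Query the deployment used in this notebook through kubectl."""
    out = subprocess.run(
        ["kubectl", "get", "deployment",
         "seldon-model-example-0-classifier", "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return replicas_from_json(out)
```

In a notebook cell, `assert wait_for_replicas(current_replicas, target=1)` would then block until the deployment is back at one replica or the timeout expires. Passing the getter and `sleep` as arguments keeps the polling logic testable without a cluster.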
[ ]:
!kubectl delete -f model_with_hpa.yaml