Autoscaling Seldon Deployments

Prerequisites

  • The cluster should have heapster and metrics-server running in the kube-system namespace.

  • For Minikube run:

    minikube addons enable metrics-server
    minikube addons enable heapster
    

Setup Seldon Core

Use the setup notebook to set up a cluster with Ambassador Ingress and install Seldon Core. The instructions are also available online.

Create model with autoscaler

To create a model with a HorizontalPodAutoscaler, there are two steps:

  1. Ensure you have a resource request for the metric you want to scale on if it is a standard metric such as cpu or memory, e.g.:
     "resources": {
        "requests": {
           "cpu": "0.5"
        }
     }
  2. Add an HPA spec referring to this Deployment, e.g.:
     "hpaSpec": {
        "minReplicas": 1,
        "maxReplicas": 3,
        "metrics": [
           {
              "type": "Resource",
              "resource": {
                 "name": "cpu",
                 "targetAverageUtilization": 10
              }
           }
        ]
     }
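
The two steps above can be checked mechanically before the spec is submitted. The following is a minimal sketch and not part of Seldon Core: the helper names cpu_to_millicores and missing_requests are hypothetical, and the code assumes the componentSpec layout used in this example. It lists containers that lack a request for a resource the HPA scales on, and it shows that a 10% target on a 0.5 CPU request corresponds to roughly 50 millicores of average usage per pod.

```python
def cpu_to_millicores(quantity):
    """Convert a Kubernetes CPU quantity such as "0.5" or "500m" to millicores."""
    quantity = str(quantity).strip()
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def missing_requests(component_spec):
    """Return (container name, resource name) pairs that the HPA scales on
    but for which the container declares no resource request."""
    hpa = component_spec.get("hpaSpec", {})
    scaled = [
        metric["resource"]["name"]
        for metric in hpa.get("metrics", [])
        if metric.get("type") == "Resource"
    ]
    missing = []
    for container in component_spec["spec"]["containers"]:
        requests = container.get("resources", {}).get("requests", {})
        for name in scaled:
            if name not in requests:
                missing.append((container["name"], name))
    return missing


# One componentSpec entry, mirroring the fragments shown in the steps above.
component_spec = {
    "spec": {
        "containers": [
            {
                "name": "classifier",
                "image": "seldonio/mock_classifier:1.0",
                "resources": {"requests": {"cpu": "0.5"}},
            }
        ]
    },
    "hpaSpec": {
        "minReplicas": 1,
        "maxReplicas": 3,
        "metrics": [
            {
                "type": "Resource",
                "resource": {"name": "cpu", "targetAverageUtilization": 10},
            }
        ],
    },
}

# An empty list means every scaled resource has a matching request.
print(missing_requests(component_spec))

# Average per-pod CPU usage (in millicores) at which the 10% target is reached.
request_m = cpu_to_millicores("0.5")
target_pct = component_spec["hpaSpec"]["metrics"][0]["resource"]["targetAverageUtilization"]
print(request_m * target_pct / 100)
```

Utilization targets are expressed relative to the container's request, which is why step 1 matters: without a CPU request there is nothing to compute a percentage against.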

The full SeldonDeployment spec is shown below.

[10]:
!pygmentize model_with_hpa.json
{
    "apiVersion": "machinelearning.seldon.io/v1alpha2",
    "kind": "SeldonDeployment",
    "metadata": {
        "name": "seldon-model"
    },
    "spec": {
        "name": "test-deployment",
        "oauth_key": "oauth-key",
        "oauth_secret": "oauth-secret",
        "predictors": [
            {
                "componentSpecs": [{
                    "spec": {
                        "containers": [
                            {
                                "image": "seldonio/mock_classifier:1.0",
                                "imagePullPolicy": "IfNotPresent",
                                "name": "classifier",
                                "resources": {
                                    "requests": {
                                        "cpu": "0.5"
                                    }
                                }
                            }
                        ],
                        "terminationGracePeriodSeconds": 1
                    },
                    "hpaSpec":
                    {
                        "minReplicas": 1,
                        "maxReplicas": 3,
                        "metrics":
                            [ {
                                "type": "Resource",
                                "resource": {
                                    "name": "cpu",
                                    "targetAverageUtilization": 10
                                }
                            }]
                    }
                }],
                "graph": {
                    "children": [],
                    "name": "classifier",
                    "endpoint": {
                        "type" : "REST"
                    },
                    "type": "MODEL"
                },
                "name": "example",
                "replicas": 1
            }
        ]
    }
}
[11]:
!kubectl create -f model_with_hpa.json
seldondeployment.machinelearning.seldon.io/seldon-model created

Create Load

[12]:
!kubectl label nodes $(kubectl get nodes -o jsonpath='{.items[0].metadata.name}') role=locust
node/gke-standard-cluster-1-default-pool-b1c35e14-rrbd labeled
[15]:
!helm install ../../../helm-charts/seldon-core-loadtesting --name loadtest  \
    --set locust.host=http://seldon-model-test-deployment-example:8000 \
    --set oauth.enabled=false \
    --set oauth.key=oauth-key \
    --set oauth.secret=oauth-secret \
    --set locust.hatchRate=1 \
    --set locust.clients=1 \
    --set loadtest.sendFeedback=0 \
    --set locust.minWait=0 \
    --set locust.maxWait=0 \
    --set replicaCount=1
NAME:   loadtest
LAST DEPLOYED: Thu Aug 29 13:17:11 2019
NAMESPACE: seldon
STATUS: DEPLOYED

RESOURCES:
==> v1/Pod(related)
NAME                   READY  STATUS             RESTARTS  AGE
locust-master-1-znncw  0/1    ContainerCreating  0         0s
locust-slave-1-hnx8n   0/1    ContainerCreating  0         0s

==> v1/ReplicationController
NAME             DESIRED  CURRENT  READY  AGE
locust-master-1  1        1        0      0s
locust-slave-1   1        1        0      0s

==> v1/Service
NAME             TYPE      CLUSTER-IP   EXTERNAL-IP  PORT(S)                                       AGE
locust-master-1  NodePort  10.0.31.100  <none>       5557:32552/TCP,5558:32023/TCP,8089:32677/TCP  0s


After a few minutes you should see the deployment test-deployment-example-7cd068f scaled to 3 replicas.

[16]:
!kubectl get pods,deployments,hpa
NAME                                                   READY   STATUS    RESTARTS   AGE
pod/ambassador-684d6f8cd9-cfxwc                        1/1     Running   0          10m
pod/ambassador-684d6f8cd9-lxwcd                        1/1     Running   0          10m
pod/ambassador-684d6f8cd9-ncv8b                        1/1     Running   0          10m
pod/locust-master-1-znncw                              1/1     Running   0          3m13s
pod/locust-slave-1-hnx8n                               1/1     Running   0          3m13s
pod/test-deployment-example-7cd068f-6cc64774ff-dtqwv   2/2     Running   0          2m40s
pod/test-deployment-example-7cd068f-6cc64774ff-gdkb8   2/2     Running   0          8m57s
pod/test-deployment-example-7cd068f-6cc64774ff-l4mn5   2/2     Running   0          5m11s

NAME                                                    DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.extensions/ambassador                        3         3         3            3           10m
deployment.extensions/test-deployment-example-7cd068f   3         3         3            3           8m57s

NAME                                                                  REFERENCE                                    TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/test-deployment-example-7cd068f   Deployment/test-deployment-example-7cd068f   51%/10%   1         3         3          8m57s
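
The TARGETS column above (51%/10%) explains why the deployment sits at its maximum of 3 replicas. The Kubernetes documentation describes the core HPA calculation as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), bounded by minReplicas and maxReplicas. The sketch below applies that formula; the function name desired_replicas is hypothetical, and the sketch deliberately leaves out the controller's tolerance band, pod-readiness handling, and scale-down delay.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas):
    """Core HPA formula, clamped to the configured replica bounds."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


# One replica at 51% CPU against a 10% target: the formula asks for 6, capped at 3.
print(desired_replicas(1, 51, 10, min_replicas=1, max_replicas=3))

# Three replicas still at 51%: the deployment stays pinned at maxReplicas.
print(desired_replicas(3, 51, 10, min_replicas=1, max_replicas=3))

# Three replicas at an illustrative 1% once load is gone: back down to minReplicas.
print(desired_replicas(3, 1, 10, min_replicas=1, max_replicas=3))
```

Because utilization here is far above the 10% target, the result is bounded by maxReplicas rather than by the formula itself; raising maxReplicas in the hpaSpec would let the deployment grow further under the same load.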

Remove Load

After 5-10 minutes you should see the deployment's replicas decrease to 1.

[17]:
!helm delete loadtest --purge
release "loadtest" deleted
[19]:
!kubectl get pods,deployments,hpa
NAME                                                   READY   STATUS    RESTARTS   AGE
pod/ambassador-684d6f8cd9-cfxwc                        1/1     Running   0          16m
pod/ambassador-684d6f8cd9-lxwcd                        1/1     Running   0          16m
pod/ambassador-684d6f8cd9-ncv8b                        1/1     Running   0          16m
pod/test-deployment-example-7cd068f-6cc64774ff-gdkb8   2/2     Running   0          15m

NAME                                                    DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.extensions/ambassador                        3         3         3            3           16m
deployment.extensions/test-deployment-example-7cd068f   1         1         1            1           15m

NAME                                                                  REFERENCE                                    TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/test-deployment-example-7cd068f   Deployment/test-deployment-example-7cd068f   1%/10%    1         3         1          15m