Autoscaling Seldon Deployments

Prerequisites

You will need:

  • Git clone of Seldon Core

  • A running Kubernetes cluster with kubectl authenticated

    • The cluster should have heapster and metrics-server running in the kube-system namespace (a quick check is shown after this list)
    • For Minikube run:
    minikube addons enable metrics-server
    minikube addons enable heapster
    
  • seldon-core Python package (pip install seldon-core)

  • Helm client
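
Since the HorizontalPodAutoscaler relies on the Metrics API, it is worth checking that metrics are actually being served before proceeding; if metrics-server is up, kubectl top should return usage figures (output will vary with your cluster):

    kubectl top nodes
    kubectl top pods -n kube-system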

Creating a Kubernetes Cluster

Follow the Kubernetes documentation to create a cluster.

Once created, ensure kubectl is authenticated against the running cluster.
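
A quick way to verify this is to check the current context and that the API server responds:

    kubectl config current-context
    kubectl cluster-info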

Setup

[1]:
!kubectl create namespace seldon
namespace/seldon created
[2]:
!kubectl config set-context $(kubectl config current-context) --namespace=seldon
Context "minikube" modified.
[3]:
!kubectl create clusterrolebinding kube-system-cluster-admin --clusterrole=cluster-admin --serviceaccount=kube-system:default
clusterrolebinding.rbac.authorization.k8s.io/kube-system-cluster-admin created

Install Helm

[4]:
!kubectl -n kube-system create sa tiller
!kubectl create clusterrolebinding tiller --clusterrole cluster-admin --serviceaccount=kube-system:tiller
!helm init --service-account tiller
serviceaccount/tiller created
clusterrolebinding.rbac.authorization.k8s.io/tiller created
$HELM_HOME has been configured at /home/clive/.helm.

Tiller (the Helm server-side component) has been installed into your Kubernetes Cluster.

Please note: by default, Tiller is deployed with an insecure 'allow unauthenticated users' policy.
To prevent this, run `helm init` with the --tiller-tls-verify flag.
For more information on securing your installation see: https://docs.helm.sh/using_helm/#securing-your-helm-installation
Happy Helming!
[5]:
!kubectl rollout status deploy/tiller-deploy -n kube-system
Waiting for deployment "tiller-deploy" rollout to finish: 0 of 1 updated replicas are available...
deployment "tiller-deploy" successfully rolled out

Start seldon-core

[13]:
!helm install ../../../helm-charts/seldon-core-operator --name seldon-core --set usageMetrics.enabled=true --namespace seldon-system
NAME:   seldon-core
LAST DEPLOYED: Sat May  4 08:13:14 2019
NAMESPACE: seldon-system
STATUS: DEPLOYED

RESOURCES:
==> v1/Secret
NAME                                   TYPE    DATA  AGE
seldon-operator-webhook-server-secret  Opaque  0     0s

==> v1beta1/CustomResourceDefinition
NAME                                         AGE
seldondeployments.machinelearning.seldon.io  0s

==> v1/ClusterRole
seldon-operator-manager-role  0s

==> v1/ClusterRoleBinding
NAME                                 AGE
seldon-operator-manager-rolebinding  0s

==> v1/Service
NAME                                        TYPE       CLUSTER-IP     EXTERNAL-IP  PORT(S)  AGE
seldon-operator-controller-manager-service  ClusterIP  10.109.96.211  <none>       443/TCP  0s

==> v1/StatefulSet
NAME                                DESIRED  CURRENT  AGE
seldon-operator-controller-manager  1        1        0s

==> v1/Pod(related)
NAME                                  READY  STATUS             RESTARTS  AGE
seldon-operator-controller-manager-0  0/1    ContainerCreating  0         0s


NOTES:
NOTES: TODO


[14]:
!kubectl rollout status statefulset.apps/seldon-operator-controller-manager -n seldon-system
Waiting for 1 pods to be ready...
partitioned roll out complete: 1 new pods have been updated...

Setup Ingress

There are gRPC issues with the latest Ambassador, so we recommend version 0.40.2 until these are fixed.

[9]:
!helm install stable/ambassador --name ambassador --set image.tag=0.40.2
NAME:   ambassador
LAST DEPLOYED: Sat May  4 08:00:30 2019
NAMESPACE: seldon
STATUS: DEPLOYED

RESOURCES:
==> v1/Pod(related)
NAME                         READY  STATUS             RESTARTS  AGE
ambassador-5b89d44544-hq529  0/1    ContainerCreating  0         0s
ambassador-5b89d44544-p2qb5  0/1    ContainerCreating  0         0s
ambassador-5b89d44544-tznhw  0/1    ContainerCreating  0         0s

==> v1/ServiceAccount
NAME        SECRETS  AGE
ambassador  1        0s

==> v1beta1/ClusterRole
NAME        AGE
ambassador  0s

==> v1beta1/ClusterRoleBinding
NAME        AGE
ambassador  0s

==> v1/Service
NAME               TYPE          CLUSTER-IP      EXTERNAL-IP  PORT(S)                     AGE
ambassador-admins  ClusterIP     10.108.187.165  <none>       8877/TCP                    0s
ambassador         LoadBalancer  10.111.42.13    <pending>    80:31994/TCP,443:30617/TCP  0s

==> v1/Deployment
NAME        DESIRED  CURRENT  UP-TO-DATE  AVAILABLE  AGE
ambassador  3        3        3           0          0s


NOTES:
Congratuations! You've successfully installed Ambassador.

For help, visit our Slack at https://d6e.co/slack or view the documentation online at https://www.getambassador.io.

To get the IP address of Ambassador, run the following commands:
NOTE: It may take a few minutes for the LoadBalancer IP to be available.
     You can watch the status of by running 'kubectl get svc -w  --namespace seldon ambassador'

  On GKE/Azure:
  export SERVICE_IP=$(kubectl get svc --namespace seldon ambassador -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

  On AWS:
  export SERVICE_IP=$(kubectl get svc --namespace seldon ambassador -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')

  echo http://$SERVICE_IP:

[10]:
!kubectl rollout status deployment.apps/ambassador
Waiting for deployment "ambassador" rollout to finish: 0 of 3 updated replicas are available...
Waiting for deployment "ambassador" rollout to finish: 1 of 3 updated replicas are available...
Waiting for deployment "ambassador" rollout to finish: 2 of 3 updated replicas are available...
deployment "ambassador" successfully rolled out
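
On Minikube the Ambassador service will not get a LoadBalancer IP, so the notes above do not apply directly. If you want to reach Ambassador from outside the cluster, one option is a port-forward to its service on a local port of your choosing (not needed for the load test below, which calls the model service directly inside the cluster):

    kubectl port-forward svc/ambassador 8003:80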

Create model with autoscaler

To create a model with a HorizontalPodAutoscaler there are two steps:

  1. Ensure you have a resource request for the metric you want to scale on, if it is a standard metric such as cpu or memory, e.g.:
"resources": {
   "requests": {
      "cpu": "0.5"
   }
}
  2. Add an HPA Spec referring to this Deployment, e.g.:
"hpaSpec":
       {
       "minReplicas": 1,
       "maxReplicas": 3,
       "metrics":
           [ {
           "type": "Resource",
           "resource": {
               "name": "cpu",
               "targetAverageUtilization": 10
           }
           }]
       },

The full SeldonDeployment spec is shown below.

[24]:
!pygmentize model_with_hpa.json
{
    "apiVersion": "machinelearning.seldon.io/v1alpha2",
    "kind": "SeldonDeployment",
    "metadata": {
        "name": "seldon-model"
    },
    "spec": {
        "name": "test-deployment",
        "oauth_key": "oauth-key",
        "oauth_secret": "oauth-secret",
        "predictors": [
            {
                "componentSpecs": [{
                    "spec": {
                        "containers": [
                            {
                                "image": "seldonio/mock_classifier:1.0",
                                "imagePullPolicy": "IfNotPresent",
                                "name": "classifier",
                                "resources": {
                                    "requests": {
                                        "cpu": "0.5"
                                    }
                                }
                            }
                        ],
                        "terminationGracePeriodSeconds": 1
                    },
                    "hpaSpec":
                    {
                        "minReplicas": 1,
                        "maxReplicas": 3,
                        "metrics":
                            [ {
                                "type": "Resource",
                                "resource": {
                                    "name": "cpu",
                                    "targetAverageUtilization": 10
                                }
                            }]
                    }
                }],
                "graph": {
                    "children": [],
                    "name": "classifier",
                    "endpoint": {
                        "type" : "REST"
                    },
                    "type": "MODEL"
                },
                "name": "example",
                "replicas": 1
            }
        ]
    }
}
[25]:
!kubectl create -f model_with_hpa.json
seldondeployment.machinelearning.seldon.io/seldon-model created
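
Before generating load, you can optionally wait for the model to become available. One way is to check the SeldonDeployment's reported state (the status.state field is assumed here to read Available once the underlying pods are ready), or simply watch the deployments:

    kubectl get seldondeployments seldon-model -o jsonpath='{.status.state}'
    kubectl get deployments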

Create Load

[26]:
!kubectl label nodes $(kubectl get nodes -o jsonpath='{.items[0].metadata.name}') role=locust
error: 'role' already has a value (locust), and --overwrite is false
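
If the node already carries the role label, as in the error above, kubectl will refuse to change it unless you pass --overwrite:

    kubectl label nodes $(kubectl get nodes -o jsonpath='{.items[0].metadata.name}') role=locust --overwrite
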
[27]:
!helm install ../../../helm-charts/seldon-core-loadtesting --name loadtest  \
    --set locust.host=http://test-deployment-seldon-model:8000 \
    --set oauth.enabled=false \
    --set oauth.key=oauth-key \
    --set oauth.secret=oauth-secret \
    --set locust.hatchRate=1 \
    --set locust.clients=1 \
    --set loadtest.sendFeedback=0 \
    --set locust.minWait=0 \
    --set locust.maxWait=0 \
    --set replicaCount=1
NAME:   loadtest
LAST DEPLOYED: Sat May  4 08:23:42 2019
NAMESPACE: seldon
STATUS: DEPLOYED

RESOURCES:
==> v1/ReplicationController
NAME             DESIRED  CURRENT  READY  AGE
locust-slave-1   1        1        0      0s
locust-master-1  1        1        0      0s

==> v1/Service
NAME             TYPE      CLUSTER-IP      EXTERNAL-IP  PORT(S)                                       AGE
locust-master-1  NodePort  10.107.126.164  <none>       5557:30336/TCP,5558:31261/TCP,8089:30826/TCP  0s

==> v1/Pod(related)
NAME                   READY  STATUS             RESTARTS  AGE
locust-slave-1-jvrvn   0/1    ContainerCreating  0         0s
locust-master-1-t992d  0/1    ContainerCreating  0         0s


After a few minutes you should see the model deployment scaled up to 3 replicas.
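
While the load test runs, one way to keep an eye on the autoscaler is to leave a watch on the HPA and the deployments (press Ctrl-C to stop watching):

    kubectl get hpa -w
    kubectl get deployments -w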

[21]:
!kubectl get pods,deployments,hpa
NAME                                                   READY   STATUS    RESTARTS   AGE
pod/ambassador-5b89d44544-hq529                        1/1     Running   0          16m
pod/ambassador-5b89d44544-p2qb5                        1/1     Running   0          16m
pod/ambassador-5b89d44544-tznhw                        1/1     Running   0          16m
pod/locust-master-1-pk2fb                              1/1     Running   0          113s
pod/locust-slave-1-w6c99                               1/1     Running   0          113s
pod/test-deployment-example-7cd068f-78dfbf847d-656wt   2/2     Running   0          42s
pod/test-deployment-example-7cd068f-78dfbf847d-8z5cq   2/2     Running   0          42s
pod/test-deployment-example-7cd068f-78dfbf847d-ptpfb   2/2     Running   0          2m58s

NAME                                                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.extensions/ambassador                        3/3     3            3           16m
deployment.extensions/test-deployment-example-7cd068f   3/3     3            3           2m58s

NAME                                                                  REFERENCE                                    TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/test-deployment-example-7cd068f   Deployment/test-deployment-example-7cd068f   21%/10%   1         4         3          2m58s

Remove Load

After 5-10 minutes you should see the deployment's replicas decrease back to 1.

[28]:
!helm delete loadtest --purge
release "loadtest" deleted
[29]:
!kubectl get pods,deployments,hpa
NAME                                                   READY   STATUS    RESTARTS   AGE
pod/ambassador-5b89d44544-hq529                        1/1     Running   0          60m
pod/ambassador-5b89d44544-p2qb5                        1/1     Running   0          60m
pod/ambassador-5b89d44544-tznhw                        1/1     Running   0          60m
pod/test-deployment-example-7cd068f-67b959cb86-4zhh6   2/2     Running   0          38m

NAME                                                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.extensions/ambassador                        3/3     3            3           60m
deployment.extensions/test-deployment-example-7cd068f   1/1     1            1           38m

NAME                                                                  REFERENCE                                    TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/test-deployment-example-7cd068f   Deployment/test-deployment-example-7cd068f   0%/10%    1         3         1          38m
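
Once you are finished you can remove the model in the same way it was created:

    kubectl delete -f model_with_hpa.json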