Shadow Rollout with Seldon and Ambassador

This notebook shows how to use “shadow” deployments to direct traffic not only to the main Seldon Deployment but also to a shadow deployment whose response is discarded. This allows you to test new models in a production setting, with production traffic, and analyse how they perform before putting them live.

These are useful when you want to test a new model or a higher-latency inference pipeline (e.g., one with explanation components) against production traffic without affecting the live deployment.

Prerequisites

You will need a running Kubernetes cluster with kubectl authenticated against it, the Helm client, and the seldon-core Python package, which is used for the client requests later in the notebook.
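
If the seldon-core package is not already available in your Python environment, it can be installed with pip:

pip install seldon-core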

Creating a Kubernetes Cluster

Follow the Kubernetes documentation to create a cluster.

Once created, ensure kubectl is authenticated against the running cluster.
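
A quick sanity check that kubectl is pointing at the intended cluster:

kubectl config current-context
kubectl get nodes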

Setup

[2]:
!kubectl create namespace seldon
namespace/seldon created
[1]:
!kubectl config set-context $(kubectl config current-context) --namespace=seldon
Context "minikube" modified.
[4]:
!kubectl create clusterrolebinding kube-system-cluster-admin --clusterrole=cluster-admin --serviceaccount=kube-system:default
clusterrolebinding.rbac.authorization.k8s.io/kube-system-cluster-admin created

Install Helm

[5]:
!kubectl -n kube-system create sa tiller
!kubectl create clusterrolebinding tiller --clusterrole cluster-admin --serviceaccount=kube-system:tiller
!helm init --service-account tiller
serviceaccount/tiller created
clusterrolebinding.rbac.authorization.k8s.io/tiller created
$HELM_HOME has been configured at /home/clive/.helm.

Tiller (the Helm server-side component) has been installed into your Kubernetes Cluster.

Please note: by default, Tiller is deployed with an insecure 'allow unauthenticated users' policy.
To prevent this, run `helm init` with the --tiller-tls-verify flag.
For more information on securing your installation see: https://docs.helm.sh/using_helm/#securing-your-helm-installation
Happy Helming!
[6]:
!kubectl rollout status deploy/tiller-deploy -n kube-system
Waiting for deployment "tiller-deploy" rollout to finish: 0 of 1 updated replicas are available...
deployment "tiller-deploy" successfully rolled out

Start seldon-core

[2]:
!helm install ../../../helm-charts/seldon-core-operator --name seldon-core --set usageMetrics.enabled=true --namespace seldon-system
NAME:   seldon-core
LAST DEPLOYED: Tue Apr 16 10:55:02 2019
NAMESPACE: seldon-system
STATUS: DEPLOYED

RESOURCES:
==> v1/Service
NAME                                        TYPE       CLUSTER-IP     EXTERNAL-IP  PORT(S)  AGE
seldon-operator-controller-manager-service  ClusterIP  10.111.111.87  <none>       443/TCP  0s

==> v1/StatefulSet
NAME                                DESIRED  CURRENT  AGE
seldon-operator-controller-manager  1        1        0s

==> v1beta1/ClusterRole
NAME                        AGE
seldon-spartakus-volunteer  0s

==> v1/ConfigMap
NAME                     DATA  AGE
seldon-spartakus-config  3     1s

==> v1/ClusterRole
NAME                          AGE
seldon-operator-manager-role  1s

==> v1/ClusterRoleBinding
NAME                                 AGE
seldon-operator-manager-rolebinding  1s

==> v1/ServiceAccount
NAME                        SECRETS  AGE
seldon-spartakus-volunteer  1        0s

==> v1beta1/ClusterRoleBinding
NAME                        AGE
seldon-spartakus-volunteer  0s

==> v1/Pod(related)
NAME                                  READY  STATUS             RESTARTS  AGE
seldon-operator-controller-manager-0  0/1    ContainerCreating  0         0s

==> v1/Secret
NAME                                   TYPE    DATA  AGE
seldon-operator-webhook-server-secret  Opaque  0     1s

==> v1beta1/CustomResourceDefinition
NAME                                         AGE
seldondeployments.machinelearning.seldon.io  1s

==> v1beta1/Deployment
NAME                        DESIRED  CURRENT  UP-TO-DATE  AVAILABLE  AGE
seldon-spartakus-volunteer  1        0        0           0          0s


NOTES:
NOTES: TODO


[3]:
!kubectl rollout status statefulset.apps/seldon-operator-controller-manager -n seldon-system
partitioned roll out complete: 1 new pods have been updated...
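
At this point the SeldonDeployment custom resource should be registered and the operator pod running. You can verify this with, for example:

kubectl get crd seldondeployments.machinelearning.seldon.io
kubectl get pods -n seldon-system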

Install the Prometheus and Grafana example analytics

[4]:
!helm install ../../../helm-charts/seldon-core-analytics --name seldon-core-analytics --set grafana_prom_admin_password=password --set persistence.enabled=false  --namespace seldon
NAME:   seldon-core-analytics
LAST DEPLOYED: Tue Apr 16 10:55:09 2019
NAMESPACE: seldon
STATUS: DEPLOYED

RESOURCES:
==> v1/Secret
NAME                 TYPE    DATA  AGE
grafana-prom-secret  Opaque  1     1s

==> v1/ServiceAccount
NAME        SECRETS  AGE
prometheus  1        1s

==> v1beta1/ClusterRole
NAME        AGE
prometheus  1s

==> v1beta1/ClusterRoleBinding
NAME        AGE
prometheus  1s

==> v1/Job
NAME                            DESIRED  SUCCESSFUL  AGE
grafana-prom-import-dashboards  1        0           1s

==> v1/Pod(related)
NAME                                      READY  STATUS             RESTARTS  AGE
grafana-prom-import-dashboards-8rwbr      0/1    ContainerCreating  0         0s
alertmanager-deployment-7cd568f668-mn9xz  0/1    ContainerCreating  0         0s
grafana-prom-deployment-899b4dd7b-glkxl   0/1    ContainerCreating  0         0s
prometheus-node-exporter-cn2bt            0/1    Pending            0         0s
prometheus-deployment-7554c97586-6929p    0/1    Pending            0         0s

==> v1/ConfigMap
NAME                       DATA  AGE
alertmanager-server-conf   1     1s
grafana-import-dashboards  11    1s
prometheus-rules           0     1s
prometheus-server-conf     1     1s

==> v1beta1/Deployment
NAME                     DESIRED  CURRENT  UP-TO-DATE  AVAILABLE  AGE
alertmanager-deployment  1        1        1           0          1s
grafana-prom-deployment  1        1        1           0          0s
prometheus-deployment    1        1        1           0          0s

==> v1/Service
NAME                      TYPE       CLUSTER-IP      EXTERNAL-IP  PORT(S)       AGE
alertmanager              ClusterIP  10.98.4.252     <none>       80/TCP        0s
grafana-prom              NodePort   10.101.59.44    <none>       80:32280/TCP  0s
prometheus-node-exporter  ClusterIP  None            <none>       9100/TCP      0s
prometheus-seldon         ClusterIP  10.108.131.166  <none>       80/TCP        0s

==> v1beta1/DaemonSet
NAME                      DESIRED  CURRENT  READY  UP-TO-DATE  AVAILABLE  NODE SELECTOR  AGE
prometheus-node-exporter  1        1        0      1           0          <none>         0s


NOTES:
NOTES: TODO


Setup Ingress

There are gRPC issues with the latest Ambassador, so we recommend version 0.40.2 until these are fixed.

[6]:
!helm install stable/ambassador --name ambassador --set image.tag=0.40.2
NAME:   ambassador
LAST DEPLOYED: Tue Apr 16 10:55:37 2019
NAMESPACE: seldon
STATUS: DEPLOYED

RESOURCES:
==> v1/ServiceAccount
NAME        SECRETS  AGE
ambassador  1        1s

==> v1beta1/ClusterRole
NAME        AGE
ambassador  1s

==> v1beta1/ClusterRoleBinding
NAME        AGE
ambassador  1s

==> v1/Service
NAME               TYPE          CLUSTER-IP    EXTERNAL-IP  PORT(S)                     AGE
ambassador-admins  ClusterIP     10.111.3.211  <none>       8877/TCP                    1s
ambassador         LoadBalancer  10.111.26.70  <pending>    80:30093/TCP,443:30568/TCP  1s

==> v1/Deployment
NAME        DESIRED  CURRENT  UP-TO-DATE  AVAILABLE  AGE
ambassador  3        3        3           0          1s

==> v1/Pod(related)
NAME                         READY  STATUS             RESTARTS  AGE
ambassador-5b89d44544-98n89  0/1    ContainerCreating  0         1s
ambassador-5b89d44544-k4trg  0/1    ContainerCreating  0         1s
ambassador-5b89d44544-wpqkv  0/1    ContainerCreating  0         1s


NOTES:
Congratuations! You've successfully installed Ambassador.

For help, visit our Slack at https://d6e.co/slack or view the documentation online at https://www.getambassador.io.

To get the IP address of Ambassador, run the following commands:
NOTE: It may take a few minutes for the LoadBalancer IP to be available.
     You can watch the status of by running 'kubectl get svc -w  --namespace seldon ambassador'

  On GKE/Azure:
  export SERVICE_IP=$(kubectl get svc --namespace seldon ambassador -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

  On AWS:
  export SERVICE_IP=$(kubectl get svc --namespace seldon ambassador -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')

  echo http://$SERVICE_IP:

[7]:
!kubectl rollout status deployment.apps/ambassador
Waiting for deployment "ambassador" rollout to finish: 0 of 3 updated replicas are available...
Waiting for deployment "ambassador" rollout to finish: 1 of 3 updated replicas are available...
Waiting for deployment "ambassador" rollout to finish: 2 of 3 updated replicas are available...
deployment "ambassador" successfully rolled out

Set up Port Forwards

Ensure you port forward ambassador:

kubectl port-forward $(kubectl get pods -n seldon -l app.kubernetes.io/name=ambassador -o jsonpath='{.items[0].metadata.name}') -n seldon 8003:8080

Ensure you port forward to Grafana:

kubectl port-forward $(kubectl get pods -n seldon -l app=grafana-prom-server -o jsonpath='{.items[0].metadata.name}') -n seldon 3000:3000
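
To confirm both forwards are up you can request any path through each tunnel; even a 404 from Ambassador for an unmapped path shows the forward is working. For example:

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8003/
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3000/login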

Launch main model

We will create a very simple Seldon Deployment with a dummy model image seldonio/mock_classifier:1.0. This deployment is named example.

[17]:
!pygmentize model.json
{
    "apiVersion": "machinelearning.seldon.io/v1alpha2",
    "kind": "SeldonDeployment",
    "metadata": {
        "labels": {
            "app": "seldon"
        },
        "name": "example"
    },
    "spec": {
        "name": "production-model",
        "predictors": [
            {
                "componentSpecs": [{
                    "spec": {
                        "containers": [
                            {
                                "image": "seldonio/mock_classifier:1.0",
                                "imagePullPolicy": "IfNotPresent",
                                "name": "classifier"
                            }
                        ],
                        "terminationGracePeriodSeconds": 1
                    }}
                                  ],
                "graph":
                {
                    "children": [],
                    "name": "classifier",
                    "type": "MODEL",
                    "endpoint": {
                        "type": "REST"
                    }},
                "name": "single",
                "replicas": 1
            }
        ]
    }
}
[8]:
!kubectl create -f model.json
seldondeployment.machinelearning.seldon.io/example created
[9]:
!kubectl rollout status deploy/production-model-single-7cd068f
Waiting for deployment "production-model-single-7cd068f" rollout to finish: 0 of 1 updated replicas are available...
deployment "production-model-single-7cd068f" successfully rolled out

Get predictions

[10]:
from seldon_core.seldon_client import SeldonClient
sc = SeldonClient(deployment_name="example",namespace="seldon")
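
SeldonClient assumes the Ambassador gateway is reachable at localhost:8003 by default, which matches the port-forward above. If you forwarded Ambassador to a different local port, pass it explicitly, for example:

# gateway_endpoint must match the local port used for the Ambassador port-forward
sc = SeldonClient(deployment_name="example", namespace="seldon",
                  gateway_endpoint="localhost:8003")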

REST Request

[11]:
r = sc.predict(gateway="ambassador",transport="rest")
print(r)
Success:True message:
Request:
data {
  tensor {
    shape: 1
    shape: 1
    values: 0.25103583044502875
  }
}

Response:
meta {
  puid: "j7mhk9clk2fbu8tdq3ka8m2274"
  requestPath {
    key: "classifier"
    value: "seldonio/mock_classifier:1.0"
  }
}
data {
  names: "proba"
  tensor {
    shape: 1
    shape: 1
    values: 0.06503212230125832
  }
}
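
You can make the same REST call without the Python client using curl. The exact URL prefix depends on your Seldon Core version and Ambassador configuration, but with the port-forward above it will look something like this (illustrative):

curl -s -H "Content-Type: application/json" \
  -d '{"data":{"ndarray":[[0.5]]}}' \
  http://localhost:8003/seldon/seldon/example/api/v0.1/predictions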

gRPC Request

[12]:
r = sc.predict(gateway="ambassador",transport="grpc")
print(r)
Success:True message:
Request:
data {
  tensor {
    shape: 1
    shape: 1
    values: 0.6342191670710421
  }
}

Response:
meta {
  puid: "cb93sjuopd985fa8jde1bt023m"
  requestPath {
    key: "classifier"
    value: "seldonio/mock_classifier:1.0"
  }
}
data {
  names: "proba"
  tensor {
    shape: 1
    shape: 1
    values: 0.09258712191804268
  }
}

Launch Shadow

We will now create a new Seldon Deployment for our shadow deployment with a new model, seldonio/mock_classifier_rest:1.1. To make it a shadow of the original example deployment we add two annotations:

"annotations": {
        "seldon.io/ambassador-service-name":"example",
        "seldon.io/ambassador-shadow":"true"
    },

The first tells Seldon to use example as the Ambassador service name rather than the default, which would be the deployment name (in this case example-shadow). This ensures the Ambassador mapping applies to the same prefix as the original deployment. The second states that we want to use Ambassador’s shadow functionality.

[13]:
!pygmentize shadow.json
{
    "apiVersion": "machinelearning.seldon.io/v1alpha2",
    "kind": "SeldonDeployment",
    "metadata": {
        "labels": {
            "app": "seldon"
        },
        "name": "example-shadow"
    },
    "spec": {
        "name": "shadow-model",
        "annotations": {
            "seldon.io/ambassador-service-name":"example",
            "seldon.io/ambassador-shadow":"true"
        },
        "predictors": [
            {
                "componentSpecs": [{
                    "spec": {
                        "containers": [
                            {
                                "image": "seldonio/mock_classifier_rest:1.1",
                                "imagePullPolicy": "IfNotPresent",
                                "name": "classifier"
                            }
                        ],
                        "terminationGracePeriodSeconds": 1
                    }}
                                  ],
                "graph":
                {
                    "children": [],
                    "name": "classifier",
                    "type": "MODEL",
                    "endpoint": {
                        "type": "REST"
                    }},
                "name": "single",
                "replicas": 1
            }
        ]
    }
}
[14]:
!kubectl create -f shadow.json
seldondeployment.machinelearning.seldon.io/example-shadow created
[15]:
!kubectl rollout status deploy/shadow-model-single-4c8805f
Waiting for deployment "shadow-model-single-4c8805f" rollout to finish: 0 of 1 updated replicas are available...
deployment "shadow-model-single-4c8805f" successfully rolled out

Let’s send a bunch of requests to the endpoint.

[16]:
for i in range(1000):
    r = sc.predict(gateway="ambassador",transport="rest")

Now view the analytics dashboard at http://localhost:3000. You should see a dashboard view like the one below, showing the two models, production and shadow, both receiving requests.

[Grafana dashboard showing the production and shadow models both receiving requests]
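
Besides the dashboard, you can confirm from the command line that the shadow model received the mirrored traffic by tailing its logs (using the deployment and container names from the shadow rollout above):

kubectl logs deploy/shadow-model-single-4c8805f -c classifier --tail=10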

[ ]: