This page was generated from notebooks/scale.ipynb.
Scaling Examples¶
Prerequisites¶
A Kubernetes cluster with kubectl configured
curl
grpcurl
pygmentize
Setup Seldon Core¶
Use the setup notebook to Setup Cluster to set up Seldon Core with an ingress - either Ambassador or Istio.
Then port-forward to that ingress on localhost:8003 in a separate terminal either with:
Ambassador:
kubectl port-forward $(kubectl get pods -n seldon -l app.kubernetes.io/name=ambassador -o jsonpath='{.items[0].metadata.name}') -n seldon 8003:8080
Istio:
kubectl port-forward $(kubectl get pods -l istio=ingressgateway -n istio-system -o jsonpath='{.items[0].metadata.name}') -n istio-system 8003:8080
[ ]:
!kubectl create namespace seldon
[ ]:
!kubectl config set-context $(kubectl config current-context) --namespace=seldon
Replica Settings¶
A deployment that illustrates the settings for:
.spec.replicas
.spec.predictors[].replicas
.spec.predictors[].componentSpecs[].replicas
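The most specific setting wins: a replica count on a componentSpec overrides the predictor-level count, which in turn overrides the deployment-wide `.spec.replicas` (with a default of 1 when none is set). A minimal Python sketch of this precedence rule (`effective_replicas` is an illustrative name, not a Seldon function):

```python
def effective_replicas(spec_replicas=None,
                       predictor_replicas=None,
                       component_replicas=None):
    """Return the replica count applied to a deployment: the most
    specific non-None setting wins, falling back to a default of 1."""
    for value in (component_replicas, predictor_replicas, spec_replicas):
        if value is not None:
            return value
    return 1

# Mirrors the manifest below:
# classifier   -> effective_replicas(1, 2, None)     == 2
# classifier2  -> effective_replicas(1, 2, 3)        == 3
# classifier3  -> effective_replicas(1, None, None)  == 1
```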
Below is a configuration file that sets these fields to different values:
[3]:
%%writefile resources/model_replicas.yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: test-replicas
spec:
  replicas: 1
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: seldonio/mock_classifier_rest:1.3
          name: classifier
    - spec:
        containers:
        - image: seldonio/mock_classifier_rest:1.3
          name: classifier2
      replicas: 3
    graph:
      endpoint:
        type: REST
      name: classifier
      type: MODEL
      children:
      - name: classifier2
        type: MODEL
        endpoint:
          type: REST
    name: example
    replicas: 2
    traffic: 50
  - componentSpecs:
    - spec:
        containers:
        - image: seldonio/mock_classifier_rest:1.3
          name: classifier3
    graph:
      children: []
      endpoint:
        type: REST
      name: classifier3
      type: MODEL
    name: example2
    traffic: 50
Overwriting resources/model_replicas.yaml
[ ]:
!kubectl create -f resources/model_replicas.yaml
We can now wait until each of the models is fully deployed:
[ ]:
!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=test-replicas -o jsonpath='{.items[0].metadata.name}')
!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=test-replicas -o jsonpath='{.items[1].metadata.name}')
!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=test-replicas -o jsonpath='{.items[2].metadata.name}')
Check that each deployment is running with the correct number of replicas:
[ ]:
classifierReplicas = !kubectl get deploy test-replicas-example-0-classifier -o jsonpath='{.status.replicas}'
classifierReplicas = int(classifierReplicas[0])
assert classifierReplicas == 2
[ ]:
classifier2Replicas = !kubectl get deploy test-replicas-example-1-classifier2 -o jsonpath='{.status.replicas}'
classifier2Replicas = int(classifier2Replicas[0])
assert classifier2Replicas == 3
[ ]:
classifier3Replicas = !kubectl get deploy test-replicas-example2-0-classifier3 -o jsonpath='{.status.replicas}'
classifier3Replicas = int(classifier3Replicas[0])
assert classifier3Replicas == 1
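The jsonpath lookup and parsing above can be wrapped in a small helper for reuse. This is a sketch that assumes kubectl is on the PATH and pointed at the right cluster; `get_replicas` and `parse_replicas` are illustrative names, not Seldon utilities:

```python
import subprocess

def parse_replicas(jsonpath_output: str) -> int:
    """kubectl prints an empty string when .status.replicas is unset
    (e.g. while a deployment is still coming up); treat that as 0."""
    text = jsonpath_output.strip()
    return int(text) if text else 0

def get_replicas(deployment: str) -> int:
    """Read .status.replicas for a Deployment via kubectl."""
    out = subprocess.check_output(
        ["kubectl", "get", "deploy", deployment,
         "-o", "jsonpath={.status.replicas}"])
    return parse_replicas(out.decode())

# get_replicas("test-replicas-example-0-classifier")
```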
We can now send a simple request:
[ ]:
!curl -s -d '{"data": {"ndarray":[[1.0, 2.0, 5.0]]}}' \
-X POST http://localhost:8003/seldon/seldon/test-replicas/api/v1.0/predictions \
-H "Content-Type: application/json"
[ ]:
!kubectl delete -f resources/model_replicas.yaml
Scale SeldonDeployment¶
Now we can scale the SeldonDeployment itself and watch how it scales.
First we want to deploy a simple model with a single replica:
[5]:
%%writefile resources/model_scale.yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: seldon-scale
spec:
  replicas: 1
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: seldonio/mock_classifier_rest:1.3
          name: classifier
    graph:
      children: []
      endpoint:
        type: REST
      name: classifier
      type: MODEL
    name: example
Overwriting resources/model_scale.yaml
[ ]:
!kubectl create -f resources/model_scale.yaml
[ ]:
!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=seldon-scale -o jsonpath='{.items[0].metadata.name}')
We can confirm that only 1 replica is currently running:
[ ]:
replicas = !kubectl get deploy seldon-scale-example-0-classifier -o jsonpath='{.status.replicas}'
replicas = int(replicas[0])
assert replicas == 1
Then we can scale the model up:
[ ]:
!kubectl scale --replicas=2 sdep/seldon-scale
[ ]:
!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=seldon-scale -o jsonpath='{.items[0].metadata.name}')
Now we can verify that there are two replicas instead of one:
[ ]:
replicas = !kubectl get deploy seldon-scale-example-0-classifier -o jsonpath='{.status.replicas}'
replicas = int(replicas[0])
assert replicas == 2
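Instead of re-running kubectl rollout status by hand, a generic polling loop can wait for the desired count. `wait_for_replicas` is an illustrative helper; `get_count` is any zero-argument callable returning the current replica count (for example, the kubectl lookup above wrapped in a lambda):

```python
import time

def wait_for_replicas(get_count, target, timeout_s=120, interval_s=2):
    """Poll get_count() until it returns target; give up after timeout_s.
    Returns True on success, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if get_count() == target:
            return True
        time.sleep(interval_s)
    return False

# wait_for_replicas(lambda: get_replicas("seldon-scale-example-0-classifier"), 2)
```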
Now when we send requests to the model, they are load-balanced across the replicas.
[ ]:
!curl -s -d '{"data": {"ndarray":[[1.0, 2.0, 5.0]]}}' \
-X POST http://localhost:8003/seldon/seldon/seldon-scale/api/v1.0/predictions \
-H "Content-Type: application/json"
[ ]:
!kubectl delete -f resources/model_scale.yaml