This page was generated from notebooks/server_examples.ipynb.

Example Model Servers with Seldon

Setup Seldon Core

Use the setup notebook to set up your cluster with Ambassador ingress and Seldon Core. Instructions are also available online.

[ ]:
!kubectl create namespace seldon
[ ]:
!kubectl config set-context $(kubectl config current-context) --namespace=seldon
[ ]:
import json

Serve SKLearn Iris Model

In order to deploy SKLearn artifacts, we can leverage the pre-packaged SKLearn inference server. The exposed API can follow either the default Seldon protocol or the KFServing V2 protocol.

For details on each of these protocols, you can check the documentation section on API protocols.

Default Seldon protocol

To deploy and start serving an SKLearn artifact using Seldon’s default protocol, we can use a config like the one below:

[ ]:
%%writefile ../servers/sklearnserver/samples/iris.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: sklearn
spec:
  predictors:
  - graph:
      name: classifier
      implementation: SKLEARN_SERVER
      modelUri: gs://seldon-models/v1.12.0-dev/sklearn/iris
    name: default
    replicas: 1
    svcOrchSpec:
      env:
      - name: SELDON_LOG_LEVEL
        value: DEBUG

We can then apply it to deploy it to our Kubernetes cluster.

[ ]:
!kubectl apply -f ../servers/sklearnserver/samples/iris.yaml
[ ]:
!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=sklearn -o jsonpath='{.items[0].metadata.name}')
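
Optionally, we can also check the status of the SeldonDeployment resource itself. This is a minimal check, assuming the sdep short name and the status.state field reported by your Seldon Core version:

[ ]:
!kubectl get sdep sklearn -o jsonpath='{.status.state}'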

Once it’s deployed, we can send requests to our SKLearn model.

REST Requests

[ ]:
X=!curl -s -d '{"data": {"ndarray":[[1.0, 2.0, 5.0, 6.0]]}}' \
   -X POST http://localhost:8003/seldon/seldon/sklearn/api/v1.0/predictions \
   -H "Content-Type: application/json"
d=json.loads(X[0])
print(d)
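
The response follows the Seldon protocol, so the predictions come back wrapped in a data element that mirrors the request; a minimal sketch of extracting them, assuming the server returns an ndarray payload:

[ ]:
print(d["data"]["ndarray"])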
[ ]:
from seldon_core.seldon_client import SeldonClient

sc = SeldonClient(deployment_name="sklearn", namespace="seldon")
[ ]:
r = sc.predict(gateway="ambassador", transport="rest", shape=(1, 4))
print(r)
assert r.success == True

gRPC Requests

[ ]:
r = sc.predict(gateway="ambassador", transport="grpc", shape=(1, 4))
print(r)
assert r.success == True
[ ]:
X=!cd ../executor/proto && grpcurl -d '{"data":{"ndarray":[[1.0,2.0,5.0,6.0]]}}' \
         -rpc-header seldon:sklearn -rpc-header namespace:seldon \
         -plaintext \
         -proto ./prediction.proto  0.0.0.0:8003 seldon.protos.Seldon/Predict
d=json.loads("".join(X))
print(d)

We can then delete the model we deployed.

[ ]:
!kubectl delete -f ../servers/sklearnserver/samples/iris.yaml

KFServing V2 protocol

We can deploy an SKLearn artifact, exposing an API compatible with KFServing’s V2 protocol, by setting the protocol of our SeldonDeployment to kfserving. For example, we can consider the config below:

[ ]:
%%writefile ./resources/iris-sklearn-v2.yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: sklearn
spec:
  name: iris
  protocol: kfserving
  predictors:
  - graph:
      children: []
      implementation: SKLEARN_SERVER
      modelUri: gs://seldon-models/sklearn/iris-0.23.2/lr_model
      name: classifier
    name: default
    replicas: 1

We can then apply it to deploy our model to our Kubernetes cluster.

[ ]:
!kubectl apply -f resources/iris-sklearn-v2.yaml
[ ]:
!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=sklearn -o jsonpath='{.items[0].metadata.name}')

Once it’s deployed, we can send inference requests to our model. Note that, since it uses KFServing’s V2 protocol, these requests will be different from the ones using the default Seldon protocol.

[ ]:
import json

import requests

inference_request = {
    "inputs": [
        {"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}
    ]
}

endpoint = "http://localhost:8003/seldon/seldon/sklearn/v2/models/infer"
response = requests.post(endpoint, json=inference_request)

print(json.dumps(response.json(), indent=2))
assert response.ok
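
The model’s predictions come back in the outputs field of the V2 response; a minimal sketch of extracting them from the JSON payload:

[ ]:
outputs = response.json()["outputs"]
print(outputs[0]["data"])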

Finally, we can delete the model we deployed.

[ ]:
!kubectl delete -f resources/iris-sklearn-v2.yaml

Serve XGBoost Iris Model

In order to deploy XGBoost models, we can leverage the pre-packaged XGBoost inference server. The exposed API can follow either the default Seldon protocol or the KFServing V2 protocol.

For details on each of these protocols, you can check the documentation section on API protocols.

Default Seldon protocol

We can deploy an XGBoost model uploaded to an object store by using the XGBoost model server implementation, as shown in the config below:

[ ]:
%%writefile resources/iris.yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: xgboost
spec:
  name: iris
  predictors:
  - graph:
      children: []
      implementation: XGBOOST_SERVER
      modelUri: gs://seldon-models/xgboost/iris
      name: classifier
    name: default
    replicas: 1

We can then apply it to deploy it to our Kubernetes cluster.

[ ]:
!kubectl apply -f resources/iris.yaml
[ ]:
!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=xgboost -o jsonpath='{.items[0].metadata.name}')

REST Requests

[ ]:
X=!curl -s -d '{"data": {"ndarray":[[1.0, 2.0, 5.0, 6.0]]}}' \
   -X POST http://localhost:8003/seldon/seldon/xgboost/api/v1.0/predictions \
   -H "Content-Type: application/json"
d=json.loads(X[0])
print(d)
[ ]:
from seldon_core.seldon_client import SeldonClient

sc = SeldonClient(deployment_name="xgboost", namespace="seldon")
[ ]:
r = sc.predict(gateway="ambassador", transport="rest", shape=(1, 4))
print(r)
assert r.success == True

gRPC Requests

[ ]:
r = sc.predict(gateway="ambassador", transport="grpc", shape=(1, 4))
print(r)
assert r.success == True
[ ]:
X=!cd ../executor/proto && grpcurl -d '{"data":{"ndarray":[[1.0,2.0,5.0,6.0]]}}' \
         -rpc-header seldon:xgboost -rpc-header namespace:seldon \
         -plaintext \
         -proto ./prediction.proto  0.0.0.0:8003 seldon.protos.Seldon/Predict
d=json.loads("".join(X))
print(d)

We can then delete the model we deployed.

[ ]:
!kubectl delete -f resources/iris.yaml

KFServing V2 protocol

We can deploy an XGBoost model, exposing an API compatible with KFServing’s V2 protocol, by setting the protocol of our SeldonDeployment to kfserving. For example, we can consider the config below:

[ ]:
%%writefile ./resources/iris-xgboost-v2.yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: xgboost
spec:
  name: iris
  protocol: kfserving
  predictors:
  - graph:
      children: []
      implementation: XGBOOST_SERVER
      modelUri: gs://seldon-models/xgboost/iris
      name: classifier
    name: default
    replicas: 1

We can then apply it to deploy our model to our Kubernetes cluster.

[ ]:
!kubectl apply -f ./resources/iris-xgboost-v2.yaml
[ ]:
!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=xgboost -o jsonpath='{.items[0].metadata.name}')

Once it’s deployed, we can send inference requests to our model. Note that, since it uses KFServing’s V2 protocol, these requests will be different from the ones using the default Seldon protocol.

[ ]:
import json

import requests

inference_request = {
    "inputs": [
        {"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}
    ]
}

endpoint = "http://localhost:8003/seldon/seldon/xgboost/v2/models/infer"
response = requests.post(endpoint, json=inference_request)

print(json.dumps(response.json(), indent=2))
assert response.ok

Finally, we can delete the model we deployed.

[ ]:
!kubectl delete -f ./resources/iris-xgboost-v2.yaml

Serve Tensorflow MNIST Model

We can deploy a Tensorflow model uploaded to an object store by using the Tensorflow model server implementation, as shown in the config below.

This notebook contains two examples: one showing how you can use the TFServing pre-packaged server with the Seldon protocol, and a second showing how you can deploy it using the Tensorflow protocol (so you can send requests in the exact format you would send to a TFServing server).

Serve Tensorflow MNIST Model with Seldon Protocol

The config file below shows how you can deploy your Tensorflow model so that it exposes the Seldon protocol.

[ ]:
%%writefile ./resources/mnist_rest.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: tfserving
spec:
  name: mnist
  predictors:
  - graph:
      children: []
      implementation: TENSORFLOW_SERVER
      modelUri: gs://seldon-models/tfserving/mnist-model
      name: mnist-model
      parameters:
        - name: signature_name
          type: STRING
          value: predict_images
        - name: model_name
          type: STRING
          value: mnist-model
        - name: model_input
          type: STRING
          value: images
        - name: model_output
          type: STRING
          value: scores
    name: default
    replicas: 1
[ ]:
!kubectl apply -f ./resources/mnist_rest.yaml
[ ]:
!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=tfserving -o jsonpath='{.items[0].metadata.name}')
[ ]:
from seldon_core.seldon_client import SeldonClient

sc = SeldonClient(deployment_name="tfserving", namespace="seldon")

REST Request

[ ]:
r = sc.predict(gateway="ambassador", transport="rest", shape=(1, 784))
print(r)
assert r.success == True

gRPC Request

[ ]:
r = sc.predict(gateway="ambassador", transport="grpc", shape=(1, 784))
print(r)
assert r.success == True

We can then delete the model we deployed.

[ ]:
!kubectl delete -f ./resources/mnist_rest.yaml

Serve Tensorflow Model with Tensorflow protocol

The config file below shows how you can deploy your Tensorflow model so that it exposes the Tensorflow protocol.

[ ]:
%%writefile ./resources/halfplustwo_rest.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: hpt
spec:
  name: hpt
  protocol: tensorflow
  transport: rest
  predictors:
  - graph:
      children: []
      implementation: TENSORFLOW_SERVER
      modelUri: gs://seldon-models/tfserving/half_plus_two
      name:  halfplustwo
      parameters:
        - name: model_name
          type: STRING
          value: halfplustwo
    name: default
    replicas: 1
[ ]:
!kubectl apply -f ./resources/halfplustwo_rest.yaml
[ ]:
!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=hpt -o jsonpath='{.items[0].metadata.name}')
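
The half_plus_two example model computes 0.5 * x + 2 for each input, so sending [1.0, 2.0, 5.0] should return predictions [2.5, 3.0, 4.5]; the assertions below check the first element.
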
[ ]:
import json
X=!curl -s -d '{"instances": [1.0, 2.0, 5.0]}' \
   -X POST http://localhost:8003/seldon/seldon/hpt/v1/models/halfplustwo/:predict \
   -H "Content-Type: application/json"
d=json.loads("".join(X))
print(d)
assert(d["predictions"][0] == 2.5)
[ ]:
X=!cd ../executor/proto && grpcurl \
   -d '{"model_spec":{"name":"halfplustwo"},"inputs":{"x":{"dtype": 1, "tensor_shape": {"dim":[{"size": 3}]}, "floatVal" : [1.0, 2.0, 3.0]}}}' \
   -rpc-header seldon:hpt -rpc-header namespace:seldon \
   -plaintext -proto ./prediction_service.proto \
   0.0.0.0:8003 tensorflow.serving.PredictionService/Predict
d=json.loads("".join(X))
print(d)
assert(d["outputs"]["x"]["floatVal"][0] == 2.5)
[ ]:
!kubectl delete -f ./resources/halfplustwo_rest.yaml

Serve MLFlow Elasticnet Wines Model

We can deploy an MLFlow model uploaded to an object store by using the MLFlow model server implementation, as shown in the config below:

[ ]:
%%writefile ./resources/elasticnet_wine.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: mlflow
spec:
  name: wines
  predictors:
  - componentSpecs:
    - spec:
        # We are setting a high failureThreshold as installing conda dependencies
        # can take a long time and we want to avoid k8s killing the container prematurely
        containers:
        - name: classifier
          livenessProbe:
            initialDelaySeconds: 80
            failureThreshold: 200
            periodSeconds: 5
            successThreshold: 1
            httpGet:
              path: /health/ping
              port: http
              scheme: HTTP
          readinessProbe:
            initialDelaySeconds: 80
            failureThreshold: 200
            periodSeconds: 5
            successThreshold: 1
            httpGet:
              path: /health/ping
              port: http
              scheme: HTTP
    graph:
      children: []
      implementation: MLFLOW_SERVER
      modelUri: gs://seldon-models/v1.10.0-dev/mlflow/elasticnet_wine
      name: classifier
    name: default
    replicas: 1
[ ]:
!kubectl apply -f ./resources/elasticnet_wine.yaml
[ ]:
!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=mlflow -o jsonpath='{.items[0].metadata.name}')

REST Requests

[ ]:
X=!curl -s -d '{"data": {"ndarray":[[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,1.1]]}}' \
   -X POST http://localhost:8003/seldon/seldon/mlflow/api/v1.0/predictions \
   -H "Content-Type: application/json"
d=json.loads(X[0])
print(d)
[ ]:
from seldon_core.seldon_client import SeldonClient

sc = SeldonClient(deployment_name="mlflow", namespace="seldon")
[ ]:
r = sc.predict(gateway="ambassador", transport="rest", shape=(1, 11))
print(r)
assert r.success == True

gRPC Requests

[ ]:
X=!cd ../executor/proto && grpcurl -d '{"data":{"ndarray":[[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,1.1]]}}' \
         -rpc-header seldon:mlflow -rpc-header namespace:seldon \
         -plaintext \
         -proto ./prediction.proto  0.0.0.0:8003 seldon.protos.Seldon/Predict
d=json.loads("".join(X))
print(d)
[ ]:
r = sc.predict(gateway="ambassador", transport="grpc", shape=(1, 11))
print(r)
assert r.success == True
[ ]:
!kubectl delete -f ./resources/elasticnet_wine.yaml

MLFlow KFServing V2 protocol

We can also deploy the same MLFlow model exposing KFServing’s V2 protocol by setting the protocol of the SeldonDeployment to kfserving, as in the config below:

[ ]:
%%writefile ./resources/elasticnet_wine_v2.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: mlflow
spec:
  protocol: kfserving  # Activate v2 protocol
  name: wines
  predictors:
    - graph:
        children: []
        implementation: MLFLOW_SERVER
        modelUri: gs://seldon-models/v1.10.0-dev/mlflow/elasticnet_wine
        name: classifier
      name: default
      replicas: 1
[ ]:
!kubectl apply -f ./resources/elasticnet_wine_v2.yaml
[ ]:
!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=mlflow -o jsonpath='{.items[0].metadata.name}')

REST Requests

[ ]:
import json

import requests

inference_request = {
    "parameters": {"content_type": "pd"},
    "inputs": [
        {
            "name": "fixed acidity",
            "shape": [1],
            "datatype": "FP32",
            "data": [7.4],
            "parameters": {"content_type": "np"},
        },
        {
            "name": "volatile acidity",
            "shape": [1],
            "datatype": "FP32",
            "data": [0.7000],
            "parameters": {"content_type": "np"},
        },
        {
            "name": "citric acidity",
            "shape": [1],
            "datatype": "FP32",
            "data": [0],
            "parameters": {"content_type": "np"},
        },
        {
            "name": "residual sugar",
            "shape": [1],
            "datatype": "FP32",
            "data": [1.9],
            "parameters": {"content_type": "np"},
        },
        {
            "name": "chlorides",
            "shape": [1],
            "datatype": "FP32",
            "data": [0.076],
            "parameters": {"content_type": "np"},
        },
        {
            "name": "free sulfur dioxide",
            "shape": [1],
            "datatype": "FP32",
            "data": [11],
            "parameters": {"content_type": "np"},
        },
        {
            "name": "total sulfur dioxide",
            "shape": [1],
            "datatype": "FP32",
            "data": [34],
            "parameters": {"content_type": "np"},
        },
        {
            "name": "density",
            "shape": [1],
            "datatype": "FP32",
            "data": [0.9978],
            "parameters": {"content_type": "np"},
        },
        {
            "name": "pH",
            "shape": [1],
            "datatype": "FP32",
            "data": [3.51],
            "parameters": {"content_type": "np"},
        },
        {
            "name": "sulphates",
            "shape": [1],
            "datatype": "FP32",
            "data": [0.56],
            "parameters": {"content_type": "np"},
        },
        {
            "name": "alcohol",
            "shape": [1],
            "datatype": "FP32",
            "data": [9.4],
            "parameters": {"content_type": "np"},
        },
    ],
}

endpoint = "http://localhost:8003/seldon/seldon/mlflow/v2/models/infer"
response = requests.post(endpoint, json=inference_request)

print(json.dumps(response.json(), indent=2))
assert response.ok
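
Since the request above just lays the wine features out column by column, we could also build it programmatically from a pandas DataFrame. Below is a minimal sketch under that assumption; dataframe_to_v2_request is a hypothetical helper, and the column names are assumed to match the schema the model was trained on:

[ ]:
import pandas as pd


def dataframe_to_v2_request(df):
    # Hypothetical helper: build a V2 "pd" request from a DataFrame,
    # emitting one FP32 input per column.
    return {
        "parameters": {"content_type": "pd"},
        "inputs": [
            {
                "name": col,
                "shape": [len(df)],
                "datatype": "FP32",
                "data": df[col].astype(float).tolist(),
                "parameters": {"content_type": "np"},
            }
            for col in df.columns
        ],
    }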
[ ]:
!kubectl delete -f ./resources/elasticnet_wine_v2.yaml