Protocols

The Tensorflow protocol is only available in Seldon Core versions >=1.1.

Seldon Core supports the following data planes:

REST and gRPC Seldon Protocol

Seldon is the default protocol for SeldonDeployment resources. You can select gRPC by setting transport: grpc in your SeldonDeployment resource, or by ensuring that all components in the graph have endpoint.transport set to grpc.

See example notebook.

REST and gRPC Tensorflow Protocol

Activate this protocol by specifying protocol: tensorflow together with either transport: rest or transport: grpc in your SeldonDeployment. See example notebook.
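As a rough illustration of the two settings described above, the following Python sketch assembles a SeldonDeployment manifest as a plain dictionary and prints it as JSON. The helper name seldon_deployment, the deployment name, the model name, and the model URI are hypothetical placeholders. The allowed values are simply those named in this document, TENSORFLOW_SERVER is assumed to be the pre-packaged server name, and the default of transport: rest is an assumption.

```python
import json

# Values named in this document; treated here as the allowed set (assumption).
PROTOCOLS = {"seldon", "tensorflow", "kfserving"}
TRANSPORTS = {"rest", "grpc"}


def seldon_deployment(name, graph, protocol="seldon", transport="rest"):
    """Return a SeldonDeployment manifest as a plain dict (hypothetical helper)."""
    if protocol not in PROTOCOLS:
        raise ValueError(f"unknown protocol: {protocol!r}")
    if transport not in TRANSPORTS:
        raise ValueError(f"unknown transport: {transport!r}")
    return {
        "apiVersion": "machinelearning.seldon.io/v1alpha2",
        "kind": "SeldonDeployment",
        "metadata": {"name": name},
        "spec": {
            "name": name,
            "protocol": protocol,
            "transport": transport,
            "predictors": [{"name": "default", "graph": graph}],
        },
    }


# Hypothetical single-model graph; the model name and URI are placeholders.
graph = {
    "name": "my-model",
    "implementation": "TENSORFLOW_SERVER",  # assumed server name
    "modelUri": "gs://example-bucket/my-model",
    "children": [],
}

manifest = seldon_deployment(
    "tf-example", graph, protocol="tensorflow", transport="grpc"
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and adapted. Building the manifest in code makes it easy to reject a misspelled protocol or transport value before it reaches the cluster.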

The protocol works as expected for single-model graphs in which a Tensorflow Serving server runs as the only model in the graph. For more complex graphs you can chain models:

  • The response from the first model is sent as the request to the second. This happens automatically when you define a chain of models as a Seldon graph. It is up to you to ensure that the response of each chained model can be fed as a request to the next model in the chain.

  • Only Predict calls are supported when chaining multiple models.
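A chain of models can be written as a graph in which each model is the only child of the one before it. The sketch below is one way to build such a graph section in Python. The helper name chain_graph and the model names are hypothetical, and the use of the name, type, and children fields is an assumption about the graph layout.

```python
def chain_graph(model_names):
    """Build a linear graph: each model is the only child of the previous one."""
    if not model_names:
        raise ValueError("need at least one model name")
    node = None
    # Walk backwards so each node can wrap the node that follows it in the chain.
    for name in reversed(model_names):
        node = {
            "name": name,
            "type": "MODEL",
            "children": [] if node is None else [node],
        }
    return node


# Hypothetical two-step chain: requests reach "preprocess" first, and its
# response is then sent as the request to "classifier".
graph = chain_graph(["preprocess", "classifier"])
print(graph)
```

The sketch only arranges the nodes. It does not check that the response of one model is a valid request for the next, which remains the user's responsibility.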

General considerations:

  • Seldon components marked as MODEL, INPUT_TRANSFORMER, or OUTPUT_TRANSFORMER allow the PredictionService Predict method to be called.

  • GetModelStatus for any model in the graph is available.

  • GetModelMetadata for any model in the graph is available.

  • Combining and Routing with the Tensorflow protocol is not presently supported.

  • Status and metadata calls can be made for any model in the graph.

  • A non-standard Seldon extension is available to call predict on the graph as a whole: /v1/models/:predict.

  • The name of the model in the graph section of the SeldonDeployment spec must match the name of the model loaded onto the Tensorflow Server.
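To make the calls listed above concrete, the sketch below builds the REST paths and a Predict request body in Python without sending anything. The per-model paths follow the Tensorflow Serving REST conventions, and /v1/models/:predict is the Seldon extension mentioned above. The helper names and the model name half_plus_two are hypothetical, and any host or ingress prefix in front of these paths depends on your installation.

```python
import json
from urllib.parse import quote


def tf_paths(model_name):
    """Return REST paths for one model, plus the whole-graph predict path."""
    name = quote(model_name, safe="")
    return {
        "predict": f"/v1/models/{name}:predict",
        "status": f"/v1/models/{name}",
        "metadata": f"/v1/models/{name}/metadata",
        # Non-standard Seldon extension: predict on the graph as a whole.
        "graph_predict": "/v1/models/:predict",
    }


def predict_body(instances):
    """Serialise a Predict request in the Tensorflow Serving row format."""
    return json.dumps({"instances": instances})


# The model name must match both the graph entry and the model loaded
# onto the Tensorflow server; "half_plus_two" is a placeholder.
paths = tf_paths("half_plus_two")
body = predict_body([1.0, 2.0, 5.0])
print(paths["predict"], body)
```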

V2 KFServing Protocol

Seldon has collaborated with the NVIDIA Triton Server Project and the KFServing Project to create a new ML inference protocol. The core idea behind this joint effort is that this new protocol will become the standard inference protocol and will be used across multiple inference services.

In Seldon Core, this protocol can be used by specifying protocol: kfserving in your SeldonDeployment. For example:

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: sklearn
spec:
  name: iris-predict
  protocol: kfserving
  predictors:
  - graph:
      children: []
      implementation: SKLEARN_SERVER
      modelUri: gs://seldon-models/sklearn/iris
      name: classifier
      parameters:
        - name: method
          type: STRING
          value: predict
    name: default

At present, the kfserving protocol is only supported in a subset of pre-packaged inference servers. In particular,

| Pre-packaged server | Supported | Underlying runtime |
| TRITON_SERVER       | ✔         | NVIDIA Triton      |
| SKLEARN_SERVER      | ✔         | Seldon MLServer    |
| XGBOOST_SERVER      | ✔         | Seldon MLServer    |

You can try out the kfserving protocol in this example notebook.
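For orientation, a V2 inference request names each input tensor and gives its shape, datatype, and data. The Python sketch below builds such a request for the classifier model from the example deployment without sending it. The helper name v2_infer_request and the input name predict are hypothetical, and the FP32 datatype and four-feature row are assumptions about an iris-style model.

```python
import json


def v2_infer_request(model_name, rows, input_name="predict", datatype="FP32"):
    """Return (path, body) for a V2 inference call on a 2-D batch of rows."""
    if not rows:
        raise ValueError("need at least one row")
    n_cols = len(rows[0])
    if any(len(row) != n_cols for row in rows):
        raise ValueError("all rows must have the same length")
    path = f"/v2/models/{model_name}/infer"
    body = {
        "inputs": [
            {
                "name": input_name,
                "shape": [len(rows), n_cols],
                "datatype": datatype,
                # Tensor data flattened in row-major order.
                "data": [value for row in rows for value in row],
            }
        ]
    }
    return path, body


# One hypothetical iris sample with four features.
path, body = v2_infer_request("classifier", [[5.1, 3.5, 1.4, 0.2]])
print(path)
print(json.dumps(body))
```

The host and any ingress prefix in front of the path depend on how the deployment is exposed.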