Graph Deployment Options

Seldon Core lets you choose how the models and other components of an inference graph are packaged into pods. Each node of the inference graph runs as a container in the Kubernetes cluster, and these containers can be placed in a single pod or spread across multiple pods. The top-level unit of a SeldonDeployment is the predictor. Each predictor lists one or more containers under componentSpecs, and the inference graph in the predictor's graph field (spec.predictors[].graph) refers to those containers by name.

Mode One: Single pod deployment

The following is an example of a Seldon Core inference graph with a single predictor, in which all three containers are listed under one componentSpecs entry.

apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: linear-pipeline-single-pod
spec:
  name: linear-pipeline
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: seldonio/mock_classifier:1.0
          name: node-one
        - image: seldonio/mock_classifier:1.0
          name: node-two
        - image: seldonio/mock_classifier:1.0
          name: node-three
    graph:
      name: node-one
      type: MODEL
      children:
      - name: node-two
        type: MODEL
        children:
        - name: node-three
          type: MODEL
          children: []
    name: example

This will result in deploying all the graph nodes in a single pod:

kubectl get pods

NAME                                                       READY   STATUS    RESTARTS   AGE
seldon-c71cc2d950d44db1bc6afbeb0194c1da-5d8dddb8cb-xx4gv   5/5     Running   0          6m59s

Mode Two: Separate pod deployment

Another way to deploy is to give each node of the inference graph its own componentSpecs entry within the predictor, which results in a separate pod for each inference graph node.

apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: linear-pipeline-separate-pods
spec:
  name: linear-pipeline
  annotations:
    seldon.io/engine-separate-pod: "true"
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: seldonio/mock_classifier:1.0
          name: node-one
          imagePullPolicy: Always
    - spec:
        containers:
        - image: seldonio/mock_classifier:1.0
          name: node-two
          imagePullPolicy: Always
    - spec:
        containers:
        - image: seldonio/mock_classifier:1.0
          name: node-three
          imagePullPolicy: Always
    graph:
      name: node-one
      type: MODEL
      children:
      - name: node-two
        type: MODEL
        children:
        - name: node-three
          type: MODEL
          children: []
    name: example

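This time each container runs in its own pod. The additional svc-orch pod is the service orchestrator, which runs separately because of the seldon.io/engine-separate-pod annotation.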

kubectl get pods
NAME                                                              READY   STATUS    RESTARTS   AGE
linear-pipeline-separate-pods-example-0-node-one-6954fbbd5m7pcp   1/1     Running   0          4m33s
linear-pipeline-separate-pods-example-1-node-two-c4f55f689gxkkr   1/1     Running   0          4m33s
linear-pipeline-separate-pods-example-2-node-three-99667dcmg9kg   1/1     Running   0          4m33s
linear-pipeline-separate-pods-example-svc-orch-656c6bdf59-6m6nc   1/1     Running   0          4m33s
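
The two modes can also be combined within one predictor. The following is a minimal sketch, assuming the same rule holds as in the examples above (containers that share a componentSpecs entry share a pod, and each componentSpecs entry gets its own pod). Under that assumption, node-one and node-two would run together in one pod and node-three in another. The name linear-pipeline-mixed is hypothetical and chosen only for illustration.

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: linear-pipeline-mixed   # hypothetical name, not from the examples above
spec:
  name: linear-pipeline
  predictors:
  - componentSpecs:
    # First entry: two containers, expected to share one pod
    - spec:
        containers:
        - image: seldonio/mock_classifier:1.0
          name: node-one
        - image: seldonio/mock_classifier:1.0
          name: node-two
    # Second entry: one container, expected to get its own pod
    - spec:
        containers:
        - image: seldonio/mock_classifier:1.0
          name: node-three
    graph:
      name: node-one
      type: MODEL
      children:
      - name: node-two
        type: MODEL
        children:
        - name: node-three
          type: MODEL
          children: []
    name: example
```

The graph section is unchanged from the earlier examples. Only the grouping of containers under componentSpecs differs, which is what determines the pod layout.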

Separate pods with prepackaged servers

If you want to deploy each inference graph node (model) in a separate pod while using the prepackaged servers, it is enough to list each model's name as a container in its own componentSpecs entry, like so:

spec:
  predictors:
    - name: default
      graph:
        name: model_one
        implementation: PREPACKAGED_SERVER
        modelUri: MODEL_URI
        children:
          - name: model_two
            implementation: PREPACKAGED_SERVER
            modelUri: MODEL_URI
      componentSpecs:
        - spec:
            containers:
              - name: model_one
        - spec:
            containers:
              - name: model_two

Pods are the smallest deployable unit in Kubernetes, so the pod layout determines the granularity of scaling. With separate pods you can scale each model independently, whereas placing all models in a single pod means the entire graph scales as one unit. On the other hand, a single pod deployment needs only one Istio sidecar container, so the sidecars request fewer resources in total. Single pod mode may also have less communication overhead, because all containers in a pod are always scheduled on the same Kubernetes node.
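
To illustrate per-model scaling, the following sketch reuses the separate pod example and sets replicas on a single componentSpecs entry. It assumes, purely for illustration, that node-two is the bottleneck; the replica count of 3 is arbitrary. Only the pod for node-two should then be replicated, while node-one and node-three keep the default single replica.

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: linear-pipeline-separate-pods
spec:
  name: linear-pipeline
  annotations:
    seldon.io/engine-separate-pod: "true"
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: seldonio/mock_classifier:1.0
          name: node-one
    - spec:
        containers:
        - image: seldonio/mock_classifier:1.0
          name: node-two
      # Illustrative assumption: node-two is the bottleneck,
      # so only this componentSpecs entry gets extra replicas.
      replicas: 3
    - spec:
        containers:
        - image: seldonio/mock_classifier:1.0
          name: node-three
    graph:
      name: node-one
      type: MODEL
      children:
      - name: node-two
        type: MODEL
        children:
        - name: node-three
          type: MODEL
          children: []
    name: example
```

In the single pod layout all three containers share one componentSpecs entry, so a replicas value there would replicate the whole graph rather than one model.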