Troubleshooting Guide

If your Seldon Deployment does not seem to be running, here are some tips to diagnose the issue.

My model does not seem to be running

Check whether the Seldon Deployment is running:

kubectl get sdep

If it exists, check its status. For a Seldon Deployment called <name>:

kubectl get sdep <name> -o jsonpath='{.status}'

This might look like:

>kubectl get sdep

NAME      AGE
mymodel   1m

>kubectl get sdep mymodel -o jsonpath='{.status}'
map[predictorStatus:[map[name:mymodel-mymodel-7cd068f replicas:1 replicasAvailable:1]] state:Available]

If you have the jq tool installed, you can get more readable output with:

>kubectl get sdep mymodel -o json | jq .status
{
  "predictorStatus": [
    {
      "name": "mymodel-mymodel-7cd068f",
      "replicas": 1,
      "replicasAvailable": 1
    }
  ],
  "state": "Available"
}

For a model with invalid JSON/YAML, the status reports the failure, as in the example below:

>kubectl get sdep seldon-model -o json | jq .status
{
  "description": "Cannot find field: imagename in message k8s.io.api.core.v1.Container",
  "state": "Failed"
}
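If jq is not available, the same two fields can be read with jsonpath alone. The following is a minimal sketch rather than an official tool: the function name sdep_state is hypothetical, and it assumes the status carries the state and description fields shown in the examples above.

```shell
# Hypothetical helper: print the state of a SeldonDeployment and, when it is
# not "Available", the description field (if the operator has set one).
# Returns non-zero when the deployment is not Available.
sdep_state() {
  name="$1"
  state="$(kubectl get sdep "$name" -o jsonpath='{.status.state}')"
  echo "state: $state"
  if [ "$state" != "Available" ]; then
    kubectl get sdep "$name" -o jsonpath='{.status.description}'
    echo
    return 1
  fi
}
```

For example, sdep_state seldon-model would print the state followed by the error description for the failed deployment above.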

Check all events on the SeldonDeployment:

kubectl describe sdep mysdep

This will show each event from the operator including create, update, delete and error events.

My Seldon Deployment remains in “creating” state

Check if the pods are running successfully.
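One way to do this is to list the pods in the namespace and pick out those whose STATUS is not Running. The sketch below assumes the default kubectl get pods column layout (NAME, READY, STATUS, ...); the function name pods_not_running is hypothetical, and the namespace defaults to "default" as an assumption.

```shell
# Hypothetical helper: list pods whose STATUS column is anything other than
# "Running" (for example Pending, ImagePullBackOff or CrashLoopBackOff).
# Assumes the default column order of `kubectl get pods`.
pods_not_running() {
  namespace="${1:-default}"
  kubectl get pods -n "$namespace" --no-headers \
    | awk '$3 != "Running" { print $1, $3 }'
}
```

For any pod it reports, kubectl describe pod <pod-name> shows the pod's events, which usually explain why it has not started (image pull failures, insufficient resources, failing probes).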

I get 500s when calling my model over the API

Check the logs of your running model pods.
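A model pod typically runs more than one container, so it can help to fetch the logs of all of them at once. This is a minimal sketch: the function name model_pod_logs is hypothetical, the pod name is one you take from kubectl get pods, and the namespace defaults to "default" as an assumption.

```shell
# Hypothetical helper: print the last 100 log lines from every container
# in the given pod.
model_pod_logs() {
  pod="$1"
  namespace="${2:-default}"
  kubectl logs "$pod" -n "$namespace" --all-containers=true --tail=100
}
```

If a container has restarted, adding --previous to kubectl logs shows the logs of the earlier instance, which is often where the original error appears.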

My Seldon Deployment is not listed

Check the logs of the Seldon Operator. This is the pod which handles the Seldon Deployment graphs sent to Kubernetes. On a default installation, you can find the operator pod in the seldon-system namespace. The pod is labelled control-plane=seldon-controller-manager, so to get the logs you can run:

kubectl logs -n seldon-system -l control-plane=seldon-controller-manager

Invalid memory address

In some cases, you will see an error message in the operator logs like the following:

panic: runtime error: invalid memory address or nil pointer dereference

This error can be caused by empty or unexpected values in the SeldonDeployment spec. The main cause is usually a misconfiguration of the mutating webhook. To fix it, you can try to re-install Seldon Core in your cluster.
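To confirm that the operator is hitting this panic, you can search its full log for it. The sketch below assumes the default installation described above (seldon-system namespace, control-plane=seldon-controller-manager label); the function name operator_panics is hypothetical. Note that kubectl logs returns only the last few lines per pod when a label selector is used, so --tail=-1 is passed to fetch everything.

```shell
# Hypothetical helper: search the complete operator log for Go panics and
# print each match with the five lines that follow it.
operator_panics() {
  kubectl logs -n seldon-system -l control-plane=seldon-controller-manager --tail=-1 \
    | grep -A 5 'panic:'
}
```

If the function prints nothing, no panic was logged by the currently running operator pod.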

I have tried the above and I’m still confused