Troubleshooting Guide
If your Seldon Deployment does not seem to be running, here are some tips to diagnose the issue.
My model does not seem to be running
Check whether the Seldon Deployment exists (add -n <namespace> if it is not in your current namespace):
kubectl get sdep
If it exists, check its status. For a Seldon Deployment called <name>, run:
kubectl get sdep <name> -o jsonpath='{.status}'
This might look like:
>kubectl get sdep
NAME AGE
mymodel 1m
>kubectl get sdep mymodel -o jsonpath='{.status}'
map[predictorStatus:[map[name:mymodel-mymodel-7cd068f replicas:1 replicasAvailable:1]] state:Available]
If you have the jq tool installed, you can get more readable output with:
>kubectl get sdep mymodel -o json | jq .status
{
"predictorStatus": [
{
"name": "mymodel-mymodel-7cd068f",
"replicas": 1,
"replicasAvailable": 1
}
],
"state": "Available"
}
For a model with an invalid JSON/YAML spec (here, an unknown container field imagename), the status looks like this:
>kubectl get sdep seldon-model -o json | jq .status
{
"description": "Cannot find field: imagename in message k8s.io.api.core.v1.Container",
"state": "Failed"
}
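To script the status check above, a minimal bash sketch is shown below. The function name check_sdep and its fallback namespace of default are illustrative assumptions. It reads only the state and description fields that appear in the status output above, using kubectl's built-in JSONPath support, so jq is not required.

```shell
# Hypothetical helper: report a SeldonDeployment's state and, if it failed,
# print the description the operator stored in its status.
# Usage: check_sdep <name> [namespace]   (namespace defaults to "default")
# Returns 0 if Available, 1 if Failed or any other state, 2 if kubectl fails.
check_sdep() {
  local name="$1"
  local ns="${2:-default}"
  local state
  # .status.state holds values such as "Available" or "Failed"
  state=$(kubectl get sdep "$name" -n "$ns" -o jsonpath='{.status.state}') || return 2
  echo "state: ${state:-<none>}"
  if [ "$state" = "Failed" ]; then
    # .status.description carries the error message for a rejected spec
    echo "description: $(kubectl get sdep "$name" -n "$ns" -o jsonpath='{.status.description}')"
    return 1
  fi
  # Any state other than Available (e.g. still creating) returns non-zero
  [ "$state" = "Available" ]
}
```

For example, check_sdep mymodel prints "state: Available" and returns 0 for the deployment shown earlier, which makes it usable as a gate in a CI script.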
Check all events on the SeldonDeployment
kubectl describe sdep mysdep
This will show each event from the operator including create, update, delete and error events.
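kubectl describe prints the events at the end of a longer description. To list only the events, sorted by time, a sketch like the following filters the namespace's event list by the object they refer to. The function name sdep_events and its argument order are hypothetical; the field selectors it uses (involvedObject.kind, involvedObject.name, type) are standard Kubernetes event fields.

```shell
# Hypothetical helper: list events recorded against one SeldonDeployment,
# sorted by last occurrence. Pass "Warning" as the third argument to keep
# only warning events.
# Usage: sdep_events <name> [namespace] [event-type]
sdep_events() {
  local name="$1"
  local ns="${2:-default}"
  local selector="involvedObject.kind=SeldonDeployment,involvedObject.name=${name}"
  if [ -n "${3:-}" ]; then
    # Field selectors are ANDed, so this narrows the result further
    selector="${selector},type=$3"
  fi
  kubectl get events -n "$ns" --field-selector "$selector" --sort-by=.lastTimestamp
}
```

For example, sdep_events mysdep default Warning shows only the warning events for mysdep, which usually surfaces errors faster than scanning the full describe output.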
My Seldon Deployment remains in the “creating” state
Check whether the pods created for the Seldon Deployment are running successfully.
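One quick check is to describe every pod whose phase is not Running; the Events section of each description usually gives the reason (for example, an image that cannot be pulled or a pod that cannot be scheduled). The sketch below is illustrative: the function name is hypothetical and it assumes you pass the namespace the Seldon Deployment lives in. Pods in CrashLoopBackOff or failing readiness probes can still report the Running phase, so also check the READY and RESTARTS columns of kubectl get pods.

```shell
# Hypothetical helper: describe each pod in a namespace that is not in the
# Running phase (e.g. Pending or Failed).
# Usage: describe_stuck_pods [namespace]   (namespace defaults to "default")
describe_stuck_pods() {
  local ns="${1:-default}"
  local pod
  # -o name prints one "pod/<name>" entry per line
  for pod in $(kubectl get pods -n "$ns" --field-selector=status.phase!=Running -o name); do
    echo "=== $pod ==="
    kubectl describe -n "$ns" "$pod"
  done
}
```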
I get HTTP 500 errors when calling my model over the API
Check the logs of your running model pods.
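A model pod may run more than one container (for example, your model alongside Seldon's own service orchestrator), and the error behind a 500 can be in any of them. The hypothetical helper below prints the logs of all containers in a pod, prefixing each line with its pod and container name. Setting PREVIOUS=1 adds --previous, which shows the logs of a container's last terminated instance if it has restarted.

```shell
# Hypothetical helper: print logs from every container in a model pod.
# Usage: model_logs <pod> [namespace]        (namespace defaults to "default")
#        PREVIOUS=1 model_logs <pod> [namespace]   (logs of restarted containers)
model_logs() {
  local pod="$1"
  local ns="${2:-default}"
  # --prefix tags each line with [pod/<pod>/<container>]
  local args=(logs "$pod" -n "$ns" --all-containers=true --prefix=true)
  if [ "${PREVIOUS:-0}" = "1" ]; then
    args+=(--previous)
  fi
  kubectl "${args[@]}"
}
```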
My Seldon Deployment is not listed
Check the logs of the Seldon Operator. This is the pod which handles the Seldon Deployment graphs sent to Kubernetes.
On a default installation, you can find the operator pod in the seldon-system namespace. The pod is labelled control-plane=seldon-controller-manager, so to get its logs you can run:
kubectl logs -n seldon-system -l control-plane=seldon-controller-manager
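When kubectl logs selects pods with a label selector (-l), it shows only the last 10 lines of each pod's log by default, which can hide the relevant error. The sketch below passes --tail=-1 to fetch the whole log and can optionally filter it with a case-insensitive pattern; the function name and its arguments are illustrative assumptions.

```shell
# Hypothetical helper: fetch the full operator log, optionally filtered.
# Usage: operator_logs [namespace] [pattern]   (namespace defaults to seldon-system)
operator_logs() {
  local ns="${1:-seldon-system}"
  local pattern="${2:-}"
  if [ -n "$pattern" ]; then
    # Case-insensitive extended regex, e.g. 'error|panic'
    kubectl logs -n "$ns" -l control-plane=seldon-controller-manager --tail=-1 | grep -iE -- "$pattern"
  else
    kubectl logs -n "$ns" -l control-plane=seldon-controller-manager --tail=-1
  fi
}
```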
Invalid memory address
In some cases, you may see an error message like the following in the operator logs:
panic: runtime error: invalid memory address or nil pointer dereference
This error can be caused by empty or unexpected values in the SeldonDeployment spec.
The most common cause, however, is a misconfigured mutating webhook.
To fix it, you can try reinstalling Seldon Core in your cluster.
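Before reinstalling, it can help to confirm the panic and see which webhook configurations are registered. If the panic crashes the operator process, Kubernetes restarts the container and the stack trace may then appear only in the previous instance's logs, so the sketch below searches both. It assumes the operator runs in seldon-system with the label shown above, and that Seldon's webhook configurations have "seldon" in their names, which may not hold for every installation. Both function names are hypothetical. If the operator pod runs more than one container, add -c with the operator container's name to the kubectl logs calls.

```shell
# Hypothetical helper: search the operator's current and previous container
# logs for Go panics. Usage: find_operator_panics [namespace]
find_operator_panics() {
  local ns="${1:-seldon-system}"
  local pod
  # Name of the first pod carrying the operator label
  pod=$(kubectl get pods -n "$ns" -l control-plane=seldon-controller-manager \
    -o jsonpath='{.items[0].metadata.name}') || return 2
  [ -n "$pod" ] || return 2
  {
    kubectl logs -n "$ns" "$pod"
    # Fails harmlessly if the container has never restarted
    kubectl logs -n "$ns" "$pod" --previous 2>/dev/null
  } | grep -E 'panic:|nil pointer dereference'
}

# Hypothetical helper: list webhook configurations whose names mention seldon.
seldon_webhooks() {
  kubectl get mutatingwebhookconfigurations,validatingwebhookconfigurations -o name | grep -i seldon
}
```

If seldon_webhooks prints nothing, the webhook configuration may be missing altogether, which is consistent with the reinstall advice above.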
I have tried the above and I’m still confused
Contact our Slack Community.
Create an issue on Seldon Core’s GitHub repo. Please include any diagnostics from the suggestions above to help us diagnose your issue.