This page was generated from testing/benchmarking/svcOrch/svcOrch.ipynb.
Service Orchestrator Benchmark Tests¶
Using a pretrained model for Tensorflow flowers dataset
Tests the extra latency added by the svcOrch for a medium size image (224x224) classification model.
## Setup
Create a 3 node cluster
Install Seldon Core
[ ]:
!kubectl create namespace seldon
[ ]:
!kubectl config set-context $(kubectl config current-context) --namespace=seldon
[1]:
import sys
sys.path.append("../")
from vegeta_utils import *
Put Taints Nodes¶
[6]:
raw = !kubectl get nodes -o jsonpath='{.items[0].metadata.name}'
firstNode = raw[0]
raw = !kubectl get nodes -o jsonpath='{.items[1].metadata.name}'
secondNode = raw[0]
raw = !kubectl get nodes -o jsonpath='{.items[2].metadata.name}'
thirdNode = raw[0]
!kubectl taint nodes '{firstNode}' loadtester=active:NoSchedule
!kubectl taint nodes '{secondNode}' model=active:NoSchedule
!kubectl taint nodes '{thirdNode}' model=active:NoSchedule
error: Node pool-triv8uq93-3oaz0 already has loadtester taint(s) with same effect(s) and --overwrite is false
error: Node pool-triv8uq93-3oaz1 already has model taint(s) with same effect(s) and --overwrite is false
error: Node pool-triv8uq93-3oazd already has model taint(s) with same effect(s) and --overwrite is false
Tensorflow Flowers Model - Latency Test¶
[7]:
%%writefile tf_flowers.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
name: tf-flowers
spec:
protocol: tensorflow
transport: rest
predictors:
- graph:
implementation: TENSORFLOW_SERVER
modelUri: gs://kfserving-samples/models/tensorflow/flowers
name: flowers
parameters:
- name: model_name
type: STRING
value: flowers
componentSpecs:
- spec:
containers:
- name: flowers
resources:
requests:
cpu: '2'
tolerations:
- key: model
operator: Exists
effect: NoSchedule
name: default
replicas: 1
Overwriting tf_flowers.yaml
[8]:
run_model("tf_flowers.yaml")
Available with 1 pods
[9]:
results = run_vegeta_test("tf_vegeta_cfg.yaml", "vegeta_1worker.yaml", "60m")
print(json.dumps(results, indent=4))
mean_with_executor = results["latencies"]["mean"]
{
"latencies": {
"total": 1200086051040,
"mean": 82639171,
"50th": 79832732,
"90th": 95849466,
"95th": 104009039,
"99th": 128516774,
"max": 964378237,
"min": 58091922
},
"bytes_in": {
"total": 3165796,
"mean": 218
},
"bytes_out": {
"total": 234893350,
"mean": 16175
},
"earliest": "2020-07-12T15:59:55.298435559Z",
"latest": "2020-07-12T16:19:55.34937906Z",
"end": "2020-07-12T16:19:55.42413935Z",
"duration": 1200050943501,
"wait": 74760290,
"requests": 14522,
"rate": 12.10115293741936,
"throughput": 12.100399111632546,
"success": 1,
"status_codes": {
"200": 14522
},
"errors": []
}
Tensorflow Flowers Model - No executor - Latency Test¶
[10]:
%%writefile tf_flowers.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
name: tf-flowers
spec:
protocol: tensorflow
transport: rest
predictors:
- graph:
implementation: TENSORFLOW_SERVER
modelUri: gs://kfserving-samples/models/tensorflow/flowers
name: flowers
parameters:
- name: model_name
type: STRING
value: flowers
annotations:
seldon.io/no-engine: "true"
componentSpecs:
- spec:
containers:
- name: flowers
resources:
requests:
cpu: '2'
tolerations:
- key: model
operator: Exists
effect: NoSchedule
name: default
replicas: 1
Overwriting tf_flowers.yaml
[11]:
run_model("tf_flowers.yaml")
Available with 1 pods
[12]:
results = run_vegeta_test("tf_standalone_vegeta_cfg.yaml", "vegeta_1worker.yaml", "60m")
print(json.dumps(results, indent=4))
mean_no_executor = results["latencies"]["mean"]
{
"latencies": {
"total": 1200089018347,
"mean": 73670289,
"50th": 73129037,
"90th": 81823849,
"95th": 84928884,
"99th": 93248220,
"max": 976431685,
"min": 53958421
},
"bytes_in": {
"total": 3551220,
"mean": 218
},
"bytes_out": {
"total": 263490750,
"mean": 16175
},
"earliest": "2020-07-12T16:21:00.12358772Z",
"latest": "2020-07-12T16:41:00.180620249Z",
"end": "2020-07-12T16:41:00.255483814Z",
"duration": 1200057032529,
"wait": 74863565,
"requests": 16290,
"rate": 13.574354850177793,
"throughput": 13.573508089417606,
"success": 1,
"status_codes": {
"200": 16290
},
"errors": []
}
[13]:
diff = (mean_with_executor - mean_no_executor) / 1e6
print("Diff in ms", diff)
Diff in ms 8.968882
GRPC Tensorflow Flowers Model - Latency Test¶
First create the binary proto for the flowers payload
[1]:
!python ../tf_proto_save.py --model flowers --input_path flowers.json --output_path flowers.bin
/home/clive/anaconda3/envs/seldon-core/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/clive/anaconda3/envs/seldon-core/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/clive/anaconda3/envs/seldon-core/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/clive/anaconda3/envs/seldon-core/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/clive/anaconda3/envs/seldon-core/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/clive/anaconda3/envs/seldon-core/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
/home/clive/anaconda3/envs/seldon-core/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/clive/anaconda3/envs/seldon-core/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/clive/anaconda3/envs/seldon-core/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/clive/anaconda3/envs/seldon-core/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/clive/anaconda3/envs/seldon-core/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/clive/anaconda3/envs/seldon-core/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
[14]:
%%writefile tf_flowers.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
name: tf-flowers
spec:
protocol: tensorflow
transport: grpc
predictors:
- graph:
implementation: TENSORFLOW_SERVER
modelUri: gs://kfserving-samples/models/tensorflow/flowers
name: flowers
parameters:
- name: model_name
type: STRING
value: flowers
componentSpecs:
- spec:
containers:
- name: flowers
resources:
requests:
cpu: '2'
tolerations:
- key: model
operator: Exists
effect: NoSchedule
name: default
replicas: 1
Overwriting tf_flowers.yaml
[15]:
run_model("tf_flowers.yaml")
Available with 1 pods
[16]:
results = run_ghz_test("flowers.bin", "ghz_1worker.yaml", "60m")
print(json.dumps(results, indent=4))
mean_with_executor = results["average"]
{
"date": "2020-07-12T17:12:04Z",
"endReason": "timeout",
"options": {
"host": "tf-flowers-default.seldon.svc.cluster.local:8000",
"proto": "/proto/prediction_service.proto",
"import-paths": [
"/proto",
"."
],
"call": "tensorflow.serving.PredictionService/Predict",
"insecure": true,
"total": 1000000,
"concurrency": 1,
"connections": 1,
"duration": 1800000000000,
"timeout": 20000000000,
"dial-timeout": 10000000000,
"keepalive": 1800000000000,
"binary": true,
"CPUs": 8
},
"count": 22978,
"total": 1800000675146,
"average": 78227435,
"fastest": 54712167,
"slowest": 938906233,
"rps": 12.76555076743859,
"errorDistribution": {
"rpc error: code = Unavailable desc = transport is closing": 1
},
"statusCodeDistribution": {
"OK": 22977,
"Unavailable": 1
},
"latencyDistribution": [
{
"percentage": 10,
"latency": 68291719
},
{
"percentage": 25,
"latency": 71762262
},
{
"percentage": 50,
"latency": 75875238
},
{
"percentage": 75,
"latency": 81163515
},
{
"percentage": 90,
"latency": 89225781
},
{
"percentage": 95,
"latency": 97536730
},
{
"percentage": 99,
"latency": 128647238
}
]
}
GRPC Tensorflow Flowers Model - No executor - Latency Test¶
[21]:
%%writefile tf_flowers.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
name: tf-flowers
spec:
protocol: tensorflow
transport: grpc
predictors:
- graph:
implementation: TENSORFLOW_SERVER
modelUri: gs://kfserving-samples/models/tensorflow/flowers
name: flowers
parameters:
- name: model_name
type: STRING
value: flowers
annotations:
seldon.io/no-engine: "true"
componentSpecs:
- spec:
containers:
- name: flowers
resources:
requests:
cpu: '2'
tolerations:
- key: model
operator: Exists
effect: NoSchedule
name: default
replicas: 1
Overwriting tf_flowers.yaml
[22]:
run_model("tf_flowers.yaml")
Available with 1 pods
[23]:
results = run_ghz_test("flowers.bin", "ghz_standalone_1worker.yaml", "60m")
print(json.dumps(results, indent=4))
mean_no_executor = results["average"]
{
"date": "2020-07-12T18:04:44Z",
"endReason": "timeout",
"options": {
"host": "tf-flowers-default.seldon.svc.cluster.local:9000",
"proto": "/proto/prediction_service.proto",
"import-paths": [
"/proto",
"."
],
"call": "tensorflow.serving.PredictionService/Predict",
"insecure": true,
"total": 1000000,
"concurrency": 1,
"connections": 1,
"duration": 1800000000000,
"timeout": 20000000000,
"dial-timeout": 10000000000,
"keepalive": 1800000000000,
"binary": true,
"CPUs": 8
},
"count": 24132,
"total": 1800013456837,
"average": 74479232,
"fastest": 53792435,
"slowest": 1008191507,
"rps": 13.406566438900391,
"errorDistribution": {
"rpc error: code = Unavailable desc = transport is closing": 1
},
"statusCodeDistribution": {
"OK": 24131,
"Unavailable": 1
},
"latencyDistribution": [
{
"percentage": 10,
"latency": 67087978
},
{
"percentage": 25,
"latency": 70242403
},
{
"percentage": 50,
"latency": 73838624
},
{
"percentage": 75,
"latency": 77894265
},
{
"percentage": 90,
"latency": 82282422
},
{
"percentage": 95,
"latency": 85533746
},
{
"percentage": 99,
"latency": 93875540
}
]
}
[24]:
diff = (mean_with_executor - mean_no_executor) / 1e6
print("Diff in ms", diff)
Diff in ms 3.748203
[ ]: