This page was generated from testing/benchmarking/python/python_serialization.ipynb.
Python Wrapper Benchmarking
Prerequisites
- An authenticated K8S cluster with istio and Seldon Core installed
  - You can use the ansible seldon-core playbook at https://github.com/SeldonIO/ansible-k8s-collection
- vegeta and ghz benchmarking tools
- Port forward to istio
kubectl port-forward $(kubectl get pods -l istio=ingressgateway -n istio-system -o jsonpath='{.items[0].metadata.name}') -n istio-system 8003:8080
Tests

- Large Batch Size
  - predict method with:
    - REST: ndarray, tensor, tftensor
    - gRPC: ndarray, tensor, tftensor
  - predict_raw method with:
    - REST: ndarray, tensor, tftensor
    - gRPC: ndarray, tensor, tftensor
- Small Batch Size
  - predict method with:
    - REST: ndarray, tensor, tftensor
    - gRPC: ndarray, tensor, tftensor
TL;DR

- gRPC is faster than REST
- tftensor is best for large batch sizes
- ndarray over gRPC is very slow for large batch sizes and should be avoided
- the simpler tensor/ndarray formats are slightly better for small batch sizes
[1]:
from IPython.core.magic import register_line_cell_magic


@register_line_cell_magic
def writetemplate(line, cell):
    with open(line, "w") as f:
        f.write(cell.format(**globals()))
[2]:
VERSION = !cat ../../../version.txt
VERSION = VERSION[0]
VERSION
[2]:
'1.10.0-dev'
[3]:
!kubectl create namespace seldon
Error from server (AlreadyExists): namespaces "seldon" already exists
[4]:
!helm upgrade --install seldon-core seldon-core-operator --repo https://storage.googleapis.com/seldon-charts --version 1.9.0 --namespace seldon-system --set istio.enabled="true" --set istio.gateway="seldon-gateway.istio-system.svc.cluster.local"
Release "seldon-core" has been upgraded. Happy Helming!
NAME: seldon-core
LAST DEPLOYED: Thu Jul 1 14:03:55 2021
NAMESPACE: seldon-system
STATUS: deployed
REVISION: 2
TEST SUITE: None
Test with Predict method on Large Batch Size

The seldontest_predict image has a simple predict method that runs a loop with a configurable number of iterations (default 1) to simulate work. The iteration count can be set as a Seldon parameter, but here we want to benchmark the serialization/deserialization cost, so we keep the work minimal.
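For context, the predict method being benchmarked can be pictured as the following minimal sketch (a hypothetical reconstruction, not the actual seldontest_predict source, which lives in the Seldon Core repo):

```python
class SeldonTestPredict:
    """Sketch of a model whose predict only simulates work."""

    def __init__(self, iterations=1):
        # The loop count is configurable via a Seldon parameter; the
        # default of 1 keeps compute negligible so that serialization
        # cost dominates the measurements.
        self.iterations = int(iterations)

    def predict(self, X, names=None):
        for _ in range(self.iterations):
            pass  # simulated work
        # Echo a trivial result, as seen in the smoke-test responses.
        return [1]
```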
[5]:
%%writetemplate model.yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: seldon-model
  namespace: seldon
spec:
  predictors:
  - annotations:
      seldon.io/no-engine: "true"
    componentSpecs:
    - spec:
        containers:
        - image: seldonio/seldontest_predict:{VERSION}
          imagePullPolicy: IfNotPresent
          name: classifier
          resources:
            requests:
              cpu: 1
            limits:
              cpu: 1
          env:
          - name: GUNICORN_WORKERS
            value: "1"
          - name: GUNICORN_THREADS
            value: "1"
        tolerations:
        - key: model
          operator: Exists
          effect: NoSchedule
    graph:
      children: []
      name: classifier
      type: MODEL
    name: default
    replicas: 1
[6]:
!kubectl apply -f model.yaml
seldondeployment.machinelearning.seldon.io/seldon-model created
[7]:
!kubectl wait --for condition=ready --timeout=600s pods --all -n seldon
pod/seldon-model-default-0-classifier-5445bd4ccf-c2vdr condition met
Create payloads and associated vegeta configurations for:

- ndarray
- tensor
- tftensor

We will create an array of 100,000 consecutive integers.
[8]:
import json

sz = 100000
vals = list(range(sz))
valStr = f"{vals}"

payload = '{"data": {"ndarray": [' + valStr + "]}}"
with open("data_ndarray.json", "w") as f:
    f.write(payload)

payload_tensor = (
    '{"data":{"tensor":{"shape":[1,' + str(sz) + '],"values":' + valStr + "}}}"
)
with open("data_tensor.json", "w") as f:
    f.write(payload_tensor)
[9]:
import numpy as np
import tensorflow as tf
from google.protobuf import json_format

array = np.array(vals)
tftensor = tf.make_tensor_proto(array)
jStrTensor = json_format.MessageToJson(tftensor)
jTensor = json.loads(jStrTensor)

payload_tftensor = (
    '{"data":{"tftensor":' + json.dumps(jTensor, separators=(",", ":")) + "}}"
)
with open("data_tftensor.json", "w") as f:
    f.write(payload_tftensor)
[10]:
import base64
import json

sample_string_bytes = payload_tensor.encode("ascii")
base64_bytes = base64.b64encode(sample_string_bytes)
base64_string = base64_bytes.decode("ascii")
jqPayload = {
    "method": "POST",
    "url": "http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions",
    "body": base64_string,
    "header": {"Content-Type": ["application/json"]},
}
with open("vegeta_tensor.json", "w") as f:
    f.write(json.dumps(jqPayload, separators=(",", ":")))
    f.write("\n")

sample_string_bytes = payload.encode("ascii")
base64_bytes = base64.b64encode(sample_string_bytes)
base64_string = base64_bytes.decode("ascii")
jqPayload = {
    "method": "POST",
    "url": "http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions",
    "body": base64_string,
    "header": {"Content-Type": ["application/json"]},
}
with open("vegeta_ndarray.json", "w") as f:
    f.write(json.dumps(jqPayload, separators=(",", ":")))
    f.write("\n")

sample_string_bytes = payload_tftensor.encode("ascii")
base64_bytes = base64.b64encode(sample_string_bytes)
base64_string = base64_bytes.decode("ascii")
jqPayload = {
    "method": "POST",
    "url": "http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions",
    "body": base64_string,
    "header": {"Content-Type": ["application/json"]},
}
with open("vegeta_tftensor.json", "w") as f:
    f.write(json.dumps(jqPayload, separators=(",", ":")))
    f.write("\n")
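The three near-identical blocks above could be collapsed into a helper. The following is a sketch (`write_vegeta_target` is a hypothetical name); the URL and file names match the cells in this notebook:

```python
import base64
import json

# Endpoint used throughout this notebook.
PREDICT_URL = "http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions"


def write_vegeta_target(payload: str, path: str) -> None:
    """Write a single vegeta JSON target with a base64-encoded body."""
    target = {
        "method": "POST",
        "url": PREDICT_URL,
        "body": base64.b64encode(payload.encode("ascii")).decode("ascii"),
        "header": {"Content-Type": ["application/json"]},
    }
    with open(path, "w") as f:
        f.write(json.dumps(target, separators=(",", ":")))
        f.write("\n")


# Usage, given the payload strings built earlier:
# write_vegeta_target(payload, "vegeta_ndarray.json")
# write_vegeta_target(payload_tensor, "vegeta_tensor.json")
# write_vegeta_target(payload_tftensor, "vegeta_tftensor.json")
```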
Smoke test the port-forward to check everything is working
[14]:
!curl -X POST -H 'Content-Type: application/json' \
-d '@./data_ndarray.json' \
http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions
{"data":{"names":[],"ndarray":[1]},"meta":{"requestPath":{"classifier":"seldonio/seldontest_predict:1.10.0-dev"}}}
[15]:
!curl -X POST -H 'Content-Type: application/json' \
-d '@./data_tensor.json' \
http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions
{"data":{"names":[],"tensor":{"shape":[1],"values":[1]}},"meta":{"requestPath":{"classifier":"seldonio/seldontest_predict:1.10.0-dev"}}}
[16]:
!curl -X POST -H 'Content-Type: application/json' \
-d '@./data_tftensor.json' \
http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions
{"data":{"names":[],"tftensor":{"dtype":"DT_INT64","int64Val":["1"],"tensorShape":{"dim":[{"size":"1"}]}}},"meta":{"requestPath":{"classifier":"seldonio/seldontest_predict:1.10.0-dev"}}}
Test REST

- ndarray
- tensor
- tftensor

This can be run locally, as the results are intended to show relative differences rather than provide very accurate timings.
[17]:
%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_ndarray.json |
vegeta report -type=text
Requests [total, rate, throughput] 518, 51.76, 51.66
Duration [total, attack, wait] 10.027s, 10.008s, 19.333ms
Latencies [min, mean, 50, 90, 95, 99, max] 17.337ms, 19.355ms, 19.136ms, 20.336ms, 21.214ms, 24.886ms, 27.831ms
Bytes In [total, mean] 59570, 115.00
Bytes Out [total, mean] 356857970, 688915.00
Success [ratio] 100.00%
Status Codes [code:count] 200:518
Error Set:
[18]:
%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_tensor.json |
vegeta report -type=text
Requests [total, rate, throughput] 504, 50.35, 50.25
Duration [total, attack, wait] 10.03s, 10.01s, 19.353ms
Latencies [min, mean, 50, 90, 95, 99, max] 17.885ms, 19.897ms, 19.616ms, 21.1ms, 22.205ms, 25.498ms, 34.99ms
Bytes In [total, mean] 69048, 137.00
Bytes Out [total, mean] 347225760, 688940.00
Success [ratio] 100.00%
Status Codes [code:count] 200:504
Error Set:
[19]:
%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_tftensor.json |
vegeta report -type=text
Requests [total, rate, throughput] 636, 63.55, 63.45
Duration [total, attack, wait] 10.023s, 10.008s, 14.782ms
Latencies [min, mean, 50, 90, 95, 99, max] 13.646ms, 15.756ms, 15.461ms, 17.41ms, 18.729ms, 20.628ms, 23.465ms
Bytes In [total, mean] 118932, 187.00
Bytes Out [total, mean] 678466356, 1066771.00
Success [ratio] 100.00%
Status Codes [code:count] 200:636
Error Set:
Example results

ndarray | tensor | tftensor
--------|--------|---------
19.8ms  | 19.7ms | 16.2ms
Test gRPC

- ndarray
- tensor
- tftensor
[20]:
%%bash
ghz \
--insecure \
--proto ../../../proto/prediction.proto \
--call seldon.protos.Seldon/Predict \
--data-file=./data_ndarray.json \
--qps=0 \
--cpus=1 \
--concurrency=1 \
--duration="10s" \
--format summary \
--metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
localhost:8003
Summary:
Count: 24
Total: 10.13 s
Slowest: 278.81 ms
Fastest: 242.25 ms
Average: 244.06 ms
Requests/sec: 2.37
Response time histogram:
242.253 [1] |∎∎∎∎∎∎∎
245.909 [2] |∎∎∎∎∎∎∎∎∎∎∎∎∎
249.564 [4] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
253.219 [6] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
256.874 [4] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
260.530 [1] |∎∎∎∎∎∎∎
264.185 [1] |∎∎∎∎∎∎∎
267.840 [2] |∎∎∎∎∎∎∎∎∎∎∎∎∎
271.496 [0] |
275.151 [1] |∎∎∎∎∎∎∎
278.806 [1] |∎∎∎∎∎∎∎
Latency distribution:
10 % in 247.44 ms
25 % in 249.47 ms
50 % in 252.85 ms
75 % in 260.70 ms
90 % in 272.55 ms
95 % in 278.81 ms
0 % in 0 ns
Status code distribution:
[OK] 23 responses
[Canceled] 1 responses
Error distribution:
[1] rpc error: code = Canceled desc = grpc: the client connection is closing
[21]:
%%bash
ghz \
--insecure \
--proto ../../../proto/prediction.proto \
--call seldon.protos.Seldon/Predict \
--data-file=./data_tensor.json \
--qps=0 \
--cpus=1 \
--concurrency=1 \
--duration="10s" \
--format summary \
--metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
localhost:8003
Summary:
Count: 92
Total: 10.10 s
Slowest: 21.23 ms
Fastest: 4.91 ms
Average: 7.58 ms
Requests/sec: 9.11
Response time histogram:
4.906 [1] |∎
6.539 [55] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
8.171 [17] |∎∎∎∎∎∎∎∎∎∎∎∎
9.804 [4] |∎∎∎
11.436 [0] |
13.069 [4] |∎∎∎
14.701 [3] |∎∎
16.334 [3] |∎∎
17.966 [0] |
19.599 [2] |∎
21.232 [2] |∎
Latency distribution:
10 % in 5.51 ms
25 % in 5.70 ms
50 % in 6.14 ms
75 % in 7.09 ms
90 % in 14.14 ms
95 % in 18.77 ms
0 % in 0 ns
Status code distribution:
[OK] 91 responses
[Canceled] 1 responses
Error distribution:
[1] rpc error: code = Canceled desc = grpc: the client connection is closing
[22]:
%%bash
ghz \
--insecure \
--proto ../../../proto/prediction.proto \
--call seldon.protos.Seldon/Predict \
--data-file=./data_tftensor.json \
--qps=0 \
--cpus=1 \
--concurrency=1 \
--duration="10s" \
--format summary \
--metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
localhost:8003
Summary:
Count: 425
Total: 10.04 s
Slowest: 16.38 ms
Fastest: 3.97 ms
Average: 5.33 ms
Requests/sec: 42.31
Response time histogram:
3.970 [1] |
5.211 [281] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
6.452 [91] |∎∎∎∎∎∎∎∎∎∎∎∎∎
7.692 [25] |∎∎∎∎
8.933 [8] |∎
10.174 [6] |∎
11.415 [7] |∎
12.656 [2] |
13.896 [1] |
15.137 [1] |
16.378 [1] |
Latency distribution:
10 % in 4.34 ms
25 % in 4.54 ms
50 % in 4.89 ms
75 % in 5.52 ms
90 % in 6.79 ms
95 % in 8.30 ms
99 % in 11.71 ms
Status code distribution:
[OK] 424 responses
[Canceled] 1 responses
Error distribution:
[1] rpc error: code = Canceled desc = grpc: the client connection is closing
Example results

ndarray | tensor | tftensor
--------|--------|---------
253ms   | 8.4ms  | 5.5ms
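Using the example mean latencies from the REST and gRPC tables above, the relative differences work out roughly as follows (illustrative arithmetic only):

```python
# Mean latencies in ms, copied from the two "Example results" tables above.
rest = {"ndarray": 19.8, "tensor": 19.7, "tftensor": 16.2}
grpc = {"ndarray": 253.0, "tensor": 8.4, "tftensor": 5.5}

for fmt in rest:
    if grpc[fmt] <= rest[fmt]:
        print(f"{fmt}: gRPC ~{rest[fmt] / grpc[fmt]:.1f}x faster than REST")
    else:
        print(f"{fmt}: gRPC ~{grpc[fmt] / rest[fmt]:.1f}x slower than REST")
```

So while tensor and tftensor gain two to three times from switching to gRPC, ndarray loses more than an order of magnitude.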
Conclusions

- gRPC is generally faster than REST, except for ndarray, which is much slower over gRPC and should be avoided
- tftensor is fastest
[152]:
!kubectl delete -f model.yaml
seldondeployment.machinelearning.seldon.io "seldon-model" deleted
Test Predict Raw
[158]:
%%writetemplate model.yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: seldon-model
  namespace: seldon
spec:
  predictors:
  - annotations:
      seldon.io/no-engine: "true"
    componentSpecs:
    - spec:
        containers:
        - image: seldonio/seldontest_predict_raw:{VERSION}
          imagePullPolicy: IfNotPresent
          name: classifier
          resources:
            requests:
              cpu: 1
            limits:
              cpu: 1
          env:
          - name: GUNICORN_WORKERS
            value: "1"
          - name: GUNICORN_THREADS
            value: "1"
        tolerations:
        - key: model
          operator: Exists
          effect: NoSchedule
    graph:
      children: []
      name: classifier
      type: MODEL
    name: default
    replicas: 1
[159]:
!kubectl apply -f model.yaml
seldondeployment.machinelearning.seldon.io/seldon-model created
[160]:
!kubectl wait --for condition=ready --timeout=600s pods --all -n seldon
pod/seldon-model-default-0-classifier-5dc8fbd597-kk7td condition met
Smoke test the port-forward to check everything is working
[161]:
!curl -X POST -H 'Content-Type: application/json' \
-d '@./data_tftensor.json' \
http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions
[1]
Test REST

- ndarray
- tensor
- tftensor

This can be run locally, as the results are intended to show relative differences rather than provide very accurate timings.
[162]:
%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_ndarray.json |
vegeta report -type=text
Requests [total, rate, throughput] 724, 72.35, 72.25
Duration [total, attack, wait] 10.021s, 10.007s, 14.458ms
Latencies [min, mean, 50, 90, 95, 99, max] 12.228ms, 13.838ms, 13.683ms, 14.641ms, 15.489ms, 17.888ms, 22.263ms
Bytes In [total, mean] 2896, 4.00
Bytes Out [total, mean] 498774460, 688915.00
Success [ratio] 100.00%
Status Codes [code:count] 200:724
Error Set:
[163]:
%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_tensor.json |
vegeta report -type=text
Requests [total, rate, throughput] 724, 72.32, 72.22
Duration [total, attack, wait] 10.025s, 10.011s, 14.307ms
Latencies [min, mean, 50, 90, 95, 99, max] 12.362ms, 13.844ms, 13.701ms, 14.655ms, 15.493ms, 17.976ms, 18.802ms
Bytes In [total, mean] 2896, 4.00
Bytes Out [total, mean] 498792560, 688940.00
Success [ratio] 100.00%
Status Codes [code:count] 200:724
Error Set:
[164]:
%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_tftensor.json |
vegeta report -type=text
Requests [total, rate, throughput] 901, 90.04, 89.93
Duration [total, attack, wait] 10.018s, 10.007s, 11.64ms
Latencies [min, mean, 50, 90, 95, 99, max] 8.955ms, 11.116ms, 10.994ms, 12.099ms, 12.721ms, 15.208ms, 19.918ms
Bytes In [total, mean] 3604, 4.00
Bytes Out [total, mean] 961160671, 1066771.00
Success [ratio] 100.00%
Status Codes [code:count] 200:901
Error Set:
Example results

ndarray | tensor | tftensor
--------|--------|---------
13.3ms  | 13.3ms | 11.1ms
Test gRPC

- ndarray
- tensor
- tftensor
[165]:
%%bash
ghz \
--insecure \
--proto ../../../proto/prediction.proto \
--call seldon.protos.Seldon/Predict \
--data-file=./data_ndarray.json \
--qps=0 \
--cpus=1 \
--concurrency=1 \
--duration="10s" \
--format summary \
--metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
localhost:8003
Summary:
Count: 44
Total: 10.04 s
Slowest: 69.07 ms
Fastest: 44.44 ms
Average: 46.03 ms
Requests/sec: 4.38
Response time histogram:
44.440 [1] |∎
46.904 [31] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
49.367 [6] |∎∎∎∎∎∎∎∎
51.831 [2] |∎∎∎
54.294 [2] |∎∎∎
56.758 [0] |
59.221 [0] |
61.684 [0] |
64.148 [0] |
66.611 [0] |
69.075 [1] |∎
Latency distribution:
10 % in 45.05 ms
25 % in 45.40 ms
50 % in 46.30 ms
75 % in 47.34 ms
90 % in 50.16 ms
95 % in 53.38 ms
0 % in 0 ns
Status code distribution:
[OK] 43 responses
[Canceled] 1 responses
Error distribution:
[1] rpc error: code = Canceled desc = grpc: the client connection is closing
[166]:
%%bash
ghz \
--insecure \
--proto ../../../proto/prediction.proto \
--call seldon.protos.Seldon/Predict \
--data-file=./data_tensor.json \
--qps=0 \
--cpus=1 \
--concurrency=1 \
--duration="10s" \
--format summary \
--metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
localhost:8003
Summary:
Count: 92
Total: 10.10 s
Slowest: 19.81 ms
Fastest: 4.93 ms
Average: 7.91 ms
Requests/sec: 9.11
Response time histogram:
4.932 [1] |∎
6.419 [53] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
7.907 [12] |∎∎∎∎∎∎∎∎∎
9.395 [5] |∎∎∎∎
10.882 [4] |∎∎∎
12.370 [1] |∎
13.858 [3] |∎∎
15.346 [3] |∎∎
16.833 [2] |∎∎
18.321 [3] |∎∎
19.809 [4] |∎∎∎
Latency distribution:
10 % in 5.21 ms
25 % in 5.68 ms
50 % in 6.04 ms
75 % in 8.27 ms
90 % in 15.77 ms
95 % in 19.04 ms
0 % in 0 ns
Status code distribution:
[OK] 91 responses
[Canceled] 1 responses
Error distribution:
[1] rpc error: code = Canceled desc = grpc: the client connection is closing
[167]:
%%bash
ghz \
--insecure \
--proto ../../../proto/prediction.proto \
--call seldon.protos.Seldon/Predict \
--data-file=./data_tftensor.json \
--qps=0 \
--cpus=1 \
--concurrency=1 \
--duration="10s" \
--format summary \
--metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
localhost:8003
Summary:
Count: 426
Total: 10.03 s
Slowest: 11.74 ms
Fastest: 3.67 ms
Average: 5.02 ms
Requests/sec: 42.48
Response time histogram:
3.668 [1] |
4.475 [174] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
5.282 [141] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
6.089 [43] |∎∎∎∎∎∎∎∎∎∎
6.897 [30] |∎∎∎∎∎∎∎
7.704 [16] |∎∎∎∎
8.511 [6] |∎
9.318 [8] |∎∎
10.126 [2] |
10.933 [1] |
11.740 [3] |∎
Latency distribution:
10 % in 4.08 ms
25 % in 4.27 ms
50 % in 4.61 ms
75 % in 5.30 ms
90 % in 6.62 ms
95 % in 7.66 ms
99 % in 10.26 ms
Status code distribution:
[OK] 425 responses
[Canceled] 1 responses
Error distribution:
[1] rpc error: code = Canceled desc = grpc: the client connection is closing
Example results

ndarray | tensor | tftensor
--------|--------|---------
46ms    | 7.9ms  | 5.0ms
Conclusions

predict_raw is faster than predict, but you will need to handle the serialization/deserialization yourself, which may make the two approaches equivalent unless specific techniques can be applied for your use case.
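For reference, predict_raw bypasses the wrapper's payload parsing entirely: the method receives the raw request and its return value is used as the response. A minimal sketch (a hypothetical reconstruction, not the actual seldontest_predict_raw source):

```python
class SeldonTestPredictRaw:
    """Sketch of a model that skips ndarray/tensor (de)serialization."""

    def predict_raw(self, request):
        # `request` is the unparsed payload; nothing is decoded here,
        # which is why the raw variant benchmarks faster above.
        # The bare return value becomes the response body, matching the
        # `[1]` seen in the smoke test.
        return [1]
```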
Test with Predict method on Small Batch Size

The seldontest_predict image has a simple predict method that runs a loop with a configurable number of iterations (default 1) to simulate work. The iteration count can be set as a Seldon parameter, but here we want to benchmark the serialization/deserialization cost, so we keep the work minimal.
[15]:
%%writetemplate model.yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: seldon-model
  namespace: seldon
spec:
  predictors:
  - annotations:
      seldon.io/no-engine: "true"
    componentSpecs:
    - spec:
        containers:
        - image: seldonio/seldontest_predict:{VERSION}
          imagePullPolicy: IfNotPresent
          name: classifier
          resources:
            requests:
              cpu: 1
            limits:
              cpu: 1
          env:
          - name: GUNICORN_WORKERS
            value: "1"
          - name: GUNICORN_THREADS
            value: "1"
        tolerations:
        - key: model
          operator: Exists
          effect: NoSchedule
    graph:
      children: []
      name: classifier
      type: MODEL
    name: default
    replicas: 1
[16]:
!kubectl apply -f model.yaml
seldondeployment.machinelearning.seldon.io/seldon-model configured
[22]:
!kubectl wait --for condition=ready --timeout=600s pods --all -n seldon
pod/seldon-model-default-0-classifier-5445bd4ccf-bgkcm condition met
Create payloads and associated vegeta configurations for:

- ndarray
- tensor
- tftensor

This time we will create an array containing a single integer.
[23]:
import json

sz = 1
vals = list(range(sz))
valStr = f"{vals}"

payload = '{"data": {"ndarray": [' + valStr + "]}}"
with open("data_ndarray.json", "w") as f:
    f.write(payload)

payload_tensor = (
    '{"data":{"tensor":{"shape":[1,' + str(sz) + '],"values":' + valStr + "}}}"
)
with open("data_tensor.json", "w") as f:
    f.write(payload_tensor)
[24]:
import numpy as np
import tensorflow as tf
from google.protobuf import json_format

array = np.array(vals)
tftensor = tf.make_tensor_proto(array)
jStrTensor = json_format.MessageToJson(tftensor)
jTensor = json.loads(jStrTensor)

payload_tftensor = (
    '{"data":{"tftensor":' + json.dumps(jTensor, separators=(",", ":")) + "}}"
)
with open("data_tftensor.json", "w") as f:
    f.write(payload_tftensor)
[25]:
import base64
import json

sample_string_bytes = payload_tensor.encode("ascii")
base64_bytes = base64.b64encode(sample_string_bytes)
base64_string = base64_bytes.decode("ascii")
jqPayload = {
    "method": "POST",
    "url": "http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions",
    "body": base64_string,
    "header": {"Content-Type": ["application/json"]},
}
with open("vegeta_tensor.json", "w") as f:
    f.write(json.dumps(jqPayload, separators=(",", ":")))
    f.write("\n")

sample_string_bytes = payload.encode("ascii")
base64_bytes = base64.b64encode(sample_string_bytes)
base64_string = base64_bytes.decode("ascii")
jqPayload = {
    "method": "POST",
    "url": "http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions",
    "body": base64_string,
    "header": {"Content-Type": ["application/json"]},
}
with open("vegeta_ndarray.json", "w") as f:
    f.write(json.dumps(jqPayload, separators=(",", ":")))
    f.write("\n")

sample_string_bytes = payload_tftensor.encode("ascii")
base64_bytes = base64.b64encode(sample_string_bytes)
base64_string = base64_bytes.decode("ascii")
jqPayload = {
    "method": "POST",
    "url": "http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions",
    "body": base64_string,
    "header": {"Content-Type": ["application/json"]},
}
with open("vegeta_tftensor.json", "w") as f:
    f.write(json.dumps(jqPayload, separators=(",", ":")))
    f.write("\n")
Smoke test the port-forward to check everything is working
[26]:
!curl -X POST -H 'Content-Type: application/json' \
-d '@./data_tensor.json' \
http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions
{"data":{"names":[],"tensor":{"shape":[1],"values":[1]}},"meta":{"requestPath":{"classifier":"seldonio/seldontest_predict:1.10.0-dev"}}}
Test REST

- ndarray
- tensor
- tftensor

This can be run locally, as the results are intended to show relative differences rather than provide very accurate timings.
[27]:
%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_ndarray.json |
vegeta report -type=text
Requests [total, rate, throughput] 5538, 553.80, 553.67
Duration [total, attack, wait] 10.002s, 10s, 2.364ms
Latencies [min, mean, 50, 90, 95, 99, max] 1.569ms, 1.804ms, 1.739ms, 1.984ms, 2.198ms, 2.861ms, 6.62ms
Bytes In [total, mean] 636870, 115.00
Bytes Out [total, mean] 155064, 28.00
Success [ratio] 100.00%
Status Codes [code:count] 200:5538
Error Set:
[28]:
%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_tensor.json |
vegeta report -type=text
Requests [total, rate, throughput] 5557, 555.65, 555.55
Duration [total, attack, wait] 10.003s, 10.001s, 1.753ms
Latencies [min, mean, 50, 90, 95, 99, max] 1.578ms, 1.798ms, 1.74ms, 1.925ms, 2.119ms, 2.981ms, 5.968ms
Bytes In [total, mean] 761309, 137.00
Bytes Out [total, mean] 266736, 48.00
Success [ratio] 100.00%
Status Codes [code:count] 200:5557
Error Set:
[29]:
%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_tftensor.json |
vegeta report -type=text
Requests [total, rate, throughput] 4548, 454.75, 454.65
Duration [total, attack, wait] 10.003s, 10.001s, 2.141ms
Latencies [min, mean, 50, 90, 95, 99, max] 1.937ms, 2.197ms, 2.138ms, 2.351ms, 2.482ms, 3.215ms, 9.424ms
Bytes In [total, mean] 850476, 187.00
Bytes Out [total, mean] 436608, 96.00
Success [ratio] 100.00%
Status Codes [code:count] 200:4548
Error Set:
Example results

ndarray | tensor | tftensor
--------|--------|---------
1.8ms   | 1.8ms  | 2.1ms
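As a sanity check on these numbers: with -rate=0 -max-workers=1, vegeta issues requests back-to-back on a single worker, so the observed request rate should be roughly the reciprocal of the mean latency:

```python
# ndarray small-batch mean latency from the vegeta report above (~1.8 ms).
mean_latency_s = 1.8e-3
# With one worker sending serially, rate ≈ 1 / latency.
expected_rps = 1 / mean_latency_s
print(f"expected ≈ {expected_rps:.0f} req/s")  # report shows ~554 req/s
```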
Test gRPC

- ndarray
- tensor
- tftensor
[30]:
%%bash
ghz \
--insecure \
--proto ../../../proto/prediction.proto \
--call seldon.protos.Seldon/Predict \
--data-file=./data_ndarray.json \
--qps=0 \
--cpus=1 \
--concurrency=1 \
--duration="10s" \
--format summary \
--metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
localhost:8003
Summary:
Count: 6506
Total: 10.01 s
Slowest: 18.58 ms
Fastest: 1.26 ms
Average: 1.46 ms
Requests/sec: 650.23
Response time histogram:
1.260 [1] |
2.992 [6465] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
4.724 [30] |
6.456 [5] |
8.187 [2] |
9.919 [1] |
11.651 [0] |
13.382 [0] |
15.114 [0] |
16.846 [0] |
18.578 [1] |
Latency distribution:
10 % in 1.33 ms
25 % in 1.36 ms
50 % in 1.39 ms
75 % in 1.45 ms
90 % in 1.58 ms
95 % in 1.79 ms
99 % in 2.50 ms
Status code distribution:
[OK] 6505 responses
[Unavailable] 1 responses
Error distribution:
[1] rpc error: code = Unavailable desc = transport is closing
[31]:
%%bash
ghz \
--insecure \
--proto ../../../proto/prediction.proto \
--call seldon.protos.Seldon/Predict \
--data-file=./data_tensor.json \
--qps=0 \
--cpus=1 \
--concurrency=1 \
--duration="10s" \
--format summary \
--metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
localhost:8003
Summary:
Count: 6429
Total: 10.01 s
Slowest: 16.30 ms
Fastest: 1.29 ms
Average: 1.49 ms
Requests/sec: 642.56
Response time histogram:
1.287 [1] |
2.789 [6375] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
4.290 [36] |
5.792 [11] |
7.293 [2] |
8.795 [1] |
10.296 [0] |
11.798 [1] |
13.299 [0] |
14.801 [0] |
16.303 [1] |
Latency distribution:
10 % in 1.36 ms
25 % in 1.38 ms
50 % in 1.42 ms
75 % in 1.48 ms
90 % in 1.60 ms
95 % in 1.80 ms
99 % in 2.67 ms
Status code distribution:
[OK] 6428 responses
[Unavailable] 1 responses
Error distribution:
[1] rpc error: code = Unavailable desc = transport is closing
[32]:
%%bash
ghz \
--insecure \
--proto ../../../proto/prediction.proto \
--call seldon.protos.Seldon/Predict \
--data-file=./data_tftensor.json \
--qps=0 \
--cpus=1 \
--concurrency=1 \
--duration="10s" \
--format summary \
--metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
localhost:8003
Summary:
Count: 6066
Total: 10.01 s
Slowest: 9.38 ms
Fastest: 1.39 ms
Average: 1.57 ms
Requests/sec: 606.20
Response time histogram:
1.387 [1] |
2.187 [5945] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
2.986 [84] |∎
3.785 [20] |
4.585 [7] |
5.384 [2] |
6.183 [4] |
6.983 [0] |
7.782 [0] |
8.582 [1] |
9.381 [1] |
Latency distribution:
10 % in 1.46 ms
25 % in 1.48 ms
50 % in 1.52 ms
75 % in 1.57 ms
90 % in 1.66 ms
95 % in 1.81 ms
99 % in 2.61 ms
Status code distribution:
[OK] 6065 responses
[Unavailable] 1 responses
Error distribution:
[1] rpc error: code = Unavailable desc = transport is closing
Example results

ndarray | tensor | tftensor
--------|--------|---------
1.46ms  | 1.49ms | 1.57ms
Conclusions

- gRPC is generally faster than REST
- There is very little difference between payload types, with the simpler tensor/ndarray formats probably slightly faster
[152]:
!kubectl delete -f model.yaml
seldondeployment.machinelearning.seldon.io "seldon-model" deleted