This page was generated from testing/benchmarking/tensorflow/tensorflow.ipynb.

TensorFlow Load and Benchmark Tests

This notebook uses a pretrained model for the TensorFlow flowers dataset to:

  • Load test the model at a fixed rate

  • Benchmark the model to find its maximum throughput and see how it handles saturation

## Setup

  • Create a 3-node GCP cluster with n1-standard-8 nodes

  • Install Seldon Core

## TODO

  • gRPC

  • Run vegeta on a separate node from the model servers using affinity/taints

[1]:
!kubectl create namespace seldon
Error from server (AlreadyExists): namespaces "seldon" already exists
[2]:
!kubectl config set-context $(kubectl config current-context) --namespace=seldon
Context "do-lon1-k8s-1-16-10-do-0-lon1-1594477430912" modified.
[3]:
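import json  # used below to pretty-print the vegeta reports
import sys
sys.path.append('../')  # make the shared vegeta_utils module importable
from vegeta_utils import *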

## Put Taints on Nodes

[14]:
# Look up the names of the three cluster nodes.
raw=!kubectl get nodes -o jsonpath='{.items[0].metadata.name}'
firstNode = raw[0]
raw=!kubectl get nodes -o jsonpath='{.items[1].metadata.name}'
secondNode = raw[0]
raw=!kubectl get nodes -o jsonpath='{.items[2].metadata.name}'
thirdNode = raw[0]
# Reserve the first node for the load tester and the other two for the model servers.
!kubectl taint nodes '{firstNode}' loadtester=active:NoSchedule
!kubectl taint nodes '{secondNode}' model=active:NoSchedule
!kubectl taint nodes '{thirdNode}' model=active:NoSchedule
node/pool-triv8uq93-3oaz0 tainted
error: Node pool-triv8uq93-3oaz1 already has model taint(s) with same effect(s) and --overwrite is false
error: Node pool-triv8uq93-3oazd already has model taint(s) with same effect(s) and --overwrite is false
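
The two "already has model taint(s)" errors in the output suggest those taints were left in place by an earlier run, so the nodes end up in the intended state either way. To confirm which taints each node carries, a minimal sketch is given below. It assumes the node list has the shape returned by `kubectl get nodes -o json` (`items[].metadata.name` and `items[].spec.taints[]`). The helper name `taints_by_node` and the sample data are hypothetical and hand-written to mirror the commands above; they are not real cluster output.

```python
def taints_by_node(nodes):
    """Map each node name to its taints, formatted as key=value:effect."""
    result = {}
    for item in nodes["items"]:
        # spec.taints is absent on untainted nodes, so default to an empty list.
        taints = item.get("spec", {}).get("taints", [])
        result[item["metadata"]["name"]] = [
            f'{t["key"]}={t.get("value", "")}:{t["effect"]}' for t in taints
        ]
    return result


# Hand-written sample mirroring the taints applied above (not real cluster output).
sample_nodes = {
    "items": [
        {"metadata": {"name": "pool-triv8uq93-3oaz0"},
         "spec": {"taints": [{"key": "loadtester", "value": "active", "effect": "NoSchedule"}]}},
        {"metadata": {"name": "pool-triv8uq93-3oaz1"},
         "spec": {"taints": [{"key": "model", "value": "active", "effect": "NoSchedule"}]}},
        {"metadata": {"name": "pool-triv8uq93-3oazd"},
         "spec": {"taints": [{"key": "model", "value": "active", "effect": "NoSchedule"}]}},
    ]
}

for name, taints in taints_by_node(sample_nodes).items():
    print(name, taints)
```

In the notebook, the input could come from `json.loads` applied to the joined output of `!kubectl get nodes -o json`.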

## Benchmark with Saturation Test

[5]:
%%writefile tf_flowers.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: tf-flowers
spec:
  protocol: tensorflow
  transport: rest
  predictors:
  - graph:
      implementation: TENSORFLOW_SERVER
      modelUri: gs://kfserving-samples/models/tensorflow/flowers
      name:  flowers
      parameters:
        - name: model_name
          type: STRING
          value: flowers
    componentSpecs:
    - spec:
        containers:
        - name: flowers
          resources:
            requests:
              cpu: '2'
        tolerations:
        - key: model
          operator: Exists
          effect: NoSchedule
    name: default
    replicas: 1
Overwriting tf_flowers.yaml
[6]:
run_model("tf_flowers.yaml")
Available with 1 pods

Run a test to find the maximum throughput of the model.

[7]:
results = run_vegeta_test("tf_vegeta_cfg.yaml","vegeta_max.yaml","11m")
print(json.dumps(results, indent=4))
saturation_throughput=int(results["throughput"])
{
    "latencies": {
        "total": 18194676761380,
        "mean": 4069487085,
        "50th": 3865217401,
        "90th": 5285272466,
        "95th": 5768188708,
        "99th": 6667031940,
        "max": 7656080367,
        "min": 970003451
    },
    "bytes_in": {
        "total": 974678,
        "mean": 218
    },
    "bytes_out": {
        "total": 72318425,
        "mean": 16175
    },
    "earliest": "2020-07-13T09:38:48.517793327Z",
    "latest": "2020-07-13T09:41:48.535299333Z",
    "end": "2020-07-13T09:41:52.165570518Z",
    "duration": 180017506006,
    "wait": 3630271185,
    "requests": 4471,
    "rate": 24.836473403042152,
    "throughput": 24.34551655558568,
    "success": 1,
    "status_codes": {
        "200": 4471
    },
    "errors": []
}
[8]:
print("Max Throughput=",saturation_throughput)
Max Throughput= 24
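
The latency fields in the vegeta JSON report are integers in nanoseconds, which makes the saturation figures hard to read at a glance (the mean above is roughly 4.07 seconds). The following is a small sketch of the unit conversion. The helper name `latencies_ms` is hypothetical and is not part of `vegeta_utils`; the sample values are copied from the report above.

```python
NS_PER_MS = 1_000_000  # nanoseconds per millisecond


def latencies_ms(report):
    """Return the per-request latency statistics of a vegeta report in milliseconds.

    The "total" field is skipped because it is a sum over all requests,
    not a per-request statistic.
    """
    return {
        name: ns / NS_PER_MS
        for name, ns in report["latencies"].items()
        if name != "total"
    }


# Subset of the saturation report printed above.
saturation_report = {
    "latencies": {
        "total": 18194676761380,
        "mean": 4069487085,
        "50th": 3865217401,
        "99th": 6667031940,
    }
}

for name, ms in latencies_ms(saturation_report).items():
    print(f"{name}: {ms:.1f} ms")
```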

## Load Tests with HPA

Run with an HPA at the saturation rate to check:

  • How latencies are affected by scaling

[9]:
%%writefile tf_flowers.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: tf-flowers
spec:
  protocol: tensorflow
  transport: rest
  predictors:
  - graph:
      implementation: TENSORFLOW_SERVER
      modelUri: gs://kfserving-samples/models/tensorflow/flowers
      name:  flowers
      parameters:
        - name: model_name
          type: STRING
          value: flowers
    componentSpecs:
    - hpaSpec:
        minReplicas: 1
        maxReplicas: 5
        metrics:
        - resource:
            name: cpu
            targetAverageUtilization: 70
          type: Resource
      spec:
        containers:
        - name: flowers
          resources:
            requests:
              cpu: '1'
          livenessProbe:
            failureThreshold: 3
            initialDelaySeconds: 60
            periodSeconds: 5
            successThreshold: 1
            tcpSocket:
              port: http
            timeoutSeconds: 5
          readinessProbe:
            failureThreshold: 3
            initialDelaySeconds: 20
            periodSeconds: 5
            successThreshold: 1
            tcpSocket:
              port: http
            timeoutSeconds: 5
        tolerations:
        - key: model
          operator: Exists
          effect: NoSchedule
    name: default
    replicas: 1
Overwriting tf_flowers.yaml
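
To reason about how the `hpaSpec` above reacts under load, here is a simplified sketch of the replica calculation described in the Kubernetes Horizontal Pod Autoscaler documentation: the desired count is the ceiling of the current replica count times the ratio of observed to target utilization, clamped to the configured bounds. The sketch ignores the tolerance band, stabilization windows, and pod readiness handling, and the utilization figures in the usage lines are hypothetical, not measurements from this benchmark.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=1, max_replicas=5):
    """Simplified HPA rule: ceil(current * observed / target), clamped to [min, max]."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical utilization values against the 70% target configured above.
print(desired_replicas(1, 140, 70))  # one pod at twice the target
print(desired_replicas(2, 300, 70))  # raw value exceeds maxReplicas and is clamped
print(desired_replicas(3, 20, 70))   # low load scales back towards minReplicas
```

Because the CPU request drops from '2' to '1' in this deployment, the same absolute CPU usage corresponds to a higher utilization percentage, which makes scale-up more likely at the saturation rate.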
[10]:
run_model("tf_flowers.yaml")
Available with 1 pods
[11]:
rate=saturation_throughput
duration="10m"
%env DURATION=$duration
%env RATE=$rate/1s
!envsubst < vegeta_cfg.tmpl.yaml > vegeta.tmp.yaml
!cat vegeta.tmp.yaml
env: DURATION=10m
env: RATE=24/1s
apiVersion: batch/v1
kind: Job
metadata:
  name: tf-load-test
spec:
  backoffLimit: 6
  parallelism: 1
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"
    spec:
      containers:
        - args:
            - vegeta -cpus=4 attack -keepalive=false -duration=10m -rate=24/1s -targets=/var/vegeta/cfg
              | vegeta report -type=json
          command:
            - sh
            - -c
          image: peterevans/vegeta:latest
          imagePullPolicy: Always
          name: vegeta
          volumeMounts:
            - mountPath: /var/vegeta
              name: tf-vegeta-cfg
      restartPolicy: Never
      volumes:
        - configMap:
            defaultMode: 420
            name: tf-vegeta-cfg
          name: tf-vegeta-cfg
      tolerations:
      - key: loadtester
        operator: Exists
        effect: NoSchedule
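
The `envsubst` step above fills `DURATION` and `RATE` into the job template from environment variables. Where `envsubst` is not installed, a rough Python equivalent can be sketched with `string.Template`, which understands the same `$VAR` and `${VAR}` placeholders. The template line below is a hypothetical excerpt inferred from the rendered job; the actual contents of `vegeta_cfg.tmpl.yaml` are not shown in this notebook.

```python
from string import Template

# Hypothetical one-line excerpt; the real vegeta_cfg.tmpl.yaml is not shown here.
template_line = (
    "vegeta -cpus=4 attack -keepalive=false "
    "-duration=${DURATION} -rate=${RATE} -targets=/var/vegeta/cfg"
)


def render(template_text, **variables):
    """Substitute $VAR / ${VAR} placeholders, leaving unknown ones untouched."""
    return Template(template_text).safe_substitute(**variables)


rendered = render(template_line, DURATION="10m", RATE="24/1s")
print(rendered)
```

One behavioral difference to keep in mind: `envsubst` replaces unset variables with an empty string, whereas `safe_substitute` leaves the placeholder in the output, which tends to make a missing variable easier to spot.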
[15]:
results = run_vegeta_test("tf_vegeta_cfg.yaml","vegeta.tmp.yaml","11m")
print(json.dumps(results, indent=4))
{
    "latencies": {
        "total": 3743859444532,
        "mean": 259990239,
        "50th": 131917169,
        "90th": 310053255,
        "95th": 916684759,
        "99th": 2775052710,
        "max": 7645706522,
        "min": 61953433
    },
    "bytes_in": {
        "total": 3139200,
        "mean": 218
    },
    "bytes_out": {
        "total": 232920000,
        "mean": 16175
    },
    "earliest": "2020-07-13T09:57:01.982849851Z",
    "latest": "2020-07-13T10:07:01.94120089Z",
    "end": "2020-07-13T10:07:02.043547541Z",
    "duration": 599958351039,
    "wait": 102346651,
    "requests": 14400,
    "rate": 24.001666074090423,
    "throughput": 23.997572337989126,
    "success": 1,
    "status_codes": {
        "200": 14400
    },
    "errors": []
}
[17]:
print_vegeta_results(results)
Latencies:
        mean: 259.990239 ms
        50th: 131.917169 ms
        90th: 310.053255 ms
        95th: 916.684759 ms
        99th: 2775.05271 ms

Throughput: 23.997572337989126/s
Errors: False
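
As a sanity check on the report, the `rate` and `throughput` fields can be reproduced from the raw counters: 24 requests per second for 10 minutes gives the 14400 requests reported, `rate` is requests divided by the attack duration, and `throughput` is successful requests divided by the duration plus the final wait. The sketch below assumes that reading of the fields, which is consistent with the numbers above; the helper name is hypothetical, and estimating the success count as `requests * success` is only exact here because the success ratio is 1.

```python
NS_PER_S = 1_000_000_000  # vegeta reports durations in nanoseconds


def rate_and_throughput(report):
    """Recompute request rate and successful throughput (both per second)."""
    duration_s = report["duration"] / NS_PER_S
    wait_s = report["wait"] / NS_PER_S
    # "success" is a ratio in [0, 1]; this product is exact only when it is 1.
    successes = report["requests"] * report["success"]
    rate = report["requests"] / duration_s
    throughput = successes / (duration_s + wait_s)
    return rate, throughput


# Counters copied from the HPA load test report above.
hpa_report = {
    "duration": 599958351039,
    "wait": 102346651,
    "requests": 14400,
    "success": 1,
}

rate, throughput = rate_and_throughput(hpa_report)
print(f"rate={rate:.6f}/s throughput={throughput:.6f}/s")
```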