This page was generated from testing/benchmarking/tensorflow/tensorflow.ipynb.

Tensorflow Load and Benchmark Tests¶

Using a pretrained model for Tensorflow flowers dataset

Load test the model at fixed rate
Benchmark the model to find maximum throughput and saturation handling

## Setup

Create a 3 node GCP cluster with n1-standard-8 node
Install Seldon Core

## TODO

gRPC
Run vegeta on separate node to model servers using affinity/taints

[1]:

!kubectl create namespace seldon

Error from server (AlreadyExists): namespaces "seldon" already exists

[2]:

!kubectl config set-context $(kubectl config current-context) --namespace=seldon

Context "do-lon1-k8s-1-16-10-do-0-lon1-1594477430912" modified.

[3]:

import sys

sys.path.append("../")
from vegeta_utils import *

Put Taint on Nodes¶

[14]:

raw = !kubectl get nodes -o jsonpath='{.items[0].metadata.name}'
firstNode = raw[0]
raw = !kubectl get nodes -o jsonpath='{.items[1].metadata.name}'
secondNode = raw[0]
raw = !kubectl get nodes -o jsonpath='{.items[2].metadata.name}'
thirdNode = raw[0]
!kubectl taint nodes '{firstNode}' loadtester=active:NoSchedule
!kubectl taint nodes '{secondNode}' model=active:NoSchedule
!kubectl taint nodes '{thirdNode}' model=active:NoSchedule

node/pool-triv8uq93-3oaz0 tainted
error: Node pool-triv8uq93-3oaz1 already has model taint(s) with same effect(s) and --overwrite is false
error: Node pool-triv8uq93-3oazd already has model taint(s) with same effect(s) and --overwrite is false

Benchmark with Saturation Test¶

[5]:

%%writefile tf_flowers.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: tf-flowers
spec:
  protocol: tensorflow
  transport: rest
  predictors:
  - graph:
      implementation: TENSORFLOW_SERVER
      modelUri: gs://kfserving-samples/models/tensorflow/flowers
      name:  flowers
      parameters:
        - name: model_name
          type: STRING
          value: flowers
    componentSpecs:
    - spec:
        containers:
        - name: flowers
          resources:
            requests:
              cpu: '2'
        tolerations:
        - key: model
          operator: Exists
          effect: NoSchedule
    name: default
    replicas: 1

Overwriting tf_flowers.yaml

[6]:

run_model("tf_flowers.yaml")

Available with 1 pods

Run test to gather the max throughput of the model

[7]:

results = run_vegeta_test("tf_vegeta_cfg.yaml", "vegeta_max.yaml", "11m")
print(json.dumps(results, indent=4))
saturation_throughput = int(results["throughput"])

{
    "latencies": {
        "total": 18194676761380,
        "mean": 4069487085,
        "50th": 3865217401,
        "90th": 5285272466,
        "95th": 5768188708,
        "99th": 6667031940,
        "max": 7656080367,
        "min": 970003451
    },
    "bytes_in": {
        "total": 974678,
        "mean": 218
    },
    "bytes_out": {
        "total": 72318425,
        "mean": 16175
    },
    "earliest": "2020-07-13T09:38:48.517793327Z",
    "latest": "2020-07-13T09:41:48.535299333Z",
    "end": "2020-07-13T09:41:52.165570518Z",
    "duration": 180017506006,
    "wait": 3630271185,
    "requests": 4471,
    "rate": 24.836473403042152,
    "throughput": 24.34551655558568,
    "success": 1,
    "status_codes": {
        "200": 4471
    },
    "errors": []
}

[8]:

print("Max Throughput=", saturation_throughput)

Max Throughput= 24

Load Tests with HPA¶

Run with an HPA at saturation rate to check: * Latencies affected by scaling

[9]:

%%writefile tf_flowers.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: tf-flowers
spec:
  protocol: tensorflow
  transport: rest
  predictors:
  - graph:
      implementation: TENSORFLOW_SERVER
      modelUri: gs://kfserving-samples/models/tensorflow/flowers
      name:  flowers
      parameters:
        - name: model_name
          type: STRING
          value: flowers
    componentSpecs:
    - hpaSpec:
        minReplicas: 1
        maxReplicas: 5
        metrics:
        - resource:
            name: cpu
            targetAverageUtilization: 70
          type: Resource
      spec:
        containers:
        - name: flowers
          resources:
            requests:
              cpu: '1'
          livenessProbe:
            failureThreshold: 3
            initialDelaySeconds: 60
            periodSeconds: 5
            successThreshold: 1
            tcpSocket:
              port: http
            timeoutSeconds: 5
          readinessProbe:
            failureThreshold: 3
            initialDelaySeconds: 20
            periodSeconds: 5
            successThreshold: 1
            tcpSocket:
              port: http
            timeoutSeconds: 5
        tolerations:
        - key: model
          operator: Exists
          effect: NoSchedule
    name: default
    replicas: 1

Overwriting tf_flowers.yaml

[10]:

run_model("tf_flowers.yaml")

Available with 1 pods

[11]:

rate = saturation_throughput
duration = "10m"
%env DURATION=$duration
%env RATE=$rate/1s
!cat vegeta_cfg.tmpl.yaml | envsubst > vegeta.tmp.yaml
!cat vegeta.tmp.yaml

env: DURATION=10m
env: RATE=24/1s
apiVersion: batch/v1
kind: Job
metadata:
  name: tf-load-test
spec:
  backoffLimit: 6
  parallelism: 1
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"
    spec:
      containers:
        - args:
            - vegeta -cpus=4 attack -keepalive=false -duration=10m -rate=24/1s -targets=/var/vegeta/cfg
              | vegeta report -type=json
          command:
            - sh
            - -c
          image: peterevans/vegeta:latest
          imagePullPolicy: Always
          name: vegeta
          volumeMounts:
            - mountPath: /var/vegeta
              name: tf-vegeta-cfg
      restartPolicy: Never
      volumes:
        - configMap:
            defaultMode: 420
            name: tf-vegeta-cfg
          name: tf-vegeta-cfg
      tolerations:
      - key: loadtester
        operator: Exists
        effect: NoSchedule

[15]:

results = run_vegeta_test("tf_vegeta_cfg.yaml", "vegeta.tmp.yaml", "11m")
print(json.dumps(results, indent=4))

{
    "latencies": {
        "total": 3743859444532,
        "mean": 259990239,
        "50th": 131917169,
        "90th": 310053255,
        "95th": 916684759,
        "99th": 2775052710,
        "max": 7645706522,
        "min": 61953433
    },
    "bytes_in": {
        "total": 3139200,
        "mean": 218
    },
    "bytes_out": {
        "total": 232920000,
        "mean": 16175
    },
    "earliest": "2020-07-13T09:57:01.982849851Z",
    "latest": "2020-07-13T10:07:01.94120089Z",
    "end": "2020-07-13T10:07:02.043547541Z",
    "duration": 599958351039,
    "wait": 102346651,
    "requests": 14400,
    "rate": 24.001666074090423,
    "throughput": 23.997572337989126,
    "success": 1,
    "status_codes": {
        "200": 14400
    },
    "errors": []
}

[17]:

print_vegeta_results(results)

Latencies:
        mean: 259.990239 ms
        50th: 131.917169 ms
        90th: 310.053255 ms
        95th: 916.684759 ms
        99th: 2775.05271 ms

Throughput: 23.997572337989126/s
Errors: False

[ ]: