This page was generated from examples/dvc/sklearn_dvc_minio.ipynb.

Basic Example for the SKLearn Prepackaged Server with a Model Trained with DVC and Served from MinIO

Prerequisites

  • A Kubernetes cluster with kubectl configured

  • curl

  • pygmentize

Setup Seldon Core

Use the Setup Cluster notebook to set up Seldon Core with an ingress.

Setup MinIO

Use the provided notebook to install MinIO in your cluster and configure the mc CLI tool. Instructions are also available online.

Get DVC CLI tool

Using pip

pip install --user dvc

Or follow the steps for your platform in the official documentation.

Train model

The key steps of training are defined in the following Makefile:

[1]:
!pygmentize Makefile
env:
        python3 -m venv .env
        ./.env/bin/pip install --upgrade pip setuptools
        ./.env/bin/pip install -r requirements.txt

train:
        .env/bin/python train_iris.py

model: env train

The model target creates a Python virtual environment in .env and then runs the following training script:

[2]:
!pygmentize train_iris.py
import joblib
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn import datasets


def main():
    clf = LogisticRegression(solver="liblinear", multi_class='ovr')
    p = Pipeline([("clf", clf)])
    print("Training model...")
    p.fit(X, y)
    print("Model trained!")

    filename_p = "model.joblib"
    print("Saving model in %s" % filename_p)
    joblib.dump(p, filename_p)
    print("Model saved!")


if __name__ == "__main__":
    print("Loading iris data set...")
    iris = datasets.load_iris()
    X, y = iris.data, iris.target
    print("Dataset loaded!")
    main()

Initial model training (first run)

The first training run with DVC creates the model.dvc file, which records the hash of the output. We will use that hash to version our model.

[3]:
%%bash
dvc run -f model.dvc \
          -d Makefile -d requirements.txt -d train_iris.py \
          -o model.joblib \
          --overwrite-dvcfile \
          make model
Stage is cached, skipping.
[4]:
!cat model.dvc
md5: 99124ac10a601ab1d9f07f9c392b5d89
cmd: make model
deps:
- md5: 65fd61883993b68d1937bdc36c59b20c
  path: Makefile
- md5: 7e8ce9f96492fee21db6a59c2b52f34d
  path: requirements.txt
- md5: 49c19c3ea9deb642066c0a457181cfbf
  path: train_iris.py
outs:
- md5: 8104914e6936da9864603b9bc4be2114
  path: model.joblib
  cache: true
  metric: false
  persist: false

The hash of the output is 8104914e6936da9864603b9bc4be2114
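
The md5 key in model.dvc suggests that the recorded value is an MD5 digest of the output file. The sketch below recomputes such a digest with the Python standard library so that it can be compared with the outs entry above. The helper name file_md5 is illustrative and not part of DVC, and the sketch assumes that DVC hashes the raw bytes of a binary file such as model.joblib.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Assumption: model.joblib sits in the current directory, as in this example.
model_path = Path("model.joblib")
if model_path.exists():
    print(file_md5(model_path))
```

If the assumption holds, the printed value should equal the md5 listed under outs in model.dvc.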

Reproducing results (successive later runs)

With DVC, training can be repeated reproducibly because the hashes of all dependencies are stored in the model.dvc file.

[5]:
%%bash
rm -f model.joblib
dvc repro model.dvc
Running command:
        make model
Output 'model.joblib' didn't change. Skipping saving.

To track the changes with git, run:

        git add model.dvc
WARNING: stage: 'model.dvc' changed.
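
The reproducibility check can be pictured as comparing the recorded dependency hashes with freshly computed ones. The sketch below is a simplified illustration of that idea and not DVC's actual implementation. The names md5_of, changed_dependencies and recorded are hypothetical; the recorded mapping mirrors the deps section of model.dvc.

```python
import hashlib
from pathlib import Path


def md5_of(path):
    """Hex MD5 digest of a file's bytes."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def changed_dependencies(recorded, root="."):
    """Return the paths whose current MD5 differs from the recorded one.

    `recorded` maps a relative path to the hash stored for it.
    A missing file counts as changed.
    """
    changed = []
    for rel_path, expected in recorded.items():
        target = Path(root) / rel_path
        if not target.is_file() or md5_of(target) != expected:
            changed.append(rel_path)
    return changed


# Hashes copied from the deps section of model.dvc shown earlier.
recorded = {
    "Makefile": "65fd61883993b68d1937bdc36c59b20c",
    "requirements.txt": "7e8ce9f96492fee21db6a59c2b52f34d",
    "train_iris.py": "49c19c3ea9deb642066c0a457181cfbf",
}
```

Calling changed_dependencies(recorded) in the example directory lists the dependencies that no longer match; an empty list means that, under this simplified model, the stage inputs are unchanged.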

Add trained model to remote S3 storage

Create metadata.yaml

In the metadata we can use the DVC hash to version the deployed model.

[6]:
%%writefile metadata.yaml

name: iris
versions: [iris/dvc:8104914e6936da9864603b9bc4be2114]
platform: sklearn
inputs:
- datatype: BYTES
  name: input
  shape: [ 1, 4 ]
outputs:
- datatype: BYTES
  name: output
  shape: [ 3 ]
Overwriting metadata.yaml
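
Copying the hash into metadata.yaml by hand is easy to get wrong. As a minimal sketch, the metadata text can instead be rendered from a template. METADATA_TEMPLATE and render_metadata are hypothetical names introduced here; the template body repeats the file written above.

```python
METADATA_TEMPLATE = """\
name: iris
versions: [iris/dvc:{model_hash}]
platform: sklearn
inputs:
- datatype: BYTES
  name: input
  shape: [ 1, 4 ]
outputs:
- datatype: BYTES
  name: output
  shape: [ 3 ]
"""


def render_metadata(model_hash):
    """Fill the metadata template with the given DVC output hash."""
    is_hex = all(c in "0123456789abcdef" for c in model_hash)
    if len(model_hash) != 32 or not is_hex:
        raise ValueError("expected a 32-character lowercase hex MD5 digest")
    return METADATA_TEMPLATE.format(model_hash=model_hash)


# Hash taken from the outs section of model.dvc in this example.
print(render_metadata("8104914e6936da9864603b9bc4be2114"))
```

The rendered text can then be written to metadata.yaml before uploading it.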

Create a bucket for the trained model and upload the model and its metadata:

[7]:
%%bash
mc mb minio-seldon/dvc-iris -p

mc cp model.joblib minio-seldon/dvc-iris/
mc cp metadata.yaml minio-seldon/dvc-iris/
Bucket created successfully `minio-seldon/dvc-iris`.
`model.joblib` -> `minio-seldon/dvc-iris/model.joblib`
Total: 0 B, Transferred: 1.05 KiB, Speed: 101.06 KiB/s
`metadata.yaml` -> `minio-seldon/dvc-iris/metadata.yaml`
Total: 0 B, Transferred: 199 B, Speed: 40.66 KiB/s
[9]:
!mc ls minio-seldon/dvc-iris
[2020-05-24 18:56:43 BST]    199B metadata.yaml
[2020-05-24 18:56:43 BST]  1.1KiB model.joblib

Deploy sklearn server

[10]:
%%writefile secret.yaml

apiVersion: v1
kind: Secret
metadata:
  name: seldon-init-container-secret
type: Opaque
stringData:
  RCLONE_CONFIG_S3_TYPE: s3
  RCLONE_CONFIG_S3_PROVIDER: minio
  RCLONE_CONFIG_S3_ENV_AUTH: "false"
  RCLONE_CONFIG_S3_ACCESS_KEY_ID: minioadmin
  RCLONE_CONFIG_S3_SECRET_ACCESS_KEY: minioadmin
  RCLONE_CONFIG_S3_ENDPOINT: http://minio.minio-system.svc.cluster.local:9000

Overwriting secret.yaml
[11]:
!kubectl apply -f secret.yaml
secret/seldon-init-container-secret configured
[12]:
%%writefile deploy.yaml

apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: minio-dvc-sklearn
spec:
  name: iris
  predictors:
  - componentSpecs:
    graph:
      children: []
      implementation: SKLEARN_SERVER
      modelUri: s3://dvc-iris
      envSecretRefName: seldon-init-container-secret
      name: classifier
    name: default
    replicas: 1
Overwriting deploy.yaml
[13]:
!kubectl apply -f deploy.yaml
seldondeployment.machinelearning.seldon.io/minio-dvc-sklearn created
[14]:
!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=minio-dvc-sklearn -o jsonpath='{.items[0].metadata.name}')
Waiting for deployment "minio-dvc-sklearn-default-0-classifier" rollout to finish: 0 of 1 updated replicas are available...
deployment "minio-dvc-sklearn-default-0-classifier" successfully rolled out

Test deployment

Test prediction

[15]:
%%bash
curl -s -X POST -H 'Content-Type: application/json' \
    -d '{"data":{"ndarray":[[5.964, 4.006, 2.081, 1.031]]}}' \
    http://localhost:8003/seldon/seldon/minio-dvc-sklearn/api/v1.0/predictions  | jq .
{
  "data": {
    "names": [
      "t:0",
      "t:1",
      "t:2"
    ],
    "ndarray": [
      [
        0.9548873249364185,
        0.04505474761561256,
        5.792744796895459e-05
      ]
    ]
  },
  "meta": {}
}
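
The same prediction request can be sent from Python. The sketch below uses only the standard library and assumes the ingress is reachable on localhost:8003, as in the curl call above. The names build_request, top_class and predict are illustrative helpers and not part of any Seldon client.

```python
import json
import urllib.request

# Assumption: same endpoint as the curl call in this example.
URL = "http://localhost:8003/seldon/seldon/minio-dvc-sklearn/api/v1.0/predictions"


def build_request(features, url=URL):
    """Build a POST request carrying one row of features as an ndarray payload."""
    body = json.dumps({"data": {"ndarray": [list(features)]}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def top_class(response):
    """Return (name, probability) of the highest-scoring class in a response dict."""
    names = response["data"]["names"]
    probabilities = response["data"]["ndarray"][0]
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return names[best], probabilities[best]


def predict(features, url=URL, timeout=10):
    """Send the request and decode the JSON reply (needs a running deployment)."""
    with urllib.request.urlopen(build_request(features, url), timeout=timeout) as resp:
        return json.load(resp)
```

Applied to the response shown above, top_class picks t:0, the first class, with a probability of about 0.955.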

Test model metadata (optional)

[16]:
%%bash
curl -s http://localhost:8003/seldon/seldon/minio-dvc-sklearn/api/v1.0/metadata/classifier | jq .
{
  "inputs": [
    {
      "datatype": "BYTES",
      "name": "input",
      "shape": [
        1,
        4
      ]
    }
  ],
  "name": "iris",
  "outputs": [
    {
      "datatype": "BYTES",
      "name": "output",
      "shape": [
        3
      ]
    }
  ],
  "platform": "sklearn",
  "versions": [
    "iris/dvc:8104914e6936da9864603b9bc4be2114"
  ]
}
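
Because the version string embeds the DVC hash, the metadata endpoint can be used to check which model is actually being served. The following sketch assumes the iris/dvc: prefix used in metadata.yaml above; deployed_hashes and matches_dvc_output are hypothetical helper names.

```python
def deployed_hashes(metadata, prefix="iris/dvc:"):
    """Extract the hashes from `versions` entries that start with `prefix`."""
    versions = metadata.get("versions", [])
    return [v[len(prefix):] for v in versions if v.startswith(prefix)]


def matches_dvc_output(metadata, expected_hash):
    """True if the served metadata advertises the expected DVC output hash."""
    return expected_hash in deployed_hashes(metadata)


# Metadata as returned by the endpoint in this example (trimmed to the relevant keys).
served = {
    "name": "iris",
    "platform": "sklearn",
    "versions": ["iris/dvc:8104914e6936da9864603b9bc4be2114"],
}
print(matches_dvc_output(served, "8104914e6936da9864603b9bc4be2114"))
```

A mismatch would indicate that the bucket holds a model or metadata file from a different training run than the one recorded in model.dvc.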

Cleanup

[17]:
!kubectl delete -f deploy.yaml
seldondeployment.machinelearning.seldon.io "minio-dvc-sklearn" deleted
[20]:
!rm -r .env
rm: cannot remove '.env': No such file or directory