This page was generated from examples/dvc/sklearn_dvc_minio.ipynb.
Basic Examples for the SKLearn Prepackaged Server: Train with DVC and Deploy to MinIO
Prerequisites
A Kubernetes cluster with kubectl configured
curl
pygmentize
Set up Seldon Core
Use the Setup Cluster notebook to set up Seldon Core with an ingress.
Set up MinIO
Use the provided notebook to install MinIO in your cluster and configure the mc CLI tool. Instructions are also available online.
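Get the DVC CLI tool
Using pip:
pip install --user dvc
Or follow the steps for your platform in the official documentation.
This example uses the single-stage .dvc file format of DVC releases before 1.0 (see model.dvc below); DVC 1.0 and later define pipeline stages in dvc.yaml instead, so the dvc run invocation used here may need adjusting on a recent release.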
Train the model
The key training steps are defined in the following Makefile:
!pygmentize Makefile
env:
	python3 -m venv .env
	./.env/bin/pip install --upgrade pip setuptools
	./.env/bin/pip install -r requirements.txt

train:
	.env/bin/python train_iris.py

model: env train
The model target creates a Python virtual environment in .env and then runs the following training script:
!pygmentize train_iris.py
import joblib
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn import datasets


def main():
    clf = LogisticRegression(solver="liblinear", multi_class='ovr')
    p = Pipeline([("clf", clf)])
    print("Training model...")
    p.fit(X, y)
    print("Model trained!")

    filename_p = "model.joblib"
    print("Saving model in %s" % filename_p)
    joblib.dump(p, filename_p)
    print("Model saved!")


if __name__ == "__main__":
    print("Loading iris data set...")
    iris = datasets.load_iris()
    X, y = iris.data, iris.target
    print("Dataset loaded!")
    main()
Initial model training (first run)
The first training run with DVC creates the model.dvc file, which records the hash of the output. We will use that hash to version our model.
%%bash
dvc run -f model.dvc \
-d Makefile -d requirements.txt -d train_iris.py \
-o model.joblib \
--overwrite-dvcfile \
make model
Stage is cached, skipping.
!cat model.dvc
md5: 99124ac10a601ab1d9f07f9c392b5d89
cmd: make model
deps:
- md5: 65fd61883993b68d1937bdc36c59b20c
  path: Makefile
- md5: 7e8ce9f96492fee21db6a59c2b52f34d
  path: requirements.txt
- md5: 49c19c3ea9deb642066c0a457181cfbf
  path: train_iris.py
outs:
- md5: 8104914e6936da9864603b9bc4be2114
  path: model.joblib
  cache: true
  metric: false
  persist: false
The hash of the output (model.joblib) is 8104914e6936da9864603b9bc4be2114.
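DVC records an MD5 checksum for each output in model.dvc. As a rough sketch, assuming the recorded value is a plain MD5 of the file bytes (which should hold for a binary file such as model.joblib; DVC may normalize line endings before hashing text files), the hash can be recomputed locally with Python's hashlib. The helper name file_md5 is hypothetical.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so a large model file need not fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    model = Path("model.joblib")
    if model.exists():
        # Compare this value with the md5 listed under `outs` in model.dvc
        print(file_md5(model))
```

If the printed digest matches the md5 under outs, the local model.joblib is the version that DVC tracked.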
Reproducing results (successive later runs)
With DVC, training can be repeated reproducibly because the versions (hashes) of all dependencies are stored in the model.dvc file.
%%bash
rm model.joblib -f
dvc repro model.dvc
Running command:
make model
Output 'model.joblib' didn't change. Skipping saving.
To track the changes with git, run:
git add model.dvc
WARNING: stage: 'model.dvc' changed.
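The decision dvc repro makes can be pictured as comparing the checksum recorded for each dependency in model.dvc with the file's current checksum, and re-running the stage command when a dependency changed or an output is missing (as here, after model.joblib was deleted). The sketch below is a simplified, hypothetical illustration of that idea, not DVC's actual implementation: it takes the recorded hashes as a plain dict instead of parsing model.dvc, and uses plain MD5 without any text normalization.

```python
import hashlib
from pathlib import Path


def file_md5(path):
    """Hex MD5 digest of a file's bytes."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def stale_reasons(recorded_deps, outputs):
    """Return reasons a stage would need re-running (empty list = up to date).

    recorded_deps: mapping of dependency path -> MD5 recorded at the last run
                   (hypothetical input; DVC reads this from the .dvc file).
    outputs:       paths the stage is expected to produce.
    """
    reasons = []
    for path, recorded in recorded_deps.items():
        if not Path(path).exists():
            reasons.append(f"dependency missing: {path}")
        elif file_md5(path) != recorded:
            reasons.append(f"dependency changed: {path}")
    for path in outputs:
        if not Path(path).exists():
            reasons.append(f"output missing: {path}")
    return reasons


if __name__ == "__main__":
    # Dependency hashes copied from the model.dvc shown above
    deps = {
        "Makefile": "65fd61883993b68d1937bdc36c59b20c",
        "requirements.txt": "7e8ce9f96492fee21db6a59c2b52f34d",
        "train_iris.py": "49c19c3ea9deb642066c0a457181cfbf",
    }
    print(stale_reasons(deps, ["model.joblib"]) or "stage is up to date")
```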
Add the trained model to remote S3 storage
Create metadata.yaml
In the metadata, we can use DVC's hash to version the deployed model.
%%writefile metadata.yaml
name: iris
versions: [iris/dvc:8104914e6936da9864603b9bc4be2114]
platform: sklearn
inputs:
- datatype: BYTES
  name: input
  shape: [ 1, 4 ]
outputs:
- datatype: BYTES
  name: output
  shape: [ 3 ]
Overwriting metadata.yaml
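Rather than pasting the hash by hand, the version string could be generated from the DVC output hash. A minimal sketch, assuming the hash is supplied as a string (for example, the outs md5 from model.dvc) and the iris/dvc:<hash> naming used above; render_metadata is a hypothetical helper that reproduces the same file content without needing a YAML library.

```python
METADATA_TEMPLATE = """\
name: iris
versions: [iris/dvc:{model_hash}]
platform: sklearn
inputs:
- datatype: BYTES
  name: input
  shape: [ 1, 4 ]
outputs:
- datatype: BYTES
  name: output
  shape: [ 3 ]
"""


def render_metadata(model_hash):
    """Fill the DVC output hash into the metadata.yaml template."""
    # An MD5 hex digest is 32 lowercase hex characters; reject anything else early
    if len(model_hash) != 32 or any(c not in "0123456789abcdef" for c in model_hash):
        raise ValueError(f"not an MD5 hex digest: {model_hash!r}")
    return METADATA_TEMPLATE.format(model_hash=model_hash)


if __name__ == "__main__":
    # Redirect this output to metadata.yaml to reproduce the file above
    print(render_metadata("8104914e6936da9864603b9bc4be2114"), end="")
```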
Create a bucket for the trained model and push it
%%bash
mc mb minio-seldon/dvc-iris -p
mc cp model.joblib minio-seldon/dvc-iris/
mc cp metadata.yaml minio-seldon/dvc-iris/
Bucket created successfully `minio-seldon/dvc-iris`.
`model.joblib` -> `minio-seldon/dvc-iris/model.joblib`
Total: 0 B, Transferred: 1.05 KiB, Speed: 101.06 KiB/s
`metadata.yaml` -> `minio-seldon/dvc-iris/metadata.yaml`
Total: 0 B, Transferred: 199 B, Speed: 40.66 KiB/s
!mc ls minio-seldon/dvc-iris
[2020-05-24 18:56:43 BST] 199B metadata.yaml
[2020-05-24 18:56:43 BST] 1.1KiB model.joblib
Deploy the SKLearn server
%%writefile secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: seldon-init-container-secret
type: Opaque
stringData:
  RCLONE_CONFIG_S3_TYPE: s3
  RCLONE_CONFIG_S3_PROVIDER: minio
  RCLONE_CONFIG_S3_ENV_AUTH: "false"
  RCLONE_CONFIG_S3_ACCESS_KEY_ID: minioadmin
  RCLONE_CONFIG_S3_SECRET_ACCESS_KEY: minioadmin
  RCLONE_CONFIG_S3_ENDPOINT: http://minio.minio-system.svc.cluster.local:9000
Overwriting secret.yaml
!kubectl apply -f secret.yaml
secret/seldon-init-container-secret configured
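The secret's keys follow rclone's convention of configuring a remote through environment variables named RCLONE_CONFIG_<REMOTE>_<OPTION>; here they describe a remote called s3 of type s3 with the minio provider, which the init container presumably uses to fetch s3://dvc-iris. The sketch below, built around a hypothetical helper rclone_remotes_from_env, groups such variables into per-remote options to make that mapping explicit. It assumes remote names contain no underscores and that names are lowercased, as in rclone's own examples.

```python
PREFIX = "RCLONE_CONFIG_"


def rclone_remotes_from_env(env):
    """Group RCLONE_CONFIG_<REMOTE>_<OPTION> variables by remote.

    Assumes remote names contain no underscores, so the first "_" after the
    prefix separates the remote name from the option name.
    """
    remotes = {}
    for key, value in env.items():
        if not key.startswith(PREFIX):
            continue
        remote, sep, option = key[len(PREFIX):].partition("_")
        if not sep or not option:
            continue  # not of the form <REMOTE>_<OPTION>
        remotes.setdefault(remote.lower(), {})[option.lower()] = value
    return remotes


if __name__ == "__main__":
    # Values copied from secret.yaml above
    secret = {
        "RCLONE_CONFIG_S3_TYPE": "s3",
        "RCLONE_CONFIG_S3_PROVIDER": "minio",
        "RCLONE_CONFIG_S3_ENV_AUTH": "false",
        "RCLONE_CONFIG_S3_ACCESS_KEY_ID": "minioadmin",
        "RCLONE_CONFIG_S3_SECRET_ACCESS_KEY": "minioadmin",
        "RCLONE_CONFIG_S3_ENDPOINT": "http://minio.minio-system.svc.cluster.local:9000",
    }
    # Print the equivalent rclone.conf-style section for each remote
    for remote, options in rclone_remotes_from_env(secret).items():
        print(f"[{remote}]")
        for option, value in options.items():
            print(f"{option} = {value}")
```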
%%writefile deploy.yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: minio-dvc-sklearn
spec:
  name: iris
  predictors:
  - componentSpecs:
    graph:
      children: []
      implementation: SKLEARN_SERVER
      modelUri: s3://dvc-iris
      envSecretRefName: seldon-init-container-secret
      name: classifier
    name: default
    replicas: 1
Overwriting deploy.yaml
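If you prefer to generate the manifest programmatically, for example to template the bucket name, the same SeldonDeployment can be built as a Python dict and emitted as JSON, which kubectl apply -f also accepts. This is a sketch assuming the field values above; sklearn_deployment is a hypothetical helper, and componentSpecs is omitted because it carries no value in the YAML.

```python
import json


def sklearn_deployment(name, model_uri, secret_name, replicas=1):
    """Build a SeldonDeployment manifest for the prepackaged SKLearn server."""
    return {
        "apiVersion": "machinelearning.seldon.io/v1",
        "kind": "SeldonDeployment",
        "metadata": {"name": name},
        "spec": {
            "name": "iris",
            "predictors": [
                {
                    "graph": {
                        "children": [],
                        "implementation": "SKLEARN_SERVER",
                        "modelUri": model_uri,
                        # Secret with the RCLONE_CONFIG_* variables for the init container
                        "envSecretRefName": secret_name,
                        "name": "classifier",
                    },
                    "name": "default",
                    "replicas": replicas,
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = sklearn_deployment(
        "minio-dvc-sklearn", "s3://dvc-iris", "seldon-init-container-secret"
    )
    # Redirect to deploy.json, then: kubectl apply -f deploy.json
    print(json.dumps(manifest, indent=2))
```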
!kubectl apply -f deploy.yaml
seldondeployment.machinelearning.seldon.io/minio-dvc-sklearn created
!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=minio-dvc-sklearn -o jsonpath='{.items[0].metadata.name}')
Waiting for deployment "minio-dvc-sklearn-default-0-classifier" rollout to finish: 0 of 1 updated replicas are available...
deployment "minio-dvc-sklearn-default-0-classifier" successfully rolled out
Test deployment
Test prediction
%%bash
curl -s -X POST -H 'Content-Type: application/json' \
-d '{"data":{"ndarray":[[5.964, 4.006, 2.081, 1.031]]}}' \
http://localhost:8003/seldon/seldon/minio-dvc-sklearn/api/v1.0/predictions | jq .
{
  "data": {
    "names": [
      "t:0",
      "t:1",
      "t:2"
    ],
    "ndarray": [
      [
        0.9548873249364185,
        0.04505474761561256,
        5.792744796895459e-05
      ]
    ]
  },
  "meta": {}
}
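The same request can be sent from Python with only the standard library. In this sketch the URL is the one from the curl call above (so it assumes the ingress is reachable on localhost:8003), and the mapping of column t:i to class i follows the order of scikit-learn's load_iris target names (setosa, versicolor, virginica); predict and top_class are hypothetical helper names. With the response shown above, the highest probability (about 0.955) is in column t:0, which would correspond to setosa.

```python
import json
import urllib.request

URL = "http://localhost:8003/seldon/seldon/minio-dvc-sklearn/api/v1.0/predictions"
# Same order as load_iris().target_names in scikit-learn
IRIS_CLASSES = ["setosa", "versicolor", "virginica"]


def predict(features, url=URL):
    """POST one feature row in Seldon's ndarray format and return the JSON reply."""
    body = json.dumps({"data": {"ndarray": [features]}}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def top_class(reply):
    """Return (class name, probability) for the most likely class of the first row."""
    probabilities = reply["data"]["ndarray"][0]
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return IRIS_CLASSES[best], probabilities[best]
```

Usage against the running deployment: top_class(predict([5.964, 4.006, 2.081, 1.031])).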
Test model metadata (optional)
%%bash
curl -s http://localhost:8003/seldon/seldon/minio-dvc-sklearn/api/v1.0/metadata/classifier | jq .
{
  "inputs": [
    {
      "datatype": "BYTES",
      "name": "input",
      "shape": [
        1,
        4
      ]
    }
  ],
  "name": "iris",
  "outputs": [
    {
      "datatype": "BYTES",
      "name": "output",
      "shape": [
        3
      ]
    }
  ],
  "platform": "sklearn",
  "versions": [
    "iris/dvc:8104914e6936da9864603b9bc4be2114"
  ]
}
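Because the version string embeds the DVC output hash, the metadata endpoint can confirm which model is actually being served. A sketch assuming the metadata reply shown above and the iris/dvc:<hash> naming from metadata.yaml; served_dvc_hashes is a hypothetical helper, and the expected hash is hard-coded here although in practice it would be read from model.dvc.

```python
import json

# Version prefix used in metadata.yaml above
VERSION_PREFIX = "iris/dvc:"


def served_dvc_hashes(metadata):
    """Extract the DVC hashes from a model metadata reply's "versions" list."""
    return [
        version[len(VERSION_PREFIX):]
        for version in metadata.get("versions", [])
        if version.startswith(VERSION_PREFIX)
    ]


if __name__ == "__main__":
    # Abbreviated copy of the metadata reply above
    reply = json.loads(
        '{"name": "iris", "versions": ["iris/dvc:8104914e6936da9864603b9bc4be2114"]}'
    )
    expected = "8104914e6936da9864603b9bc4be2114"  # outs md5 from model.dvc
    print("deployed model matches DVC output:", expected in served_dvc_hashes(reply))
```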
Cleanup
!kubectl delete -f deploy.yaml
seldondeployment.machinelearning.seldon.io "minio-dvc-sklearn" deleted
!rm .env -r
rm: cannot remove '.env': No such file or directory