This page was generated from examples/batch/hdfs-argo-workflows/hdfs-batch.ipynb.
Batch processing with Argo Worfklows and HDFS¶
In this notebook we will dive into how you can run batch processing with Argo Workflows and Seldon Core.
Dependencies:
Seldon core installed as per the docs with an ingress
HDFS namenode/datanode accessible from your cluster (here in-cluster installation for demo)
Argo Workfklows installed in cluster (and argo CLI for commands)
Python
hdfscli
for interacting with the installedhdfs
instance
Setup¶
Install Seldon Core¶
Use the notebook to set-up Seldon Core with Ambassador or Istio Ingress.
Note: If running with KIND you need to make sure do follow these steps as workaround to the /.../docker.sock
known issue:
kubectl patch -n argo configmap workflow-controller-configmap \
--type merge -p '{"data": {"config": "containerRuntimeExecutor: k8sapi"}}'
Install HDFS¶
For this example we will need a running hdfs
storage. We can use these helm charts from Gradiant.
helm repo add gradiant https://gradiant.github.io/charts/
kubectl create namespace hdfs-system || echo "namespace hdfs-system already exists"
helm install hdfs gradiant/hdfs --namespace hdfs-system
Once installation is complete, run in separate terminal a port-forward
command for us to be able to push/pull batch data.
kubectl port-forward -n hdfs-system svc/hdfs-httpfs 14000:14000
Install and configure hdfscli¶
In this example we will be using hdfscli Python library for interacting with HDFS. It supports both the WebHDFS (and HttpFS) API as well as Kerberos authentication (not covered by the example).
You can install it with
pip install hdfs==2.5.8
To be able to put input-data.txt
for our batch job into hdfs we need to configure the client
[11]:
%%writefile hdfscli.cfg
[global]
default.alias = batch
[batch.alias]
url = http://localhost:14000
user = hdfs
Overwriting hdfscli.cfg
Install Argo Workflows¶
You can follow the instructions from the official Argo Workflows Documentation.
You also need to make sure that argo has permissions to create seldon deployments - for this you can create a role:
[51]:
%%writefile role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: workflow
rules:
- apiGroups:
- ""
resources:
- pods
verbs:
- "*"
- apiGroups:
- "apps"
resources:
- deployments
verbs:
- "*"
- apiGroups:
- ""
resources:
- pods/log
verbs:
- "*"
- apiGroups:
- machinelearning.seldon.io
resources:
- "*"
verbs:
- "*"
Overwriting role.yaml
[52]:
!kubectl apply -f role.yaml
role.rbac.authorization.k8s.io/workflow unchanged
A service account:
[53]:
!kubectl create serviceaccount workflow
serviceaccount/workflow created
And a binding
[54]:
!kubectl create rolebinding workflow --role=workflow --serviceaccount=seldon:workflow
rolebinding.rbac.authorization.k8s.io/workflow created
Create Seldon Deployment¶
For purpose of this batch example we will assume that Seldon Deployment is created independently from the workflow logic
[55]:
%%writefile deployment.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
name: sklearn
namespace: seldon
spec:
name: iris
predictors:
- graph:
children: []
implementation: SKLEARN_SERVER
modelUri: gs://seldon-models/v1.19.0-dev/sklearn/iris
name: classifier
logger:
mode: all
name: default
replicas: 3
Overwriting deployment.yaml
[56]:
!kubectl apply -f deployment.yaml
seldondeployment.machinelearning.seldon.io/sklearn configured
[57]:
!kubectl -n seldon rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=sklearn -o jsonpath='{.items[0].metadata.name}')
deployment "sklearn-default-0-classifier" successfully rolled out
Create Input Data¶
[58]:
import os
import random
random.seed(0)
with open("input-data.txt", "w") as f:
for _ in range(10000):
data = [random.random() for _ in range(4)]
data = "[[" + ", ".join(str(x) for x in data) + "]]\n"
f.write(data)
[60]:
%%bash
HDFSCLI_CONFIG=./hdfscli.cfg hdfscli upload input-data.txt /batch-data/input-data.txt
Prepare HDFS config / client image¶
For connecting to the hdfs
from inside the cluster we will use the same hdfscli
tool as we used above to put data in there.
We will configure hdfscli
using hdfscli.cfg
file stored inside kubernetes secret:
[61]:
%%writefile hdfs-config.yaml
apiVersion: v1
kind: Secret
metadata:
name: seldon-hdfscli-secret-file
type: Opaque
stringData:
hdfscli.cfg: |
[global]
default.alias = batch
[batch.alias]
url = http://hdfs-httpfs.hdfs-system.svc.cluster.local:14000
user = hdfs
Overwriting hdfs-config.yaml
[62]:
!kubectl apply -f hdfs-config.yaml
secret/seldon-hdfscli-secret-file configured
For the client image we will use a following minimal Dockerfile
[63]:
%%writefile Dockerfile
FROM python:3.8
RUN pip install hdfs==2.5.8
ENV HDFSCLI_CONFIG /etc/hdfs/hdfscli.cfg
Overwriting Dockerfile
That is build and published as seldonio/hdfscli:1.6.0-dev
Create Workflow¶
This simple workflow will consist of three stages: - download-input-data - process-batch-inputs - upload-output-data
[64]:
%%writefile workflow.yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
name: sklearn-batch-job
namespace: seldon
labels:
deployment-name: sklearn
deployment-kind: SeldonDeployment
spec:
volumeClaimTemplates:
- metadata:
name: seldon-job-pvc
namespace: seldon
ownerReferences:
- apiVersion: argoproj.io/v1alpha1
blockOwnerDeletion: true
kind: Workflow
name: '{{workflow.name}}'
uid: '{{workflow.uid}}'
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
volumes:
- name: config
secret:
secretName: seldon-hdfscli-secret-file
arguments:
parameters:
- name: batch_deployment_name
value: sklearn
- name: batch_namespace
value: seldon
- name: input_path
value: /batch-data/input-data.txt
- name: output_path
value: /batch-data/output-data-{{workflow.name}}.txt
- name: batch_gateway_type
value: istio
- name: batch_gateway_endpoint
value: istio-ingressgateway.istio-system.svc.cluster.local
- name: batch_transport_protocol
value: rest
- name: workers
value: "10"
- name: retries
value: "3"
- name: data_type
value: data
- name: payload_type
value: ndarray
entrypoint: seldon-batch-process
templates:
- name: seldon-batch-process
steps:
- - arguments: {}
name: download-input-data
template: download-input-data
- - arguments: {}
name: process-batch-inputs
template: process-batch-inputs
- - arguments: {}
name: upload-output-data
template: upload-output-data
- name: download-input-data
script:
image: seldonio/hdfscli:1.6.0-dev
volumeMounts:
- mountPath: /assets
name: seldon-job-pvc
- mountPath: /etc/hdfs
name: config
readOnly: true
env:
- name: INPUT_DATA_PATH
value: '{{workflow.parameters.input_path}}'
- name: HDFSCLI_CONFIG
value: /etc/hdfs/hdfscli.cfg
command: [sh]
source: |
hdfscli download ${INPUT_DATA_PATH} /assets/input-data.txt
- name: process-batch-inputs
container:
image: seldonio/seldon-core-s2i-python37:1.19.0-dev
volumeMounts:
- mountPath: /assets
name: seldon-job-pvc
env:
- name: SELDON_BATCH_DEPLOYMENT_NAME
value: '{{workflow.parameters.batch_deployment_name}}'
- name: SELDON_BATCH_NAMESPACE
value: '{{workflow.parameters.batch_namespace}}'
- name: SELDON_BATCH_GATEWAY_TYPE
value: '{{workflow.parameters.batch_gateway_type}}'
- name: SELDON_BATCH_HOST
value: '{{workflow.parameters.batch_gateway_endpoint}}'
- name: SELDON_BATCH_TRANSPORT
value: '{{workflow.parameters.batch_transport_protocol}}'
- name: SELDON_BATCH_DATA_TYPE
value: '{{workflow.parameters.data_type}}'
- name: SELDON_BATCH_PAYLOAD_TYPE
value: '{{workflow.parameters.payload_type}}'
- name: SELDON_BATCH_WORKERS
value: '{{workflow.parameters.workers}}'
- name: SELDON_BATCH_RETRIES
value: '{{workflow.parameters.retries}}'
- name: SELDON_BATCH_INPUT_DATA_PATH
value: /assets/input-data.txt
- name: SELDON_BATCH_OUTPUT_DATA_PATH
value: /assets/output-data.txt
command: [seldon-batch-processor]
args: [--benchmark]
- name: upload-output-data
script:
image: seldonio/hdfscli:1.6.0-dev
volumeMounts:
- mountPath: /assets
name: seldon-job-pvc
- mountPath: /etc/hdfs
name: config
readOnly: true
env:
- name: OUTPUT_DATA_PATH
value: '{{workflow.parameters.output_path}}'
- name: HDFSCLI_CONFIG
value: /etc/hdfs/hdfscli.cfg
command: [sh]
source: |
hdfscli upload /assets/output-data.txt ${OUTPUT_DATA_PATH}
Overwriting workflow.yaml
[67]:
!argo submit --serviceaccount workflow workflow.yaml
Name: sklearn-batch-job
Namespace: seldon
ServiceAccount: workflow
Status: Pending
Created: Thu Jan 14 18:36:52 +0000 (now)
Progress:
Parameters:
batch_deployment_name: sklearn
batch_namespace: seldon
input_path: /batch-data/input-data.txt
output_path: /batch-data/output-data-{{workflow.name}}.txt
batch_gateway_type: istio
batch_gateway_endpoint: istio-ingressgateway.istio-system.svc.cluster.local
batch_transport_protocol: rest
workers: 10
retries: 3
data_type: data
payload_type: ndarray
[68]:
!argo list
NAME STATUS AGE DURATION PRIORITY
sklearn-batch-job Running 1s 1s 0
[72]:
!argo get sklearn-batch-job
Name: sklearn-batch-job
Namespace: seldon
ServiceAccount: workflow
Status: Running
Created: Thu Jan 14 18:36:52 +0000 (39 seconds ago)
Started: Thu Jan 14 18:36:52 +0000 (39 seconds ago)
Duration: 39 seconds
Progress: 1/2
ResourcesDuration: 1s*(100Mi memory),1s*(1 cpu)
Parameters:
batch_deployment_name: sklearn
batch_namespace: seldon
input_path: /batch-data/input-data.txt
output_path: /batch-data/output-data-{{workflow.name}}.txt
batch_gateway_type: istio
batch_gateway_endpoint: istio-ingressgateway.istio-system.svc.cluster.local
batch_transport_protocol: rest
workers: 10
retries: 3
data_type: data
payload_type: ndarray
STEP TEMPLATE PODNAME DURATION MESSAGE
● sklearn-batch-job seldon-batch-process
├───✔ download-input-data download-input-data sklearn-batch-job-2227322232 6s
└───● process-batch-inputs process-batch-inputs sklearn-batch-job-2877616693 29s
[75]:
!argo logs sklearn-batch-job
sklearn-batch-job-2877616693: 2021-01-14 18:37:05,000 - batch_processor.py:167 - INFO: Processed instances: 100
sklearn-batch-job-2877616693: 2021-01-14 18:37:05,417 - batch_processor.py:167 - INFO: Processed instances: 200
sklearn-batch-job-2877616693: 2021-01-14 18:37:06,213 - batch_processor.py:167 - INFO: Processed instances: 300
sklearn-batch-job-2877616693: 2021-01-14 18:37:06,642 - batch_processor.py:167 - INFO: Processed instances: 400
sklearn-batch-job-2877616693: 2021-01-14 18:37:06,974 - batch_processor.py:167 - INFO: Processed instances: 500
sklearn-batch-job-2877616693: 2021-01-14 18:37:07,278 - batch_processor.py:167 - INFO: Processed instances: 600
sklearn-batch-job-2877616693: 2021-01-14 18:37:07,628 - batch_processor.py:167 - INFO: Processed instances: 700
sklearn-batch-job-2877616693: 2021-01-14 18:37:08,378 - batch_processor.py:167 - INFO: Processed instances: 800
sklearn-batch-job-2877616693: 2021-01-14 18:37:09,003 - batch_processor.py:167 - INFO: Processed instances: 900
sklearn-batch-job-2877616693: 2021-01-14 18:37:09,337 - batch_processor.py:167 - INFO: Processed instances: 1000
sklearn-batch-job-2877616693: 2021-01-14 18:37:09,697 - batch_processor.py:167 - INFO: Processed instances: 1100
sklearn-batch-job-2877616693: 2021-01-14 18:37:10,014 - batch_processor.py:167 - INFO: Processed instances: 1200
sklearn-batch-job-2877616693: 2021-01-14 18:37:10,349 - batch_processor.py:167 - INFO: Processed instances: 1300
sklearn-batch-job-2877616693: 2021-01-14 18:37:10,843 - batch_processor.py:167 - INFO: Processed instances: 1400
sklearn-batch-job-2877616693: 2021-01-14 18:37:11,207 - batch_processor.py:167 - INFO: Processed instances: 1500
sklearn-batch-job-2877616693: 2021-01-14 18:37:11,562 - batch_processor.py:167 - INFO: Processed instances: 1600
sklearn-batch-job-2877616693: 2021-01-14 18:37:11,975 - batch_processor.py:167 - INFO: Processed instances: 1700
sklearn-batch-job-2877616693: 2021-01-14 18:37:12,350 - batch_processor.py:167 - INFO: Processed instances: 1800
sklearn-batch-job-2877616693: 2021-01-14 18:37:12,783 - batch_processor.py:167 - INFO: Processed instances: 1900
sklearn-batch-job-2877616693: 2021-01-14 18:37:13,139 - batch_processor.py:167 - INFO: Processed instances: 2000
sklearn-batch-job-2877616693: 2021-01-14 18:37:13,563 - batch_processor.py:167 - INFO: Processed instances: 2100
sklearn-batch-job-2877616693: 2021-01-14 18:37:13,928 - batch_processor.py:167 - INFO: Processed instances: 2200
sklearn-batch-job-2877616693: 2021-01-14 18:37:14,352 - batch_processor.py:167 - INFO: Processed instances: 2300
sklearn-batch-job-2877616693: 2021-01-14 18:37:14,699 - batch_processor.py:167 - INFO: Processed instances: 2400
sklearn-batch-job-2877616693: 2021-01-14 18:37:15,042 - batch_processor.py:167 - INFO: Processed instances: 2500
sklearn-batch-job-2877616693: 2021-01-14 18:37:15,701 - batch_processor.py:167 - INFO: Processed instances: 2600
sklearn-batch-job-2877616693: 2021-01-14 18:37:16,124 - batch_processor.py:167 - INFO: Processed instances: 2700
sklearn-batch-job-2877616693: 2021-01-14 18:37:16,748 - batch_processor.py:167 - INFO: Processed instances: 2800
sklearn-batch-job-2877616693: 2021-01-14 18:37:17,300 - batch_processor.py:167 - INFO: Processed instances: 2900
sklearn-batch-job-2877616693: 2021-01-14 18:37:17,904 - batch_processor.py:167 - INFO: Processed instances: 3000
sklearn-batch-job-2877616693: 2021-01-14 18:37:18,454 - batch_processor.py:167 - INFO: Processed instances: 3100
sklearn-batch-job-2877616693: 2021-01-14 18:37:18,823 - batch_processor.py:167 - INFO: Processed instances: 3200
sklearn-batch-job-2877616693: 2021-01-14 18:37:19,236 - batch_processor.py:167 - INFO: Processed instances: 3300
sklearn-batch-job-2877616693: 2021-01-14 18:37:19,586 - batch_processor.py:167 - INFO: Processed instances: 3400
sklearn-batch-job-2877616693: 2021-01-14 18:37:20,317 - batch_processor.py:167 - INFO: Processed instances: 3500
sklearn-batch-job-2877616693: 2021-01-14 18:37:20,948 - batch_processor.py:167 - INFO: Processed instances: 3600
sklearn-batch-job-2877616693: 2021-01-14 18:37:21,356 - batch_processor.py:167 - INFO: Processed instances: 3700
sklearn-batch-job-2877616693: 2021-01-14 18:37:21,851 - batch_processor.py:167 - INFO: Processed instances: 3800
sklearn-batch-job-2877616693: 2021-01-14 18:37:22,205 - batch_processor.py:167 - INFO: Processed instances: 3900
sklearn-batch-job-2877616693: 2021-01-14 18:37:22,553 - batch_processor.py:167 - INFO: Processed instances: 4000
sklearn-batch-job-2877616693: 2021-01-14 18:37:23,051 - batch_processor.py:167 - INFO: Processed instances: 4100
sklearn-batch-job-2877616693: 2021-01-14 18:37:23,557 - batch_processor.py:167 - INFO: Processed instances: 4200
sklearn-batch-job-2877616693: 2021-01-14 18:37:24,016 - batch_processor.py:167 - INFO: Processed instances: 4300
sklearn-batch-job-2877616693: 2021-01-14 18:37:24,350 - batch_processor.py:167 - INFO: Processed instances: 4400
sklearn-batch-job-2877616693: 2021-01-14 18:37:24,883 - batch_processor.py:167 - INFO: Processed instances: 4500
sklearn-batch-job-2877616693: 2021-01-14 18:37:25,295 - batch_processor.py:167 - INFO: Processed instances: 4600
sklearn-batch-job-2877616693: 2021-01-14 18:37:25,669 - batch_processor.py:167 - INFO: Processed instances: 4700
sklearn-batch-job-2877616693: 2021-01-14 18:37:26,055 - batch_processor.py:167 - INFO: Processed instances: 4800
sklearn-batch-job-2877616693: 2021-01-14 18:37:26,795 - batch_processor.py:167 - INFO: Processed instances: 4900
sklearn-batch-job-2877616693: 2021-01-14 18:37:27,462 - batch_processor.py:167 - INFO: Processed instances: 5000
sklearn-batch-job-2877616693: 2021-01-14 18:37:27,887 - batch_processor.py:167 - INFO: Processed instances: 5100
sklearn-batch-job-2877616693: 2021-01-14 18:37:28,332 - batch_processor.py:167 - INFO: Processed instances: 5200
sklearn-batch-job-2877616693: 2021-01-14 18:37:28,742 - batch_processor.py:167 - INFO: Processed instances: 5300
sklearn-batch-job-2877616693: 2021-01-14 18:37:29,069 - batch_processor.py:167 - INFO: Processed instances: 5400
sklearn-batch-job-2877616693: 2021-01-14 18:37:29,443 - batch_processor.py:167 - INFO: Processed instances: 5500
sklearn-batch-job-2877616693: 2021-01-14 18:37:29,840 - batch_processor.py:167 - INFO: Processed instances: 5600
sklearn-batch-job-2877616693: 2021-01-14 18:37:30,235 - batch_processor.py:167 - INFO: Processed instances: 5700
sklearn-batch-job-2877616693: 2021-01-14 18:37:30,578 - batch_processor.py:167 - INFO: Processed instances: 5800
sklearn-batch-job-2877616693: 2021-01-14 18:37:31,024 - batch_processor.py:167 - INFO: Processed instances: 5900
sklearn-batch-job-2877616693: 2021-01-14 18:37:31,381 - batch_processor.py:167 - INFO: Processed instances: 6000
sklearn-batch-job-2877616693: 2021-01-14 18:37:31,847 - batch_processor.py:167 - INFO: Processed instances: 6100
sklearn-batch-job-2877616693: 2021-01-14 18:37:32,239 - batch_processor.py:167 - INFO: Processed instances: 6200
sklearn-batch-job-2877616693: 2021-01-14 18:37:32,603 - batch_processor.py:167 - INFO: Processed instances: 6300
sklearn-batch-job-2877616693: 2021-01-14 18:37:33,080 - batch_processor.py:167 - INFO: Processed instances: 6400
sklearn-batch-job-2877616693: 2021-01-14 18:37:33,567 - batch_processor.py:167 - INFO: Processed instances: 6500
sklearn-batch-job-2877616693: 2021-01-14 18:37:34,043 - batch_processor.py:167 - INFO: Processed instances: 6600
sklearn-batch-job-2877616693: 2021-01-14 18:37:34,444 - batch_processor.py:167 - INFO: Processed instances: 6700
sklearn-batch-job-2877616693: 2021-01-14 18:37:34,812 - batch_processor.py:167 - INFO: Processed instances: 6800
sklearn-batch-job-2877616693: 2021-01-14 18:37:35,148 - batch_processor.py:167 - INFO: Processed instances: 6900
sklearn-batch-job-2877616693: 2021-01-14 18:37:35,519 - batch_processor.py:167 - INFO: Processed instances: 7000
sklearn-batch-job-2877616693: 2021-01-14 18:37:35,873 - batch_processor.py:167 - INFO: Processed instances: 7100
sklearn-batch-job-2877616693: 2021-01-14 18:37:36,278 - batch_processor.py:167 - INFO: Processed instances: 7200
sklearn-batch-job-2877616693: 2021-01-14 18:37:36,694 - batch_processor.py:167 - INFO: Processed instances: 7300
sklearn-batch-job-2877616693: 2021-01-14 18:37:37,061 - batch_processor.py:167 - INFO: Processed instances: 7400
sklearn-batch-job-2877616693: 2021-01-14 18:37:37,509 - batch_processor.py:167 - INFO: Processed instances: 7500
sklearn-batch-job-2877616693: 2021-01-14 18:37:37,865 - batch_processor.py:167 - INFO: Processed instances: 7600
sklearn-batch-job-2877616693: 2021-01-14 18:37:38,211 - batch_processor.py:167 - INFO: Processed instances: 7700
sklearn-batch-job-2877616693: 2021-01-14 18:37:38,590 - batch_processor.py:167 - INFO: Processed instances: 7800
sklearn-batch-job-2877616693: 2021-01-14 18:37:39,028 - batch_processor.py:167 - INFO: Processed instances: 7900
sklearn-batch-job-2877616693: 2021-01-14 18:37:39,419 - batch_processor.py:167 - INFO: Processed instances: 8000
sklearn-batch-job-2877616693: 2021-01-14 18:37:39,910 - batch_processor.py:167 - INFO: Processed instances: 8100
sklearn-batch-job-2877616693: 2021-01-14 18:37:40,532 - batch_processor.py:167 - INFO: Processed instances: 8200
sklearn-batch-job-2877616693: 2021-01-14 18:37:41,022 - batch_processor.py:167 - INFO: Processed instances: 8300
sklearn-batch-job-2877616693: 2021-01-14 18:37:41,436 - batch_processor.py:167 - INFO: Processed instances: 8400
sklearn-batch-job-2877616693: 2021-01-14 18:37:41,800 - batch_processor.py:167 - INFO: Processed instances: 8500
sklearn-batch-job-2877616693: 2021-01-14 18:37:42,238 - batch_processor.py:167 - INFO: Processed instances: 8600
sklearn-batch-job-2877616693: 2021-01-14 18:37:42,704 - batch_processor.py:167 - INFO: Processed instances: 8700
sklearn-batch-job-2877616693: 2021-01-14 18:37:43,079 - batch_processor.py:167 - INFO: Processed instances: 8800
sklearn-batch-job-2877616693: 2021-01-14 18:37:43,712 - batch_processor.py:167 - INFO: Processed instances: 8900
sklearn-batch-job-2877616693: 2021-01-14 18:37:44,075 - batch_processor.py:167 - INFO: Processed instances: 9000
sklearn-batch-job-2877616693: 2021-01-14 18:37:44,459 - batch_processor.py:167 - INFO: Processed instances: 9100
sklearn-batch-job-2877616693: 2021-01-14 18:37:44,806 - batch_processor.py:167 - INFO: Processed instances: 9200
sklearn-batch-job-2877616693: 2021-01-14 18:37:45,344 - batch_processor.py:167 - INFO: Processed instances: 9300
sklearn-batch-job-2877616693: 2021-01-14 18:37:45,764 - batch_processor.py:167 - INFO: Processed instances: 9400
sklearn-batch-job-2877616693: 2021-01-14 18:37:46,110 - batch_processor.py:167 - INFO: Processed instances: 9500
sklearn-batch-job-2877616693: 2021-01-14 18:37:46,547 - batch_processor.py:167 - INFO: Processed instances: 9600
sklearn-batch-job-2877616693: 2021-01-14 18:37:46,987 - batch_processor.py:167 - INFO: Processed instances: 9700
sklearn-batch-job-2877616693: 2021-01-14 18:37:47,371 - batch_processor.py:167 - INFO: Processed instances: 9800
sklearn-batch-job-2877616693: 2021-01-14 18:37:47,905 - batch_processor.py:167 - INFO: Processed instances: 9900
sklearn-batch-job-2877616693: 2021-01-14 18:37:48,289 - batch_processor.py:167 - INFO: Processed instances: 10000
sklearn-batch-job-2877616693: 2021-01-14 18:37:48,290 - batch_processor.py:168 - INFO: Total processed instances: 10000
sklearn-batch-job-2877616693: 2021-01-14 18:37:48,290 - batch_processor.py:116 - INFO: Elapsed time: 43.7087140083313
Pull output-data from hdfs¶
[77]:
%%bash
HDFSCLI_CONFIG=./hdfscli.cfg hdfscli download /batch-data/output-data-sklearn-batch-job.txt output-data.txt
[78]:
!head output-data.txt
{"data": {"names": ["t:0", "t:1", "t:2"], "ndarray": [[0.49551509682202705, 0.4192462053867995, 0.08523869779117352]]}, "meta": {"requestPath": {"classifier": "seldonio/sklearnserver:1.6.0-dev"}, "tags": {"tags": {"batch_id": "409a3f56-5697-11eb-be58-06ee7f9820ec", "batch_index": 0.0, "batch_instance_id": "409ad222-5697-11eb-90de-06ee7f9820ec"}}}}
{"data": {"names": ["t:0", "t:1", "t:2"], "ndarray": [[0.14889581912569078, 0.40048258722097885, 0.45062159365333043]]}, "meta": {"requestPath": {"classifier": "seldonio/sklearnserver:1.6.0-dev"}, "tags": {"tags": {"batch_id": "409a3f56-5697-11eb-be58-06ee7f9820ec", "batch_index": 10.0, "batch_instance_id": "409d56e6-5697-11eb-90de-06ee7f9820ec"}}}}
{"data": {"names": ["t:0", "t:1", "t:2"], "ndarray": [[0.1859090109477526, 0.46433848375587844, 0.349752505296369]]}, "meta": {"requestPath": {"classifier": "seldonio/sklearnserver:1.6.0-dev"}, "tags": {"tags": {"batch_id": "409a3f56-5697-11eb-be58-06ee7f9820ec", "batch_index": 1.0, "batch_instance_id": "409ad68c-5697-11eb-90de-06ee7f9820ec"}}}}
{"data": {"names": ["t:0", "t:1", "t:2"], "ndarray": [[0.35453094061556073, 0.3866773326679568, 0.2587917267164825]]}, "meta": {"requestPath": {"classifier": "seldonio/sklearnserver:1.6.0-dev"}, "tags": {"tags": {"batch_id": "409a3f56-5697-11eb-be58-06ee7f9820ec", "batch_index": 3.0, "batch_instance_id": "409bb106-5697-11eb-90de-06ee7f9820ec"}}}}
{"data": {"names": ["t:0", "t:1", "t:2"], "ndarray": [[0.14218706541271167, 0.2726759160836421, 0.5851370185036463]]}, "meta": {"requestPath": {"classifier": "seldonio/sklearnserver:1.6.0-dev"}, "tags": {"tags": {"batch_id": "409a3f56-5697-11eb-be58-06ee7f9820ec", "batch_index": 13.0, "batch_instance_id": "409dabc8-5697-11eb-90de-06ee7f9820ec"}}}}
{"data": {"names": ["t:0", "t:1", "t:2"], "ndarray": [[0.15720251854631545, 0.3840752321558323, 0.45872224929785227]]}, "meta": {"requestPath": {"classifier": "seldonio/sklearnserver:1.6.0-dev"}, "tags": {"tags": {"batch_id": "409a3f56-5697-11eb-be58-06ee7f9820ec", "batch_index": 2.0, "batch_instance_id": "409b7362-5697-11eb-90de-06ee7f9820ec"}}}}
{"data": {"names": ["t:0", "t:1", "t:2"], "ndarray": [[0.1808891729172985, 0.32704139903027096, 0.49206942805243054]]}, "meta": {"requestPath": {"classifier": "seldonio/sklearnserver:1.6.0-dev"}, "tags": {"tags": {"batch_id": "409a3f56-5697-11eb-be58-06ee7f9820ec", "batch_index": 14.0, "batch_instance_id": "409dac86-5697-11eb-90de-06ee7f9820ec"}}}}
{"data": {"names": ["t:0", "t:1", "t:2"], "ndarray": [[0.14218974549047703, 0.41059890080264444, 0.4472113537068785]]}, "meta": {"requestPath": {"classifier": "seldonio/sklearnserver:1.6.0-dev"}, "tags": {"tags": {"batch_id": "409a3f56-5697-11eb-be58-06ee7f9820ec", "batch_index": 15.0, "batch_instance_id": "409de20a-5697-11eb-90de-06ee7f9820ec"}}}}
{"data": {"names": ["t:0", "t:1", "t:2"], "ndarray": [[0.2643002677975754, 0.44720843507174224, 0.28849129713068233]]}, "meta": {"requestPath": {"classifier": "seldonio/sklearnserver:1.6.0-dev"}, "tags": {"tags": {"batch_id": "409a3f56-5697-11eb-be58-06ee7f9820ec", "batch_index": 16.0, "batch_instance_id": "409dfe98-5697-11eb-90de-06ee7f9820ec"}}}}
{"data": {"names": ["t:0", "t:1", "t:2"], "ndarray": [[0.2975075875929912, 0.25439317776178244, 0.44809923464522644]]}, "meta": {"requestPath": {"classifier": "seldonio/sklearnserver:1.6.0-dev"}, "tags": {"tags": {"batch_id": "409a3f56-5697-11eb-be58-06ee7f9820ec", "batch_index": 4.0, "batch_instance_id": "409c3a2c-5697-11eb-90de-06ee7f9820ec"}}}}
[79]:
!kubectl delete -f deployment.yaml
seldondeployment.machinelearning.seldon.io "sklearn" deleted
[80]:
!argo delete sklearn-batch-job
Workflow 'sklearn-batch-job' deleted