Spam Classification Model (Sklearn)

  • Wrap an ML model as a prediction microservice with seldon-core
  • Run it locally in Docker to test
  • Deploy it with seldon-core on a Kubernetes (k8s) cluster

Train Locally

[20]:
import numpy as np
import pandas as pd
import joblib  # sklearn.externals.joblib is deprecated as of scikit-learn 0.21; use the joblib package directly
from pathlib import Path
import string
import pickle
from nltk.stem import SnowballStemmer
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# If the stopwords corpus is missing, run once: import nltk; nltk.download('stopwords')
model_path: Path = Path('./')
[14]:
data = pd.read_csv("spam.csv",encoding='latin-1')
data = data.drop(["Unnamed: 2", "Unnamed: 3", "Unnamed: 4"], axis=1)
data = data.rename(columns={"v1":"class", "v2":"text"})
data.head()

# build the stemmer and stopword set once instead of once per word
stemmer = SnowballStemmer("english")
stop_words = set(stopwords.words('english'))

def pre_process(text):
    # strip punctuation, drop English stopwords, stem what remains
    text = text.translate(str.maketrans('', '', string.punctuation))
    words = [stemmer.stem(word) for word in text.split() if word.lower() not in stop_words]
    return " ".join(words)

features = data['text'].copy()
features = features.apply(pre_process)

vectorizer = TfidfVectorizer(stop_words='english')  # stop words must go to the stop_words keyword; the first positional argument is input
_features = vectorizer.fit_transform(features)
with open('Spam-Classifier/model/vectorizer.pkl', 'wb') as vect:
    pickle.dump(vectorizer, vect)
[15]:
vectorizer = joblib.load(model_path.joinpath('Spam-Classifier/model/vectorizer.pkl'))
train_x, test_x, train_y, test_y = train_test_split(_features, data['class'], test_size=0.3, random_state=0)
svc = SVC(kernel='sigmoid', gamma=1.0, probability=True)
svc.fit(train_x,train_y)
# save the model to disk
filename = 'Spam-Classifier/model/model.pkl'
with open(filename, 'wb') as model_file:
    pickle.dump(svc, model_file)
[15]:
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=1.0, kernel='sigmoid',
    max_iter=-1, probability=True, random_state=None, shrinking=True, tol=0.001,
    verbose=False)
[16]:
clf = joblib.load(model_path.joinpath(filename))
[21]:
prediction = clf.predict(test_x)
accuracy_score(test_y,prediction)
[21]:
0.9730861244019139
[22]:
message = np.array(['click here to win the price'])
data = vectorizer.transform(message).todense()
probas = clf.predict_proba(data)
probas
[22]:
array([[0.0220629, 0.9779371]])
[53]:
clf.classes_
[53]:
array(['ham', 'spam'], dtype=object)

Wrap each model component using s2i
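Seldon's Python builder image expects the source directory to contain a requirements.txt (here pinning scikit-learn==0.21.2 and numpy, per the build log below), a .s2i/environment file (typically MODEL_NAME=SpamClassifier, API_TYPE=REST, SERVICE_TYPE=MODEL, PERSISTENCE=0), and a class exposing a predict method. The wrapper source isn't shown in this notebook; a minimal sketch of what Spam-Classifier/SpamClassifier.py could look like, assuming the artifacts pickled above and inferring the return format from the curl response further down:

import pickle
from pathlib import Path

class SpamClassifier:
    """Hypothetical seldon-core model wrapper; loads the artifacts pickled above."""

    def __init__(self):
        model_dir = Path('./model')
        with open(model_dir / 'vectorizer.pkl', 'rb') as f:
            self.vectorizer = pickle.load(f)
        with open(model_dir / 'model.pkl', 'rb') as f:
            self.clf = pickle.load(f)

    def predict(self, X, feature_names=None):
        # X is the "ndarray" payload, e.g. ["click here to win the price"]
        features = self.vectorizer.transform(X)
        proba = self.clf.predict_proba(features)[0]
        best = proba.argmax()
        # [top probability, predicted label], matching the curl response below
        return [str(proba[best]), self.clf.classes_[best]]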

[43]:
!s2i build Spam-Classifier/ seldonio/seldon-core-s2i-python3:0.7 spam-classifier:1.0.0.1
---> Installing application source...
---> Installing dependencies ...
Looking in links: /whl
Collecting scikit-learn==0.21.2 (from -r requirements.txt (line 1))
  Url '/whl' is ignored. It is either a non-existing path or lacks a specific scheme.
Downloading https://files.pythonhosted.org/packages/85/04/49633f490f726da6e454fddc8e938bbb5bfed2001681118d3814c219b723/scikit_learn-0.21.2-cp36-cp36m-manylinux1_x86_64.whl (6.7MB)
Requirement already satisfied: numpy>=1.9.2 in /usr/local/lib/python3.6/site-packages (from -r requirements.txt (line 2)) (1.16.3)
Collecting scipy>=0.17.0 (from scikit-learn==0.21.2->-r requirements.txt (line 1))
  Url '/whl' is ignored. It is either a non-existing path or lacks a specific scheme.
Downloading https://files.pythonhosted.org/packages/29/50/a552a5aff252ae915f522e44642bb49a7b7b31677f9580cfd11bcc869976/scipy-1.3.1-cp36-cp36m-manylinux1_x86_64.whl (25.2MB)
Collecting joblib>=0.11 (from scikit-learn==0.21.2->-r requirements.txt (line 1))
  Url '/whl' is ignored. It is either a non-existing path or lacks a specific scheme.
Downloading https://files.pythonhosted.org/packages/8f/42/155696f85f344c066e17af287359c9786b436b1bf86029bb3411283274f3/joblib-0.14.0-py2.py3-none-any.whl (294kB)
Installing collected packages: scipy, joblib, scikit-learn
Successfully installed joblib-0.14.0 scikit-learn-0.21.2 scipy-1.3.1
Url '/whl' is ignored. It is either a non-existing path or lacks a specific scheme.
You are using pip version 18.1, however version 19.3.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Build completed successfully
[49]:
!docker run --name "spam-classifier" -d --rm -p 5000:5000 spam-classifier:1.0.0.1
1b8159f67b7ddbd2de26833411303ebee8e08331097e28754f04688c1fb86d3c
[51]:
!curl -g http://localhost:5000/predict --data-urlencode 'json={"data": {"names": ["message"], "ndarray": ["click here to win the price"]}}'

{"data":{"ndarray":["0.9779371008528993","spam"]},"meta":{}}
[52]:
!docker rm spam-classifier --force
spam-classifier
[30]:
!s2i build Translator/ seldonio/seldon-core-s2i-python3:0.7 translator:1.0.0.1
---> Installing application source...
---> Installing dependencies ...
Looking in links: /whl
Collecting goslate (from -r requirements.txt (line 1))
  Url '/whl' is ignored. It is either a non-existing path or lacks a specific scheme.
Downloading https://files.pythonhosted.org/packages/39/0b/50af938a1c3d4f4c595b6a22d37af11ebe666246b05a1a97573e8c8944e5/goslate-1.5.1.tar.gz
Requirement already satisfied: numpy in /usr/local/lib/python3.6/site-packages (from -r requirements.txt (line 2)) (1.16.3)
Collecting futures (from goslate->-r requirements.txt (line 1))
  Url '/whl' is ignored. It is either a non-existing path or lacks a specific scheme.
Downloading https://files.pythonhosted.org/packages/05/80/f41cca0ea1ff69bce7e7a7d76182b47bb4e1a494380a532af3e8ee70b9ec/futures-3.1.1-py3-none-any.whl
Building wheels for collected packages: goslate
Running setup.py bdist_wheel for goslate: started
Running setup.py bdist_wheel for goslate: finished with status 'done'
Stored in directory: /root/.cache/pip/wheels/4f/7f/28/6f52271012a7649b54b1a7adaae329b4246bbbf9d1e4f6e51a
Successfully built goslate
Installing collected packages: futures, goslate
Successfully installed futures-3.1.1 goslate-1.5.1
Url '/whl' is ignored. It is either a non-existing path or lacks a specific scheme.
You are using pip version 18.1, however version 19.3.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Build completed successfully
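
The Translator source is likewise not shown. Given that the image installs goslate and the endpoint exercised below is /transform-input, a plausible sketch of Translator/Translator.py, assuming the seldon-core transformer contract (a class exposing transform_input, with SERVICE_TYPE=TRANSFORMER in its .s2i/environment):

import goslate

class Translator:
    """Hypothetical seldon-core input transformer: translate messages to English."""

    def __init__(self):
        self.gs = goslate.Goslate()

    def transform_input(self, X, feature_names=None):
        # translate each incoming message to English before it reaches the classifier
        return [self.gs.translate(text, 'en') for text in X]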
[55]:
!docker run --name "eng-translator" -d --rm -p 5000:5000 translator:1.0.0.1
ca18617eed4ee5b12c1ce835d94a677007e3c095166b8e4e5d0f9fd164757814
[57]:
!curl -g http://localhost:5000/transform-input --data-urlencode 'json={"data": {"names": ["message"], "ndarray": ["Wie läuft dein Tag"]}}'
{"data":{"names":["message"],"ndarray":["How is your day"]},"meta":{}}
[58]:
!docker rm eng-translator --force
eng-translator

Assuming you have a Kubernetes cluster running with seldon-core installed, you can deploy the machine learning model with:

kubectl apply -f deploy.yaml
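
deploy.yaml itself is not included in this notebook; a minimal sketch of a SeldonDeployment that chains the translator in front of the classifier, assuming the two images built above (names are illustrative):

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: spam-classifier
spec:
  predictors:
  - name: default
    replicas: 1
    componentSpecs:
    - spec:
        containers:
        - name: eng-translator
          image: translator:1.0.0.1
        - name: spam-classifier
          image: spam-classifier:1.0.0.1
    graph:
      name: eng-translator
      type: TRANSFORMER
      endpoint:
        type: REST
      children:
      - name: spam-classifier
        type: MODEL
        endpoint:
          type: REST
        children: []

Once deployed, requests typically go through the cluster's ingress gateway (e.g. Ambassador) at /seldon/<namespace>/spam-classifier/api/v0.1/predictions, with the same JSON payload used against the local containers above.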
