You have added seldon-server/kubernetes/bin to your shell PATH environment variable.
The entire set of steps can be executed by running one of the Kubernetes deployments in seldon-server/kubernetes/conf/examples/ml10m, which should have been created when you followed the install and configuration steps.
There are two jobs:
ml10m-import-item-similarity.json : Downloads the data and creates an item-similarity model
ml10m-import-matrix-factorization.json : Downloads the data and creates a matrix factorization model
Start the chosen Kubernetes job, for example:
The job may take 10 or more minutes to run depending on the size and compute power of your Kubernetes cluster and the Spark cluster within it. You can check its status with kubectl get jobs -l job-name=ml10m-import.
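The status check above can also be scripted. The following is a minimal sketch, assuming `kubectl` is on the PATH and already configured for your cluster. The helper names `fetch_jobs` and `job_succeeded` are hypothetical and not part of Seldon; the sketch only reads the `succeeded` counter that Kubernetes reports in a Job's status.

```python
import json
import subprocess


def fetch_jobs(label="job-name=ml10m-import"):
    """Return the parsed output of `kubectl get jobs -l <label> -o json`."""
    out = subprocess.check_output(
        ["kubectl", "get", "jobs", "-l", label, "-o", "json"]
    )
    return json.loads(out)


def job_succeeded(job_list):
    """True if at least one job matched and each one reports a succeeded pod."""
    items = job_list.get("items", [])
    return bool(items) and all(
        (item.get("status", {}).get("succeeded") or 0) >= 1 for item in items
    )
```

Calling `job_succeeded(fetch_jobs())` returns `False` while the job is still running (or if it failed), so it can be polled in a loop with a sleep between attempts.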
This job will:
Download the MovieLens 10 million dataset. (N.B. make sure your Kubernetes cluster has access to the internet)
Create a new client “ml10m”, create the item meta-data schema, and import the users and items.
Create a historical actions dataset from the movie ratings
Run a Spark job via Luigi to create a model
Set up a runtime scorer
You can then test the recommendations by doing:
The above gets recommendations for user 625 based on a recent action history consisting of movie 50, which is “The Usual Suspects”. The result should look something like:
Below are the detailed steps, which can be found here.
Hack to ensure we have a nameserver for external DNS (seems to be required when running Kubernetes locally in Docker)
Download and unzip Movielens data
Convert the item meta-data to UTF-8
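As an illustration of this step, here is a minimal Python sketch of re-encoding a file as UTF-8. It is not the script the job runs. The source encoding (`iso-8859-1`), the function name `convert_to_utf8`, and the sample line are assumptions made for the example; `movies.dat` is the item file name used by the MovieLens 10M archive.

```python
import os
import tempfile


def convert_to_utf8(src_path, dst_path, src_encoding="iso-8859-1"):
    """Re-encode a text file as UTF-8 (the source encoding is an assumption)."""
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)


# Demo on a throwaway file containing one made-up line in the movies.dat layout.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "movies.dat")
dst = os.path.join(tmp_dir, "movies.utf8.dat")
with open(src, "wb") as f:
    f.write("0::Café Example (2000)::Drama\n".encode("iso-8859-1"))

convert_to_utf8(src, dst)

with open(dst, "rb") as f:
    converted = f.read()
```

Passing `newline=""` on both sides leaves the line endings exactly as they were in the source file, so only the character encoding changes.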
Create Historical Data Files
Create item, user and action CSV files from raw data
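The conversion can be sketched in Python as follows. This is an illustration under stated assumptions rather than the script the job runs. The MovieLens 10M files use `::` as the field separator (`MovieID::Title::Genres` in movies.dat and `UserID::MovieID::Rating::Timestamp` in ratings.dat). The CSV column names (`id`, `name`, `user_id`, `item_id`, `value`, `time`), the helper names, and the sample rows are hypothetical.

```python
import csv
import io


def parse_movies(lines):
    """Yield (movie_id, title) from MovieID::Title::Genres lines."""
    for line in lines:
        movie_id, title, _genres = line.rstrip("\n").split("::", 2)
        yield movie_id, title


def parse_ratings(lines):
    """Yield (user_id, movie_id, rating, timestamp) from ratings lines."""
    for line in lines:
        user_id, movie_id, rating, timestamp = line.strip().split("::")
        yield user_id, movie_id, rating, timestamp


def write_csv(header, rows):
    """Render rows as CSV text with the given header (column names are hypothetical)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Made-up sample rows in the MovieLens layout.
movies = ["10::Sample Movie, The (1999)::Drama|Thriller\n"]
ratings = ["7::10::4.5::978300760\n", "8::10::3::978302109\n"]

rating_rows = list(parse_ratings(ratings))
user_ids = sorted({row[0] for row in rating_rows}, key=int)

items_csv = write_csv(["id", "name"], parse_movies(movies))
users_csv = write_csv(["id"], [[uid] for uid in user_ids])
actions_csv = write_csv(["user_id", "item_id", "value", "time"], rating_rows)
```

The `csv` module quotes titles that contain commas, which a plain string replacement of `::` with `,` would not handle. The user list is derived from the ratings because the archive's layout described above has no separate user file.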
We will use an item schema to hold the title of the movies, as shown below
The steps to set up and import the data can be done via the Seldon CLI
Create a new ml10m client
Set up the schema using the above JSON
Import the users and items as defined by CSV files
Create actions file from CSV file
Build a Recommendation Model
The script contains calls to build either an item-similarity model or a matrix factorization model using Luigi:
Set Up Runtime Scorer
We set up an appropriate runtime scorer depending on which model we created. For the item-similarity model we would do: