Content Recommendation Steps
Offline Recommendation Model Creation
Seldon provides a variety of item recommendation models and makes it easy to add new custom models.
The current integrated models are:
Configuration is either passed on the command line to the offline jobs or set in ZooKeeper.
Offline Data Store
The Seldon modelling and data manipulation jobs assume a fixed directory structure for data storage. This structure allows easy integration into a production environment where models are created periodically, usually each day. The directory structure is of the form:
For example, the path for a matrix_factorization model created for client client1 on 26 Jan 2015 (unix epoch day 16461) would be:
You can use a network file store, AWS S3 or soon HDFS for the actual store.
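As a minimal sketch of how a unix epoch day number relates to a calendar date, the conversion can be written as below. The helper names are hypothetical and not part of Seldon, and the sketch assumes days are counted in whole days from 1 Jan 1970 (UTC).

```python
from datetime import date, timedelta

# Day 0 of the unix epoch.
EPOCH = date(1970, 1, 1)


def to_epoch_day(d: date) -> int:
    """Return the number of whole days between 1 Jan 1970 and d."""
    return (d - EPOCH).days


def from_epoch_day(day: int) -> date:
    """Return the calendar date for a unix epoch day number."""
    return EPOCH + timedelta(days=day)


print(to_epoch_day(date(2015, 1, 26)))  # 16461
print(from_epoch_day(16461))            # 2015-01-26
```

A helper like this is handy when checking which calendar day a dated model folder corresponds to.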
Jobs that require activity data use a start day and a number of days to select the data they need from the filesystem. They gather data from folders of the form:
The output path will be of the form:
Configuration is held in ZooKeeper as JSON in nodes of the form:
Most jobs need a set of basic parameters, including:
- inputFolder : the base folder on the local file system, S3 or HDFS of the data needed for the job
- outputFolder : the base folder on the local file system, S3 or HDFS where the output will be stored
- startDay : the day as unix epoch day number to start from
- days : the number of days to go back from startDay (inclusive) to collect data as input
- awskey : AWS key (only needed if using S3 for storage)
- awssecret : AWS secret (only needed if using S3 for storage)
- itemType : restrict activity data to only these types of items (-1 allows all types)
- activate : whether to activate the model immediately in the Seldon Server so predictions can be provided
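The parameters above can be sketched as a JSON configuration together with the window of days a job would read. This is an illustration under stated assumptions, not Seldon's own code: the folder values are placeholders, the JSON value types (integers, a boolean for activate) are guesses, the helper `input_days` is hypothetical, and "inclusive" is read here as meaning that startDay itself counts as the first of the days collected.

```python
import json
from typing import List


def input_days(start_day: int, days: int) -> List[int]:
    """List the epoch days to read, going back from start_day.

    Assumption: start_day is included, so days=3 covers
    start_day, start_day - 1 and start_day - 2.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    return [start_day - offset for offset in range(days)]


# Hypothetical values; awskey and awssecret are omitted because
# this sketch does not use S3 for storage.
job_config = {
    "inputFolder": "/path/to/input",    # placeholder base folder
    "outputFolder": "/path/to/output",  # placeholder base folder
    "startDay": 16461,
    "days": 3,
    "itemType": -1,    # -1 allows all item types
    "activate": True,  # assumed to be a JSON boolean
}

# Serialise to JSON and read it back, as a job might after
# fetching its configuration.
config_json = json.dumps(job_config, indent=2)
parsed = json.loads(config_json)

window = input_days(parsed["startDay"], parsed["days"])
print(window)  # [16461, 16460, 16459]
```

If a job instead treats startDay as excluded from the window, only the range in `input_days` would need to change.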