MLFlow Pre-packaged Model Server A/B Test Deployment

In this example we will build two models with MLFlow and we will deploy them as an A/B test deployment. The reason this is powerful is because it allows you to deploy a new model next to the old one, distributing a percentage of traffic. These deployment strategies are quite simple using Seldon, and can be extended to shadow deployments, multi-armed-bandits, etc.

Tutorial Overview

This tutorial breaks down into the following sections:

  1. Train the MLFlow elastic net wine example
  2. Deploy your trained model leveraging our pre-packaged MLFlow model server
  3. Test the deployed MLFlow model by sending requests
  4. Deploy your second model as an A/B test
  5. Visualise and monitor the performance of your models using Seldon Analytics

The tutorial closely follows our talk on Seldon and MLflow at the Spark + AI Summit 2019.


For this example to work you must be running Seldon 0.3.2 or above - you can follow our getting started guide for this.

Regarding other dependencies, make sure you have installed:

  • Helm v3.0.0+
  • kubectl v1.14+
  • Python 3.6+
  • MLFlow 1.1.0
  • pygmentize
  • tree

We will also take this chance to load the Python dependencies we will use through the tutorial:

import pandas as pd
import numpy as np
from seldon_core.seldon_client import SeldonClient

Let’s get started! 🚀🔥

1. Train the first MLFlow Elastic Net Wine example

For our example, we will use the elastic net wine example from MLflow’s tutorial.

As any other MLflow project, it is defined by its MLproject file:

!pygmentize -l yaml MLproject
name: mlflow-talk

conda_env: conda.yaml

entry_points:
  main:
    parameters:
      alpha: float
      l1_ratio: {type: float, default: 0.1}
    command: "python train.py {alpha} {l1_ratio}"

We can see that this project uses Conda for the environment and that it’s defined in the conda.yaml file:

!pygmentize conda.yaml
name: mlflow-talk
channels:
  - defaults
dependencies:
  - python=3.6
  - scikit-learn=0.19.1
  - pip:
    - mlflow>=1.0

Lastly, we can also see that the training is performed by the train.py file, which receives two parameters, alpha and l1_ratio:

# The data set used in this example is from
# P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
# Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

import os
import warnings
import sys

import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet

import mlflow
import mlflow.sklearn

def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2

if __name__ == "__main__":

    # Read the wine-quality csv file (make sure you're running this from the root of MLflow!)
    wine_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "wine-quality.csv")
    data = pd.read_csv(wine_path)

    # Split the data into training and test sets. (0.75, 0.25) split.
    train, test = train_test_split(data)

    # The predicted column is "quality" which is a scalar from [3, 9]
    train_x = train.drop(["quality"], axis=1)
    test_x = test.drop(["quality"], axis=1)
    train_y = train[["quality"]]
    test_y = test[["quality"]]

    alpha = float(sys.argv[1]) if len(sys.argv) > 1 else 0.5
    l1_ratio = float(sys.argv[2]) if len(sys.argv) > 2 else 0.5

    with mlflow.start_run():
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)

        predicted_qualities = lr.predict(test_x)

        (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

        print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
        print("  RMSE: %s" % rmse)
        print("  MAE: %s" % mae)
        print("  R2: %s" % r2)

        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)
        mlflow.log_metric("mae", mae)

        mlflow.sklearn.log_model(lr, "model")
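As a sanity check, the three metrics computed by eval_metrics above can be reproduced with plain NumPy on a tiny hand-made example (the numbers below are illustrative only, not from the wine dataset):

```python
import numpy as np

# NumPy-only versions of the metrics that eval_metrics computes
actual = np.array([3.0, 4.0, 5.0])
pred = np.array([2.5, 4.0, 5.5])

rmse = np.sqrt(np.mean((actual - pred) ** 2))
mae = np.mean(np.abs(actual - pred))
r2 = 1.0 - np.sum((actual - pred) ** 2) / np.sum((actual - actual.mean()) ** 2)

print(rmse, mae, r2)
```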

We will use the wine quality dataset. Let’s load it to see what’s inside:

data = pd.read_csv("wine-quality.csv")
data.head()
fixed acidity volatile acidity citric acid residual sugar chlorides free sulfur dioxide total sulfur dioxide density pH sulphates alcohol quality
0 7.0 0.27 0.36 20.7 0.045 45.0 170.0 1.0010 3.00 0.45 8.8 6
1 6.3 0.30 0.34 1.6 0.049 14.0 132.0 0.9940 3.30 0.49 9.5 6
2 8.1 0.28 0.40 6.9 0.050 30.0 97.0 0.9951 3.26 0.44 10.1 6
3 7.2 0.23 0.32 8.5 0.058 47.0 186.0 0.9956 3.19 0.40 9.9 6
4 7.2 0.23 0.32 8.5 0.058 47.0 186.0 0.9956 3.19 0.40 9.9 6
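The training script separates the quality column from the feature columns with drop; a minimal illustration on a hypothetical two-row frame using just a subset of the wine dataset's columns:

```python
import pandas as pd

# Hypothetical two-row frame mirroring a few of the wine dataset's columns
df = pd.DataFrame({
    "fixed acidity": [7.0, 6.3],
    "alcohol": [8.8, 9.5],
    "quality": [6, 6],
})

X = df.drop(["quality"], axis=1)  # feature columns only
y = df[["quality"]]               # target, kept as a DataFrame as in the script
print(X.columns.tolist(), y.shape)
```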

We’ve set up our MLflow project and our dataset is ready, so we are now good to start training. MLflow allows us to train our model with the following command:

$ mlflow run . -P alpha=... -P l1_ratio=...

On each run, mlflow will set up the Conda environment defined by the conda.yaml file and will run the training commands defined in the MLproject file.

!mlflow run . -P alpha=0.5 -P l1_ratio=0.5
2019/11/20 11:16:37 INFO mlflow.projects: === Created directory /tmp/tmpaok27ecp for downloading remote URIs passed to arguments of type 'path' ===
2019/11/20 11:16:37 INFO mlflow.projects: === Running command 'source /opt/miniconda3/bin/../etc/profile.d/ && conda activate mlflow-1ecba04797edb7e7f7212d429debd9b664c31651 1>&2 && python 0.5 0.5' in run with ID 'fbbb1fe4f9ef4b4faf370f8a946f7c60' ===
Elasticnet model (alpha=0.500000, l1_ratio=0.500000):
  RMSE: 0.82224284975954
  MAE: 0.6278761410160691
  R2: 0.12678721972772689
2019/11/20 11:16:38 INFO mlflow.projects: === Run (ID 'fbbb1fe4f9ef4b4faf370f8a946f7c60') succeeded ===

Each of these commands will create a new run which can be visualised through the MLFlow dashboard as per the screenshot below.


Each of these models can be found in the mlruns folder:

!tree -L 1 mlruns/0
├── 835f65ed47974d3fb3359e646b61a009
├── fbbb1fe4f9ef4b4faf370f8a946f7c60
└── meta.yaml

2 directories, 1 file

Inside each of these folders, MLflow stores the parameters we used to train our model, any metric we logged during training, and a snapshot of our model. If we look into one of them, we can see the following structure:

!tree mlruns/0/$(ls mlruns/0 | head -1)
├── artifacts
│   └── model
│       ├── conda.yaml
│       ├── MLmodel
│       └── model.pkl
├── meta.yaml
├── metrics
│   ├── mae
│   ├── r2
│   └── rmse
├── params
│   ├── alpha
│   └── l1_ratio
└── tags
    ├── mlflow.gitRepoURL
    ├── mlflow.project.backend
    ├── mlflow.project.entryPoint
    ├── mlflow.project.env
    ├── mlflow.source.git.commit
    ├── mlflow.source.git.repoURL
    ├── mlflow.source.type
    └── mlflow.user

5 directories, 18 files

In particular, we are interested in the MLmodel file stored under artifacts/model:

!pygmentize -l yaml mlruns/0/$(ls mlruns/0 | head -1)/artifacts/model/MLmodel
artifact_path: model
flavors:
  python_function:
    data: model.pkl
    env: conda.yaml
    loader_module: mlflow.sklearn
    python_version: 3.6.9
  sklearn:
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.19.1
run_id: 835f65ed47974d3fb3359e646b61a009
utc_time_created: '2019-11-20 11:15:42.706884'

This file stores the details of how the model was stored. With this information (plus the other files in the folder), we are able to load the model back. Seldon’s MLflow server will use this information to serve this model.

Now we should upload our newly trained model to a public Google Cloud Storage or S3 bucket. To make things simpler, we have already done this; you can find the model at gs://seldon-models/mlflow/model-a.

2. Deploy your model using the Pre-packaged Model Server for MLFlow

Now we can deploy our trained MLFlow model.

For this we have to create a Seldon deployment configuration file, which we will break down further below.

We will be using the model we uploaded to our Google bucket (gs://seldon-models/mlflow/elasticnet_wine), but you can use your own model if you uploaded it to a public bucket.

Setup Seldon Core

Use the setup notebook to Setup Cluster with Ambassador Ingress and Install Seldon Core. Instructions also online.

!pygmentize mlflow-model-server-seldon-config.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: mlflow-deployment
spec:
  name: mlflow-deployment
  predictors:
  - graph:
      children: []
      implementation: MLFLOW_SERVER
      modelUri: gs://seldon-models/mlflow/elasticnet_wine
      name: wines-classifier
    name: mlflow-deployment-dag
    replicas: 1

Once we have written our configuration file, we can deploy it to our cluster with the following command:

!kubectl apply -f mlflow-model-server-seldon-config.yaml created

Once it’s created, we just wait until it’s deployed.

Under the hood, Kubernetes will download the image for the pre-packaged MLFlow model server and initialise it with the model we specified above.

You can check the status of the deployment with the following command:

!kubectl rollout status deployment.apps/mlflow-deployment-mlflow-deployment-dag-77efeb1
deployment "mlflow-deployment-mlflow-deployment-dag-77efeb1" successfully rolled out

Once it’s deployed, we should see a "successfully rolled out" message above. We can now test it!

3. Test the deployed MLFlow model by sending requests

Now that our model is deployed in Kubernetes, we can send requests to it.

We will first need the endpoint that is currently exposed through Ambassador.

If you are running this locally, you should be able to reach it through localhost; in this case we can use port 80.

!kubectl get svc | grep ambassador
ambassador                                                  NodePort      <none>        80:30080/TCP        23h
ambassador-admin                                            ClusterIP   <none>        8877/TCP            23h

Now we will select the first datapoint in our dataset to send to the model.

x_0 = data.drop(["quality"], axis=1).values[:1]
print(x_0[0].tolist())
[7.0, 0.27, 0.36, 20.7, 0.045, 45.0, 170.0, 1.001, 3.0, 0.45, 8.8]
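Seldon's REST endpoint expects the data in its ndarray payload format; a minimal sketch of building that JSON body in Python from the row above:

```python
import json

# First row of the wine dataset, as printed above
x_0 = [[7.0, 0.27, 0.36, 20.7, 0.045, 45.0, 170.0, 1.001, 3.0, 0.45, 8.8]]

# Seldon's ndarray payload: optional column names plus the batch itself
payload = {"data": {"names": [], "ndarray": x_0}}
body = json.dumps(payload)
print(body)
```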

We can try sending a request first using curl:

!curl -X POST -H 'Content-Type: application/json' \
    -d "{'data': {'names': [], 'ndarray': [[7.0, 0.27, 0.36, 20.7, 0.045, 45.0, 170.0, 1.001, 3.0, 0.45, 8.8]]}}" \
    http://localhost:80/seldon/default/mlflow-deployment/api/v0.1/predictions
{
  "meta": {
    "puid": "n7i76rf930auf7u7ulhig51bu5",
    "tags": {
    },
    "routing": {
    },
    "requestPath": {
      "wines-classifier": "seldonio/mlflowserver_rest:0.2"
    },
    "metrics": []
  },
  "data": {
    "names": [],
    "ndarray": [5.550530190667395]
  }
}

We can also send the request using our Python client:

from seldon_core.seldon_client import SeldonClient
import math
import numpy as np
import subprocess

HOST = "localhost"  # Add the URL you found above
port = "80"  # Make sure you use the port above
batch = x_0
payload_type = "ndarray"

sc = SeldonClient(
    gateway="ambassador",
    deployment_name="mlflow-deployment",
    payload_type=payload_type,
    gateway_endpoint=HOST + ":" + port)

client_prediction = sc.predict(data=batch)
print(client_prediction.response)

meta {
  puid: "cdjl6irq91taaavkam57g2eatu"
  requestPath {
    key: "wines-classifier"
    value: "seldonio/mlflowserver_rest:0.2"
  }
}
data {
  ndarray {
    values {
      number_value: 5.550530190667395
    }
  }
}

4. Deploy your second model as an A/B test

Now that we have a model in production, it’s possible to deploy a second model as an A/B test. Our model will also be an Elastic Net model but using a different set of parameters. We can easily train it by leveraging MLflow:

!mlflow run . -P alpha=0.75 -P l1_ratio=0.2
2019/11/20 11:38:36 INFO mlflow.projects: === Created directory /tmp/tmppr1ufom9 for downloading remote URIs passed to arguments of type 'path' ===
2019/11/20 11:38:36 INFO mlflow.projects: === Running command 'source /opt/miniconda3/bin/../etc/profile.d/ && conda activate mlflow-1ecba04797edb7e7f7212d429debd9b664c31651 1>&2 && python 0.75 0.2' in run with ID '18f9f8c5d6a249f28f024011dea10e23' ===
Elasticnet model (alpha=0.750000, l1_ratio=0.200000):
  RMSE: 0.8118203122913661
  MAE: 0.6244638140789723
  R2: 0.14878415499818187
2019/11/20 11:38:37 INFO mlflow.projects: === Run (ID '18f9f8c5d6a249f28f024011dea10e23') succeeded ===

As we did before, we will now need to upload our model to a cloud bucket. To speed things up, we have already done so; the second model is accessible at gs://seldon-models/mlflow/model-b.

We will deploy our second model as an A/B test. In particular, we will redirect 20% of the traffic to the new model.

This can be done by simply adding a traffic attribute on our SeldonDeployment spec:

!pygmentize ab-test-mlflow-model-server-seldon-config.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: mlflow-deployment
spec:
  name: mlflow-deployment
  predictors:
    - graph:
        children: []
        implementation: MLFLOW_SERVER
        modelUri: gs://seldon-models/mlflow/model-a
        name: wines-classifier
      name: a-mlflow-deployment-dag
      replicas: 1
      traffic: 80
    - graph:
        children: []
        implementation: MLFLOW_SERVER
        modelUri: gs://seldon-models/mlflow/model-b
        name: wines-classifier
      name: b-mlflow-deployment-dag
      replicas: 1
      traffic: 20
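Conceptually, the traffic fields make Seldon's router send each incoming request to predictor "a" with probability 0.8 and to "b" with probability 0.2. This toy simulation (not Seldon's actual routing code) illustrates the effect:

```python
import random

random.seed(42)  # fixed seed so the split is reproducible

def route(traffic_a=80):
    # Route to "a" with probability traffic_a / 100, otherwise to "b"
    return "a" if random.uniform(0, 100) < traffic_a else "b"

routed = [route() for _ in range(10_000)]
frac_a = routed.count("a") / len(routed)
print(frac_a)  # close to 0.8
```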

And similar to the model above, we only need to run the following to deploy it:

!kubectl apply -f ab-test-mlflow-model-server-seldon-config.yaml created

We can check that the models have been deployed and are running with the following command.

We should now see the "a-" model and the "b-" model.

!kubectl get pods
NAME                                                              READY   STATUS     RESTARTS   AGE
ambassador-5d97b7df6f-tkrhq                                       1/1     Running    0          24h
mlflow-deployment-a-mlflow-deployment-dag-77efeb1-56dd56dcpx54t   0/2     Init:0/1   0          6s
mlflow-deployment-b-mlflow-deployment-dag-77efeb1-86cb459drl7fw   0/2     Init:0/1   0          6s

5. Visualise and monitor the performance of your models using Seldon Analytics

This section is optional, but by following the instructions you will be able to visualise the performance of both models as per the chart below.

In order for this example to work you need to install and run the Grafana Analytics package for Seldon Core.

For this we can access the URL with the command below; it will request a username and password, which by default are set to the following:

  • Username: admin
  • Password: password

You can access the grafana dashboard through the port provided below:

!kubectl get svc grafana-prom -o jsonpath='{.spec.ports[0].nodePort}'

Now that we have both models running in our Kubernetes cluster, we can analyse their performance using Seldon Core’s integration with Prometheus and Grafana. To do so, we will iterate over the training set (which can be found in wine-quality.csv), making a request and sending the feedback of the prediction.

Since the /feedback endpoint requires a reward signal (i.e. the higher the better), we will simulate one as:

\[\begin{split}R(x_{n}) = \begin{cases} \frac{1}{(y_{n} - f(x_{n}))^{2}} &, y_{n} \neq f(x_{n}) \\ 500 &, y_{n} = f(x_{n}) \end{cases}\end{split}\]

where \(R(x_{n})\) is the reward for input point \(x_{n}\), \(f(x_{n})\) is our trained model’s prediction, and \(y_{n}\) is the actual value.
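For instance, if the true quality is 6 and the model predicts 5.5, the reward is 1/0.5² = 4, while a perfect prediction gets the capped reward of 500. A quick numeric check of the formula:

```python
# Reward formula from above: capped at 500 for exact predictions
y, y_pred = 6.0, 5.5
r = 500 if y == y_pred else 1 / (y - y_pred) ** 2
print(r)  # 4.0
```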

def _get_reward(y, y_pred):
    if y == y_pred:
        return 500

    return 1 / np.square(y - y_pred)

def _test_row(row):
    input_features = row[:-1]
    feature_names = input_features.index.to_list()
    X = input_features.values.reshape(1, -1)
    y = row[-1].reshape(1, -1)

    # Note that we are re-using the SeldonClient defined previously
    r = sc.predict(data=X, names=feature_names)

    # Extract the prediction from the ndarray payload of the response
    y_pred = r.response.data.ndarray[0]
    reward = _get_reward(y, y_pred)

    # Send the reward back to Seldon's /feedback endpoint
    sc.feedback(
        prediction_request=r.request,
        prediction_response=r.response,
        reward=float(reward))

    return reward[0]

data.apply(_test_row, axis=1)
0        [4.949928760465064]
1         [2.33866485520918]
2       [16.671295276036165]
3       [11.360528710955778]
4       [10.762015969288063]
                 ...
4893     [270.7374890482452]
4894    [1.8348875422756648]
4895     [3.872377496349884]
4896    [1.9544204216470193]
4897     [22.25374886390087]
Length: 4898, dtype: object
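Beyond the live Grafana charts, the same rewards can be compared offline by averaging them per predictor; a sketch on hypothetical per-model reward arrays (made-up numbers, not taken from the run above):

```python
import numpy as np

# Hypothetical per-request rewards collected for each predictor
rewards_a = np.array([4.9, 2.3, 16.7, 11.4])
rewards_b = np.array([3.1, 1.8, 9.5, 7.2])

# A higher mean reward indicates predictions closer to the true quality
print(rewards_a.mean(), rewards_b.mean())
```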

You should now be able to see Seldon’s pre-built Grafana dashboard.


At the bottom of the dashboard you can see the following charts:

  • On the left: the requests per second, which shows the traffic split we specified.
  • In the centre: the reward, where you can see how model A outperforms model B by a large margin.
  • On the right: the latency for each of them.

You can add your own custom metrics and try out other, more complex deployments by following the further guides in the Seldon Core documentation.
