MLflow provides a convenient way to serve ML models with a single shell command:
mlflow models serve <path_to_the_model>
It installs the model's dependencies and starts an inference server that exposes a REST API endpoint called /invocations.
Let's build a simple XGBoost model and try to make a few predictions using MLflow.
We'll be using the Iris dataset from scikit-learn, following the XGBoost Python example.
First, the dependencies need to be installed:
pip install mlflow==2.14.2 xgboost==2.1.0 scikit-learn==1.5.1
Let's load and prepare the training and test datasets:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Prepare training dataset
dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
dataset["data"], dataset["target"], test_size=0.2, random_state=42
)
# Print dataset (Scikit Learn Bunch object) information
print("Shape of data:", dataset.data.shape)
print("Feature names:", list(dataset.feature_names))
print("Shape of target:", dataset.target.shape)
print("Target names:", list(dataset.target_names))
# Print training and test data shapes
for name in "X_train, X_test, y_train, y_test".split(", "):
    print(f"{name}.shape: {locals()[name].shape}")
This should produce the following output:
Shape of data: (150, 4)
Feature names: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Shape of target: (150,)
Target names: ['setosa', 'versicolor', 'virginica']
X_train.shape: (120, 4)
X_test.shape: (30, 4)
y_train.shape: (120,)
y_test.shape: (30,)
The dataset has 4 features (sepal length, sepal width, petal length, petal width) and 3 classes of Iris flowers (setosa, versicolor, virginica).
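To get a feel for the data, we can inspect a single test sample together with its class name (the exact values depend on the random split; this is just an illustrative snippet):
# Inspect one test sample: four measurements and the corresponding class name
print("Sample:", X_test[0])
print("Class:", dataset.target_names[y_test[0]])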
Now let's train an XGBoost model:
from xgboost import XGBClassifier
xgb_model = XGBClassifier(n_estimators=2, max_depth=2, learning_rate=1, objective="binary:logistic")
xgb_model.fit(X_train, y_train)
The model can be used directly to predict the flower classes:
xgb_model.predict(X_test)
It outputs a 1D NumPy array with the predicted classes:
array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
0, 2, 2, 2, 2, 2, 0, 0])
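The class indices can be mapped back to the flower names, which is handy for human-readable output (a small illustrative snippet):
# Map predicted class indices back to flower names
[dataset.target_names[i] for i in xgb_model.predict(X_test)]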
However, sometimes it's essential to have the classification probabilities, and XGBoost provides the predict_proba method for that:
# predict probabilities of flower classes
xgb_model.predict_proba(X_test)
It outputs a 2D NumPy array of shape (30, 3) with the probability of each class:
array([[0.04561405, 0.9090471 , 0.04533876],
[0.92584455, 0.03829525, 0.03586026],
[0.03954182, 0.04928818, 0.91116995],
[0.04473383, 0.89150536, 0.06376083],
[0.04222472, 0.841501 , 0.11627424],
[0.92584455, 0.03829525, 0.03586026],
[0.04561405, 0.9090471 , 0.04533876],
[0.03954182, 0.04928818, 0.91116995],
[0.04473383, 0.89150536, 0.06376083],
[0.04561405, 0.9090471 , 0.04533876],
[0.03954182, 0.04928818, 0.91116995],
[0.92584455, 0.03829525, 0.03586026],
[0.92584455, 0.03829525, 0.03586026],
[0.92584455, 0.03829525, 0.03586026],
[0.92584455, 0.03829525, 0.03586026],
[0.04473383, 0.89150536, 0.06376083],
[0.03954182, 0.04928818, 0.91116995],
[0.04561405, 0.9090471 , 0.04533876],
[0.04561405, 0.9090471 , 0.04533876],
[0.03954182, 0.04928818, 0.91116995],
[0.92584455, 0.03829525, 0.03586026],
[0.04453262, 0.14168864, 0.81377876],
[0.92584455, 0.03829525, 0.03586026],
[0.03954182, 0.04928818, 0.91116995],
[0.03954182, 0.04928818, 0.91116995],
[0.03954182, 0.04928818, 0.91116995],
[0.03954182, 0.04928818, 0.91116995],
[0.03954182, 0.04928818, 0.91116995],
[0.92584455, 0.03829525, 0.03586026],
[0.92584455, 0.03829525, 0.03586026]], dtype=float32)
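As a sanity check, taking the most probable class in each row reproduces the output of predict:
# The index of the highest probability per row equals the predicted class
assert (xgb_model.predict_proba(X_test).argmax(axis=1) == xgb_model.predict(X_test)).all()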
Now let's save the XGBoost model with MLflow:
import mlflow.xgboost
mlflow.xgboost.save_model(xgb_model, 'mlflow_xgb_iris_model')
MLflow creates the mlflow_xgb_iris_model directory with these files:
- MLmodel - MLflow model manifest
- conda.yaml - Conda requirements for the model
- model.xgb - the model itself
- python_env.yaml - Python environment specification
- requirements.txt - Python requirements used in python_env.yaml
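Before serving, the saved model can be loaded back and queried locally through the generic pyfunc interface, which is also what the inference server uses under the hood (a quick sketch to verify the artifact):
import mlflow.pyfunc
loaded_model = mlflow.pyfunc.load_model("mlflow_xgb_iris_model")
loaded_model.predict(X_test)  # same 1D array of classes as xgb_model.predict(X_test)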
The model can be served using the MLflow CLI:
mlflow models serve -m mlflow_xgb_iris_model --env-manager local
2024/07/13 11:39:32 INFO mlflow.models.flavor_backend_registry: Selected backend for flavor 'python_function'
2024/07/13 11:39:32 INFO mlflow.pyfunc.backend: === Running command 'exec gunicorn --timeout=60 -b 127.0.0.1:5000 -w 1 ${GUNICORN_CMD_ARGS} -- mlflow.pyfunc.scoring_server.wsgi:app'
[2024-07-13 11:39:32 +0200] [32726] [INFO] Starting gunicorn 22.0.0
[2024-07-13 11:39:32 +0200] [32726] [INFO] Listening at: http://127.0.0.1:5000 (32726)
[2024-07-13 11:39:32 +0200] [32726] [INFO] Using worker: sync
[2024-07-13 11:39:32 +0200] [32728] [INFO] Booting worker with pid: 32728
The --env-manager local argument prevents MLflow from installing the model dependencies, since they are already present in the current virtual environment.
Now let's use the X_test values to make a request to the MLflow inference API:
import numpy as np  # just for a nicer representation of the result
import requests
MODEL_ENDPOINT = "http://127.0.0.1:5000/invocations"
response = requests.post(MODEL_ENDPOINT, json={"inputs": X_test.tolist()})
np.array(response.json()["predictions"])
The returned data contains the predicted classes for the X_test values:
array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
0, 2, 2, 2, 2, 2, 0, 0])
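As a quick consistency check, the served predictions should match the local model's output:
# Predictions from the server should equal the local predict() results
assert (np.array(response.json()["predictions"]) == xgb_model.predict(X_test)).all()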
What if the probabilities need to be calculated?
The mlflow.xgboost model flavor calls only the predict method and does not support predict_proba.
That's why the mlflow-xgboost-proba Python package has been created.
It implements the mlflow_xgboost_proba MLflow flavor, which allows serving models with the predict_proba method:
pip install mlflow-xgboost-proba==0.3.1
import mlflow_xgboost_proba
mlflow_xgboost_proba.save_model(xgb_model, 'mlflow_xgb_proba_iris_model')
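Assuming the package mirrors the standard MLflow flavor API (an assumption worth verifying against its documentation), the saved model can also be loaded back directly:
# Load via the flavor API (assumed to exist, as in standard MLflow flavors)
model = mlflow_xgboost_proba.load_model("mlflow_xgb_proba_iris_model")
model.predict_proba(X_test)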
Let's serve the model saved with the new flavor on port 5001 (instead of the default 5000):
mlflow models serve -m mlflow_xgb_proba_iris_model --env-manager local --port 5001
and make the API request again:
import numpy as np
import requests
MODEL_ENDPOINT = "http://127.0.0.1:5001/invocations"
response = requests.post(MODEL_ENDPOINT, json={"inputs": X_test.tolist()})
np.array(response.json()["predictions"])
This time, the class probabilities are returned:
array([[0.04561405, 0.90904713, 0.04533876],
[0.92584455, 0.03829525, 0.03586026],
[0.03954182, 0.04928818, 0.91116995],
[0.04473383, 0.89150536, 0.06376083],
[0.04222472, 0.841501 , 0.11627424],
[0.92584455, 0.03829525, 0.03586026],
[0.04561405, 0.90904713, 0.04533876],
[0.03954182, 0.04928818, 0.91116995],
[0.04473383, 0.89150536, 0.06376083],
[0.04561405, 0.90904713, 0.04533876],
[0.03954182, 0.04928818, 0.91116995],
[0.92584455, 0.03829525, 0.03586026],
[0.92584455, 0.03829525, 0.03586026],
[0.92584455, 0.03829525, 0.03586026],
[0.92584455, 0.03829525, 0.03586026],
[0.04473383, 0.89150536, 0.06376083],
[0.03954182, 0.04928818, 0.91116995],
[0.04561405, 0.90904713, 0.04533876],
[0.04561405, 0.90904713, 0.04533876],
[0.03954182, 0.04928818, 0.91116995],
[0.92584455, 0.03829525, 0.03586026],
[0.04453262, 0.14168864, 0.81377876],
[0.92584455, 0.03829525, 0.03586026],
[0.03954182, 0.04928818, 0.91116995],
[0.03954182, 0.04928818, 0.91116995],
[0.03954182, 0.04928818, 0.91116995],
[0.03954182, 0.04928818, 0.91116995],
[0.03954182, 0.04928818, 0.91116995],
[0.92584455, 0.03829525, 0.03586026],
[0.92584455, 0.03829525, 0.03586026]])
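Finally, the served probabilities can be checked against the local model, allowing for float32 serialization round-off:
# Served probabilities should match local predict_proba up to precision
assert np.allclose(np.array(response.json()["predictions"]), xgb_model.predict_proba(X_test), atol=1e-6)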