# MLflow client compatibility
- Tier: Free, Premium, Ultimate
- Offering: GitLab.com, GitLab Self-Managed, GitLab Dedicated
Version history
- Introduced in GitLab 15.11.
- Generally available in GitLab 17.8.
MLflow is a popular open source tool for machine learning experiment tracking. GitLab model experiment tracking and the GitLab model registry are compatible with the MLflow client, so existing code needs only minimal changes.

GitLab plays the role of an MLflow server. You do not need to run `mlflow server`.
## Enable MLflow client integration

Prerequisites:

- A personal, project, or group access token with at least the Developer role and the `api` scope.
- The project ID. To find the project ID:
  - On the left sidebar, select Search or go to and find your project.
  - Select Settings > General.
To use MLflow client compatibility:

- Set the tracking URI and token environment variables on the host that runs the code. This can be your local environment, a CI pipeline, or a remote host. For example:

  ```shell
  export MLFLOW_TRACKING_URI="<your gitlab endpoint>/api/v4/projects/<your project id>/ml/mlflow"
  export MLFLOW_TRACKING_TOKEN="<your_access_token>"
  ```

- If the training code contains a call to `mlflow.set_tracking_uri()`, remove it.
In the model registry, you can copy the tracking URI from the overflow menu in the upper-right corner by selecting the vertical ellipsis (⋮).
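If you would rather configure the client from Python than from the shell, you can build the same tracking URI and set both variables in-process before the MLflow client is first used. This is a minimal sketch. The helper names `build_tracking_uri` and `configure_mlflow_env` are hypothetical, and the GitLab URL, project ID, and token are placeholder values.

```python
import os


def build_tracking_uri(gitlab_url: str, project_id: int) -> str:
    """Return the MLflow tracking URI for a GitLab project.

    Follows the pattern <your gitlab endpoint>/api/v4/projects/<your project id>/ml/mlflow.
    """
    # Strip a trailing slash so the result never contains a double slash.
    return f"{gitlab_url.rstrip('/')}/api/v4/projects/{project_id}/ml/mlflow"


def configure_mlflow_env(gitlab_url: str, project_id: int, token: str) -> None:
    """Set the environment variables that the MLflow client reads."""
    os.environ["MLFLOW_TRACKING_URI"] = build_tracking_uri(gitlab_url, project_id)
    os.environ["MLFLOW_TRACKING_TOKEN"] = token


# Hypothetical values: replace with your GitLab URL, project ID, and access token.
configure_mlflow_env("https://gitlab.example.com/", 12345, "<your_access_token>")
print(os.environ["MLFLOW_TRACKING_URI"])
```

Avoid hard-coding a real token in source code. In practice, read it from a secret store or an existing environment variable.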
## Model experiments

When running the training code, you can use the MLflow client to create experiments, runs, models, and model versions, and to log parameters, metrics, metadata, and artifacts to GitLab.

After experiments are logged, they are listed under `/<your project>/-/ml/experiments`.

Runs are registered and can be explored by selecting an experiment, model, or model version.
### Creating an experiment

```python
import mlflow

# Create a new experiment
experiment_id = mlflow.create_experiment(name="<your_experiment>")

# Setting the active experiment also creates a new experiment if it doesn't exist.
mlflow.set_experiment(experiment_name="<your_experiment>")
```
### Creating a run

```python
import mlflow

# Creating a run requires an experiment ID or an active experiment
mlflow.set_experiment(experiment_name="<your_experiment>")

# Runs can be created with a context manager...
with mlflow.start_run() as run:
    print(run.info.run_id)
    # Your training code

# ...or without one, in which case the run must be ended explicitly
run = mlflow.start_run()
# Your training code
mlflow.end_run()
```
### Logging parameters and metrics

```python
import mlflow

mlflow.set_experiment(experiment_name="<your_experiment>")

with mlflow.start_run():
    # Parameter keys need to be unique in the scope of the run
    mlflow.log_param(key="param_1", value=1)

    # Metrics can be updated throughout the run
    mlflow.log_metric(key="metrics_1", value=1)
    mlflow.log_metric(key="metrics_1", value=2)
```
### Logging artifacts

```python
import mlflow

mlflow.set_experiment(experiment_name="<your_experiment>")

with mlflow.start_run():
    # Plain text can be logged as an artifact file using `log_text`
    mlflow.log_text('Hello, World!', artifact_file='hello.txt')

    # Existing local files can be logged using `log_artifact`
    mlflow.log_artifact(
        local_path='<local/path/to/file.txt>',
        artifact_path='<optional relative path to log the artifact at>'
    )
```
### Logging models

Models can be logged using one of the supported MLflow Model flavors. Logging with a model flavor records the model's metadata, which makes it easier to manage, load, and deploy models across different tools and environments.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Example training data; replace X_train and y_train with your own data
X_train, y_train = load_iris(return_X_y=True)

mlflow.set_experiment(experiment_name="<your_experiment>")

with mlflow.start_run():
    # Create and train a simple model
    model = RandomForestClassifier(n_estimators=10, random_state=42)
    model.fit(X_train, y_train)

    # Log the model using the MLflow sklearn model flavor
    mlflow.sklearn.log_model(model, artifact_path="")
```
### Loading a run
Version history
- Introduced in GitLab 17.9.
You can load a run from the GitLab model registry to, for example, make predictions.
```python
import mlflow.pyfunc

run_id = "<your_run_id>"
download_path = "models"  # Local folder to download to

# Download the run's model and load it as a generic Python function model
model = mlflow.pyfunc.load_model(f"runs:/{run_id}/", dst_path=download_path)

sample_input = [[1, 0, 3, 4], [2, 0, 1, 2]]
model.predict(data=sample_input)
```
### Associating a run to a CI/CD job
Version history
- Introduced in GitLab 16.1.
- Changed to beta in GitLab 17.1.
If your training code runs in a CI/CD job, GitLab can use that information to enhance run metadata. To associate a run to a CI/CD job:

- In the project's CI/CD variables, include the following variables:
  - `MLFLOW_TRACKING_URI`: `"<your gitlab endpoint>/api/v4/projects/<your project id>/ml/mlflow"`
  - `MLFLOW_TRACKING_TOKEN`: `<your_access_token>`
- In your training code, within the run execution context, add the following code snippet:

  ```python
  import os
  import mlflow

  with mlflow.start_run(run_name="<your_run_name>"):
      # Your training code

      # Start of snippet to be included
      if os.getenv('GITLAB_CI'):
          mlflow.set_tag('gitlab.CI_JOB_ID', os.getenv('CI_JOB_ID'))
      # End of snippet to be included
  ```
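If several training scripts need the same CI/CD check, you can factor it into a small helper that returns the tag as a dictionary. This is a minimal sketch, and the helper name `gitlab_ci_tags` is hypothetical. Unlike the snippet above, it also skips the tag when `CI_JOB_ID` is missing, so it never records an empty job ID. You could then pass its result to `mlflow.set_tags()` inside a run.

```python
import os
from typing import Mapping


def gitlab_ci_tags(env: Mapping[str, str] = os.environ) -> dict:
    """Return the tag that links an MLflow run to the current GitLab CI/CD job.

    Returns an empty dict outside of GitLab CI, so the same training code
    works both locally and in a pipeline.
    """
    if env.get("GITLAB_CI") and env.get("CI_JOB_ID"):
        return {"gitlab.CI_JOB_ID": env["CI_JOB_ID"]}
    return {}


# Inside a run you could call: mlflow.set_tags(gitlab_ci_tags())
print(gitlab_ci_tags({"GITLAB_CI": "true", "CI_JOB_ID": "42"}))
print(gitlab_ci_tags({}))
```

Passing the environment as a parameter keeps the helper testable without changing the real process environment.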
## Model registry

You can also manage models and model versions by using the MLflow client. Models are registered under `/<your project>/-/ml/models`.
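The experiment and model pages are both relative to the project path. As a small illustration, the full page URLs can be derived from the GitLab URL and the project's full path. This is a sketch: the helper `ml_ui_urls` and the example path `my-group/my-project` are hypothetical.

```python
def ml_ui_urls(gitlab_url: str, project_path: str) -> dict:
    """Return the GitLab pages that list a project's experiments and models."""
    # Normalize slashes so the pieces join cleanly.
    base = f"{gitlab_url.rstrip('/')}/{project_path.strip('/')}"
    return {
        "experiments": f"{base}/-/ml/experiments",
        "models": f"{base}/-/ml/models",
    }


urls = ml_ui_urls("https://gitlab.example.com", "my-group/my-project")
print(urls["experiments"])
print(urls["models"])
```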
### Models

#### Creating a model

```python
from mlflow import MlflowClient

client = MlflowClient()
model_name = '<your_model_name>'
description = 'Model description'
model = client.create_registered_model(model_name, description=description)
```
**Notes**

- The `create_registered_model` argument `tags` is ignored.
- `name` must be unique within the project.
- `name` cannot be the name of an existing experiment.
#### Fetching a model

```python
from mlflow import MlflowClient

client = MlflowClient()
model_name = '<your_model_name>'
model = client.get_registered_model(model_name)
```
#### Updating a model

```python
from mlflow import MlflowClient

client = MlflowClient()
model_name = '<your_model_name>'
description = 'New description'
client.update_registered_model(model_name, description=description)
```
#### Deleting a model

```python
from mlflow import MlflowClient

client = MlflowClient()
model_name = '<your_model_name>'
client.delete_registered_model(model_name)
```
#### Logging runs to a model

Every model has an associated experiment with the same name, prefixed by `[model]`.
To log a run to the model, set that experiment as the active experiment by passing the prefixed name:

```python
import mlflow

model_name = '<your_model_name>'

# The model's experiment has the model's name prefixed by [model]
mlflow.set_experiment(experiment_name=f"[model]{model_name}")

with mlflow.start_run() as run:
    print(run.info.run_id)
    # Your training code
```
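When scripts handle both regular experiments and model experiments, it can help to centralize the naming rule. The sketch below builds the model experiment name and maps it back to the model name. It assumes the `[model]` prefix is prepended with no separator, as in the example above. The helper names are hypothetical.

```python
from typing import Optional

MODEL_EXPERIMENT_PREFIX = "[model]"


def model_experiment_name(model_name: str) -> str:
    """Return the name of the experiment associated with a model."""
    return f"{MODEL_EXPERIMENT_PREFIX}{model_name}"


def model_name_from_experiment(experiment_name: str) -> Optional[str]:
    """Return the model name for a model experiment, or None for a regular experiment."""
    if experiment_name.startswith(MODEL_EXPERIMENT_PREFIX):
        return experiment_name[len(MODEL_EXPERIMENT_PREFIX):]
    return None


print(model_experiment_name("my_model"))
print(model_name_from_experiment("my_experiment"))
```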
### Model version
#### Creating a model version
```python
from mlflow import MlflowClient

client = MlflowClient()
model_name = '<your_model_name>'
description = 'Model version description'

# `source` is required by the MLflow client but ignored by GitLab
model_version = client.create_model_version(model_name, source="", description=description)
```
If you do not set a version, it is auto-incremented from the latest uploaded
version. You can set the version by passing a tag during model version creation. The version
must follow [SemVer](https://semver.org/) format.
```python
from mlflow import MlflowClient

client = MlflowClient()
model_name = '<your_model_name>'
description = 'Model version description'
version = '1.0.0'  # Must follow SemVer format

# Pass the version as a tag (tag key as used by GitLab; verify for your GitLab version)
tags = {"gitlab.version": version}
model_version = client.create_model_version(model_name, source="", description=description, tags=tags)
```
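Because a custom version must follow SemVer, you can check it on the client side before calling `create_model_version`. The sketch below uses the regular expression recommended on semver.org for SemVer 2.0.0. It is only a pre-check: GitLab performs its own validation, and the helper name `is_semver` is hypothetical.

```python
import re

# Regular expression suggested by semver.org for SemVer 2.0.0 strings
# (MAJOR.MINOR.PATCH, optional -pre-release, optional +build metadata).
SEMVER_RE = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?"
)


def is_semver(version: str) -> bool:
    """Return True if the whole string is a valid SemVer 2.0.0 version."""
    return SEMVER_RE.fullmatch(version) is not None


for candidate in ["1.0.0", "1.0.0-rc.1+build.5", "1.0", "01.0.0", "v1.0.0"]:
    print(candidate, is_semver(candidate))
```

`fullmatch` is used instead of `^...$` anchors so that a trailing newline is rejected.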
**Notes**
- Argument `run_id` is ignored. Every model version behaves as a run. Creating a model version from a run is not yet supported.
- Argument `source` is ignored. GitLab creates a package location for the model version files.
- Argument `run_link` is ignored.
- Argument `await_creation_for` is ignored.
#### Updating a model version

```python
from mlflow import MlflowClient

client = MlflowClient()
model_name = '<your_model_name>'
version = '<your_model_version>'
description = 'New description'
client.update_model_version(model_name, version, description=description)
```
#### Fetching a model version
```python
from mlflow import MlflowClient

client = MlflowClient()
model_name = '<your_model_name>'
version = '<your_model_version>'
model_version = client.get_model_version(model_name, version)
```
#### Getting latest versions of a model
```python
from mlflow import MlflowClient

client = MlflowClient()
model_name = '<your_model_name>'
latest_versions = client.get_latest_versions(model_name)
```
**Notes**
- Argument `stages` is ignored.
- Versions are ordered by highest semantic version.
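To reproduce this "highest semantic version first" ordering locally, for example when comparing versions fetched from the registry, you can sort with a key that follows the SemVer 2.0.0 precedence rules. This is a sketch: it assumes the inputs are already valid SemVer strings, ignores build metadata as the specification requires, and may differ from GitLab's own ordering in edge cases. The helper names are hypothetical.

```python
import re
from typing import List, Tuple

_SEMVER_PARTS = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+[0-9A-Za-z.-]+)?")


def semver_key(version: str) -> Tuple:
    """Sort key implementing SemVer 2.0.0 precedence (build metadata ignored)."""
    match = _SEMVER_PARTS.fullmatch(version)
    if match is None:
        raise ValueError(f"not a SemVer string: {version!r}")
    major, minor, patch, prerelease = match.groups()
    core = (int(major), int(minor), int(patch))
    if prerelease is None:
        # A release has higher precedence than any pre-release of the same core version.
        return core + (1, ())
    # Numeric identifiers compare numerically and rank below alphanumeric ones;
    # a longer identifier list wins when all preceding identifiers are equal.
    identifiers = tuple(
        (0, int(part)) if part.isdigit() else (1, part)
        for part in prerelease.split(".")
    )
    return core + (0, identifiers)


def order_versions(versions: List[str]) -> List[str]:
    """Return versions from highest to lowest precedence."""
    return sorted(versions, key=semver_key, reverse=True)


print(order_versions(["1.0.0", "1.10.0", "1.2.0", "1.0.0-rc.1", "1.0.0-alpha"]))
# ['1.10.0', '1.2.0', '1.0.0', '1.0.0-rc.1', '1.0.0-alpha']
```

A plain string sort would place `1.10.0` below `1.2.0`, which is why the numeric components are compared as integers.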
#### Loading a model version
```python
import mlflow.pyfunc
from mlflow import MlflowClient

client = MlflowClient()
model_name = '<your_model_name>'
version = '<your_model_version>'
download_path = "models"  # Local folder to download to

# Every model version behaves as a run, so it can be loaded through its run ID
model_version = client.get_model_version(model_name, version)
model = mlflow.pyfunc.load_model(f"runs:/{model_version.run_id}/", dst_path=download_path)
```
#### Logging metrics and parameters to a model version
Every model version is also a run, so you can log parameters
and metrics to it. You can find the run ID on the model version page in GitLab,
or by using the MLflow client:
```python
from mlflow import MlflowClient

client = MlflowClient()
model_name = '<your_model_name>'
version = '<your_model_version>'
model_version = client.get_model_version(model_name, version)
run_id = model_version.run_id

# Your training code

client.log_metric(run_id, '<metric_name>', 1.0)
client.log_param(run_id, '<param_name>', '<param_value>')
```
#### Logging artifacts to a model version
GitLab creates a package that can be used by the MLflow client to upload files.
```python
from mlflow import MlflowClient

client = MlflowClient()
model_name = '<your_model_name>'
version = '<your_model_version>'
model_version = client.get_model_version(model_name, version)
run_id = model_version.run_id

# Upload a local file to the model version's package
client.log_artifact(run_id, '<local/path/to/file.txt>')
```
The artifacts are then available under `/<your project>/-/ml/models/<model_id>/versions/<version_id>`.
#### Linking a model version to a CI/CD job
Similar to runs, it is also possible to link a model version to a CI/CD job:
```python
import os
from mlflow import MlflowClient

client = MlflowClient()
model_name = '<your_model_name>'
version = '<your_model_version>'
model_version = client.get_model_version(model_name, version)

# Tag the model version's run with the CI/CD job ID when running in GitLab CI
if os.getenv('GITLAB_CI'):
    client.set_tag(model_version.run_id, 'gitlab.CI_JOB_ID', os.getenv('CI_JOB_ID'))
```
## Supported MLflow client methods and caveats
GitLab supports these methods from the MLflow client. Other methods might be supported but were not
tested. More information can be found in the [MLflow Documentation](https://www.mlflow.org/docs/1.28.0/python_api/mlflow.html). The MlflowClient counterparts
of the methods below are also supported with the same caveats.
| Method | Supported | Version Added | Comments |
|--------------------------|-----------------|---------------|----------------------------------------------------------------------------------------------|
| `create_experiment` | Yes | 15.11 | |
| `get_experiment` | Yes | 15.11 | |
| `get_experiment_by_name` | Yes | 15.11 | |
| `delete_experiment` | Yes | 17.5 | |
| `set_experiment` | Yes | 15.11 | |
| `get_run` | Yes | 15.11 | |
| `delete_run` | Yes | 17.5 | |
| `start_run` | Yes | 15.11 | (16.3) If a name is not provided, the run receives a random nickname. |
| `search_runs` | Yes | 15.11 | (16.4) `experiment_ids` supports only a single experiment ID with order by column or metric. |
| `log_artifact` | Yes with caveat | 15.11 | (15.11) `artifact_path` must be empty. Does not support directories. |
| `log_artifacts` | Yes with caveat | 15.11 | (15.11) `artifact_path` must be empty. Does not support directories. |
| `log_batch` | Yes | 15.11 | |
| `log_metric` | Yes | 15.11 | |
| `log_metrics` | Yes | 15.11 | |
| `log_param` | Yes | 15.11 | |
| `log_params` | Yes | 15.11 | |
| `log_figure` | Yes | 15.11 | |
| `log_image` | Yes | 15.11 | |
| `log_text` | Yes with caveat | 15.11 | (15.11) Does not support directories. |
| `log_dict` | Yes with caveat | 15.11 | (15.11) Does not support directories. |
| `set_tag` | Yes | 15.11 | |
| `set_tags` | Yes | 15.11 | |
| `set_terminated` | Yes | 15.11 | |
| `end_run` | Yes | 15.11 | |
| `update_run` | Yes | 15.11 | |
| `log_model` | Partial | 15.11 | (15.11) Saves the artifacts, but not the model data. `artifact_path` must be empty. |
| `load_model` | Yes | 17.5 | |
Other MLflowClient methods:
| Method | Supported | Version added | Comments |
|---------------------------|------------------|---------------|--------------------------------------------------|
| `create_registered_model` | Yes with caveats | 16.8 | [See notes](#creating-a-model) |
| `get_registered_model` | Yes | 16.8 | |
| `delete_registered_model` | Yes | 16.8 | |
| `update_registered_model` | Yes | 16.8 | |
| `create_model_version` | Yes with caveats | 16.8 | [See notes](#creating-a-model-version) |
| `get_model_version` | Yes | 16.8 | |
| `get_latest_versions` | Yes with caveats | 16.8 | [See notes](#getting-latest-versions-of-a-model) |
| `update_model_version` | Yes | 16.8 | |
## Known issues
- GitLab supports the MLflow API as defined in MLflow version 2.7.1.
- MLflow client methods not listed above are not supported.
- When experiments and runs are created, experiment tags are stored even though they are not displayed.