Utilizing MLflow in JupyterLab

AI/ML Accelerator - MLflow is built to track your ML experiments on the DNAnexus platform through ML JupyterLab, another app in the AI/ML Accelerator package. A license is required to use the AI/ML Accelerator package. For more information, please contact DNAnexus Sales via [email protected].

JupyterLab Example: MLflow Quickstart

The title of this JupyterLab notebook in the launcher is “MLflow Quickstart”.

Getting Started with MLflow on DNAnexus

This notebook demonstrates how to log your models to DNAnexus platform storage using MLflow, and then use the logged models to make predictions on new data.

Importing Required Libraries

This demonstration uses the scikit-learn framework on the Breast Cancer dataset. The required libraries are pre-installed in the ML JupyterLab environment, so you can import them directly without installing anything.

import mlflow
import mlflow.sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
import ray
from ray.util.joblib import register_ray
from joblib import parallel_backend

Data Preparation

In this step, we use the Breast Cancer dataset provided by scikit-learn. This dataset includes features extracted from breast cancer cell nuclei obtained from biopsy samples. There are 30 numeric features, such as mean radius, mean texture, and mean area, and the target variable indicates whether the tumor is malignant (0) or benign (1).

To evaluate the model’s performance, we split the dataset into training and testing sets. 80% of the data is used for training, and 20% is reserved for testing.
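The notebook's own cell is not reproduced on this page, so the following is only a minimal sketch of the split described above. The `random_state=42` seed and the `stratify=y` option are assumptions added for reproducibility, not settings confirmed by the notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load the Breast Cancer dataset as NumPy arrays (569 samples, 30 features).
X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% of the rows for testing and train on the remaining 80%.
# random_state and stratify are illustrative assumptions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```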

Define an MLflow Experiment

To keep the runs of a particular project or idea together, we can define an Experiment that groups each iteration (run) under it. Giving the experiment a unique name that reflects what we’re working on keeps things organized and makes our runs easier to find later.

Enable MLflow Autologging

MLflow’s autologging feature automatically logs metrics, parameters, and models during training. Here, we enable it for scikit-learn, which ensures that relevant details about the training process are captured without manual intervention.

Train the model and log with MLflow

This step involves training a RandomForestClassifier, a popular ensemble learning method. The training process is encapsulated in an MLflow run to capture the details.

Register the model

Once the model is logged, we register it in the MLflow Model Registry. This allows the model to be versioned and used across different environments.

To view the logged experiment, runs, and registered models, open the MLflow Tracking Server GUI by selecting ‘DX MLFlow’ in the JupyterLab Launcher. See the MLflow User Guide on the Academy page (https://academy.dnanexus.com/) for more details (this pointer is part of the notebook text shown in the Launcher; if you are reading this page, you are already there).

Load the registered model and make predictions

In this step, we load the registered model from the MLflow Model Registry and use it to make predictions on new data.

Resources

Full Documentation

To create a support ticket if there are technical issues:

  1. Go to the Help header (same section where Projects and Tools are) inside the platform

  2. Select “Contact Support”

  3. Fill in the Subject and Message to submit a support ticket.
