Utilizing MLflow in JupyterLab
AI/ML Accelerator - MLflow is built to track your ML experiments on the DNAnexus platform through ML JupyterLab (another app in the AI/ML Accelerator package). A license is required to use the AI/ML Accelerator package. For more information, please contact DNAnexus Sales.
In the JupyterLab launcher, this notebook is titled “MLflow Quickstart”.
This notebook demonstrates how to log your models to DNAnexus platform storage using MLflow, and then use the logged models to make predictions on new data.
This demonstration uses the scikit-learn framework on the Breast Cancer dataset. The required libraries are pre-installed in the ML JupyterLab environment, so you can import them directly without installing anything.
In this step, we use the Breast Cancer dataset provided by scikit-learn. This dataset includes features extracted from breast cancer cell nuclei obtained from biopsy samples. There are 30 numeric features such as mean radius, mean texture, mean area, etc., and the target variable indicates whether the tumor is malignant (0) or benign (1).
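As a minimal sketch of this step, the dataset can be loaded with scikit-learn's `load_breast_cancer`. The variable names below are illustrative and not necessarily those used in the notebook.

```python
from sklearn.datasets import load_breast_cancer

# Load the dataset bundled with scikit-learn (no download needed).
data = load_breast_cancer()
X, y = data.data, data.target

print(X.shape)                   # (569, 30)
print(list(data.feature_names[:3]))
# target_names is ordered by label value: index 0 is label 0, index 1 is label 1.
print(list(data.target_names))   # ['malignant', 'benign']
```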
To evaluate the model’s performance, we split the dataset into training and testing sets. 80% of the data is used for training, and 20% is reserved for testing.
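The 80/20 split described above could look like the following sketch. The `random_state` seed and the variable names are assumptions added for reproducibility; the notebook may use different values.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# test_size=0.2 reserves 20% of the rows for testing.
# random_state=42 is an assumed seed so the split is repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (455, 30) (114, 30)
```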
To group the distinct runs of a particular project or idea together, we define an Experiment that collects each iteration (run). Giving it a unique name that reflects what we are working on keeps things organized and makes the runs easier to find later.
MLflow’s autologging feature automatically logs metrics, parameters, and models during training. Here, we enable it for scikit-learn, which ensures that relevant details about the training process are captured without manual intervention.
This step involves training a RandomForestClassifier, a popular ensemble learning method. The training process is encapsulated in an MLflow run to capture the details.
Once the model is logged, we register it in the MLflow Model Registry. This allows the model to be versioned and used across different environments.
In this step, we load the registered model from the MLflow Model Registry and use it to make predictions on new data.
To create a support ticket if there are technical issues:
Go to the Help header (in the same section as Projects and Tools) inside the platform.
Select “Contact Support”.
Fill in the Subject and Message to submit a support ticket.
To view the logged experiment, runs, and registered models, open the MLflow Tracking Server GUI by selecting ‘DX MLFlow’ in the JupyterLab Launcher. See the MLflow User Guide on the Academy page for more details.