Getting Started with ML JupyterLab
ML JupyterLab is an app in the AI/ML Accelerator package. A license is required to use the AI/ML Accelerator package; for more information, please contact DNAnexus Sales.
This example demonstrates the use of ML JupyterLab for hyperparameter tuning, using a proteomics dataset derived from 68 COVID-19 patients. The data are obtained from a published study.
The Python environment of ML JupyterLab comes with state-of-the-art ML libraries preinstalled, so you don't have to install them yourself.
If your data are stored on DNAnexus, they can be loaded using the following syntax:
dnanexus://<PROJECT-ID>:/path/to/your/data
dnanexus://<PROJECT-ID>:<FILE-ID>
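For example, here is a minimal sketch of loading two tables directly from DNAnexus URIs with pandas; the project ID and file paths are placeholders, and it assumes the dnanexus:// scheme is resolvable in the ML JupyterLab Python environment:

```python
import pandas as pd

# Placeholders: substitute your own project ID and file paths.
# The dnanexus:// scheme is assumed to be resolvable inside ML JupyterLab.
expression = pd.read_csv("dnanexus://<PROJECT-ID>:/proteomics/expression.csv")
sample = pd.read_csv("dnanexus://<PROJECT-ID>:/proteomics/sample.csv")
```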
Behind the scenes, ML JupyterLab retrieves the data via APIs provided by packages developed by DNAnexus.
Using DNAnexus URIs instead of physical paths makes your .ipynb file much more portable. As long as your colleagues have permission to read the data, they can use your .ipynb file immediately.
Once the data is successfully retrieved, you can perform data quality control (QC) using the dxprofiler package. This tool, developed by DNAnexus, provides an interactive dashboard that enables efficient and comprehensive QC.
This section prepares the data frames for Data Profiler.
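As a minimal sketch of this preparation step (it continues from the loading example above; the sampleID column name is taken from the screens described below):

```python
# Make sure both tables use a consistent sampleID key before profiling.
expression["sampleID"] = expression["sampleID"].astype(str)
sample["sampleID"] = sample["sampleID"].astype(str)

# Drop accidental duplicate rows so each sample appears once per table.
expression = expression.drop_duplicates(subset="sampleID")
sample = sample.drop_duplicates(subset="sampleID")
```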
Once the processing is finished, you can launch the GUI of Data Profiler to assess the data. (Run the code below to load the illustrated images.)
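A minimal sketch for rendering the screenshots inline; the image file names are placeholders:

```python
from IPython.display import Image, display

# Placeholder file names: substitute the screenshots shipped with this example.
display(Image("data_profiler_overview.png"))
display(Image("data_profiler_severity_column.png"))
```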
In this screen, we can see that the expression and sample tables are connected by the sampleID column. The Venn diagram indicates 68 samples shared between these tables, which tells us that there is no orphan ID in the data (i.e., no sample ID that appears in only one table).
In this second screen, we look more specifically at the Mild_ModvsSevere column of the sample table. There are 43 mild and 25 severe cases, and there are no missing values. It looks like we are good to move forward.
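The same checks can also be reproduced programmatically. Here is a minimal pandas sketch that reuses the data frames loaded earlier; the column names are taken from the screens above:

```python
# Sample IDs that appear in only one table (should be an empty set).
expr_ids = set(expression["sampleID"])
sample_ids = set(sample["sampleID"])
print("Shared samples:", len(expr_ids & sample_ids))
print("Orphan IDs:", expr_ids ^ sample_ids)

# Class balance and missing values for the severity label.
print(sample["Mild_ModvsSevere"].value_counts(dropna=False))
```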
Command to open the interactive Data Profiler GUI.
We will run hyperparameter tuning on a Support Vector Classifier (SVC) model with a Radial Basis Function (RBF) kernel.
First, let's define our search space and model.
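A minimal sketch of this step; the specific C and gamma ranges below are illustrative, not the exact values used in this example:

```python
from sklearn.svm import SVC

# Support Vector Classifier with a Radial Basis Function kernel.
model = SVC(kernel="rbf", probability=True, random_state=42)

# Illustrative search space over the two main RBF-SVC hyperparameters.
param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": ["scale", 0.001, 0.01, 0.1],
}
```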
Before starting the hyperparameter tuning step, let's set up an MLflow Experiment so we can log models later.
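For example, assuming ML JupyterLab preconfigures the connection to the MLflow Tracking Server (the experiment name matches the one referenced later in this walkthrough):

```python
import mlflow

# The MLflow Tracking Server connection is assumed to be preconfigured in
# ML JupyterLab; otherwise set it explicitly with mlflow.set_tracking_uri(...).
mlflow.set_experiment("COVID Severity Experiment")
```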
At this stage, you can start running with:
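For instance, a minimal single-node sketch using scikit-learn's GridSearchCV; X and y are assumed to be the prepared protein expression matrix and severity labels:

```python
from sklearn.model_selection import GridSearchCV

# X: protein expression features.
# y: severity labels, assumed binary-encoded (0 = mild/moderate, 1 = severe).
search = GridSearchCV(model, param_grid, cv=5, scoring="roc_auc", n_jobs=-1)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```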
That is the standard way to do hyperparameter tuning. However, ML JupyterLab is deployed on a Ray cluster, and this architecture can speed up your script several times over, depending on the number of nodes. To leverage the computing power of ML JupyterLab, simply put your script in a Ray context.
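One way to do this, shown here as a sketch rather than the exact code of this example, is to run the same scikit-learn search under Ray's joblib backend so its internal parallelism is distributed across the cluster:

```python
import joblib
from ray.util.joblib import register_ray

# Register Ray as a joblib backend; scikit-learn's n_jobs parallelism is then
# scheduled on the Ray cluster that backs ML JupyterLab.
register_ray()

with joblib.parallel_backend("ray"):
    search.fit(X, y)
```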
In addition, to log the best model and its parameters, let's start the MLflow run first.
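A minimal sketch combining the two ideas; the run name and artifact path are illustrative:

```python
import mlflow
import mlflow.sklearn

with mlflow.start_run(run_name="svc-rbf-tuning"):
    with joblib.parallel_backend("ray"):
        search.fit(X, y)
    mlflow.log_params(search.best_params_)
    mlflow.log_metric("best_cv_roc_auc", search.best_score_)
    mlflow.sklearn.log_model(search.best_estimator_, "model")
```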
The run has been logged to the MLflow Tracking Server. To check it, open the DX MLFlow package on the ML JupyterLab homepage and access the COVID Severity Experiment.
To see how much faster ML JupyterLab can handle that step, let's measure the execution time.
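A minimal sketch of the comparison using time.perf_counter, fitting the same search once with the default joblib backend and once with the Ray backend:

```python
import time

start = time.perf_counter()
search.fit(X, y)                      # default single-node backend
single_node = time.perf_counter() - start

start = time.perf_counter()
with joblib.parallel_backend("ray"):  # distributed across the Ray cluster
    search.fit(X, y)
ray_cluster = time.perf_counter() - start

print(f"Single node: {single_node:.1f}s | Ray cluster: {ray_cluster:.1f}s")
```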
As you can see, running with Ray in ML JupyterLab can speed up your script at least 2 to 3 times. With the combined computational power of multiple instances, ML JupyterLab creates a much more scalable workspace for AI/ML development compared to a single node. For instance, on a single node you can use up to 128 cores (i.e. mem4_ssd1_x128). With ML JupyterLab, you can easily obtain a workspace that goes beyond 128 cores.
Plus, everything will be running inside a secure and compliant environment!
You can also evaluate the model's performance without a separate test set. Let's create an SVC model with the best parameters found in the previous steps.
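A minimal sketch using cross-validated predictions, so every sample is scored by a model that never saw it during training (variable names continue from the sketches above):

```python
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

# Rebuild the classifier with the best C and gamma from the search.
best_model = SVC(kernel="rbf", probability=True, **search.best_params_)

# Out-of-fold predicted probabilities for the positive (severe) class.
y_score = cross_val_predict(best_model, X, y, cv=5, method="predict_proba")[:, 1]
```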
Next, let's create a ROC curve from the prediction result.
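For example, a minimal sketch with scikit-learn's RocCurveDisplay, using the out-of-fold scores computed above (the plot title is illustrative):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import RocCurveDisplay

# y is assumed to be binary-encoded (1 = severe), matching y_score above.
RocCurveDisplay.from_predictions(y, y_score)
plt.title("ROC curve: RBF-SVC, COVID-19 severity")
plt.show()
```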
To create a support ticket if there are technical issues:
Go to the Help header (same section where Projects and Tools are) inside the platform
Select “Contact Support”
Fill in the Subject and Message to submit a support ticket.
As you can see from this quick showcase, using dxprofiler is a neat way to understand your dataset. The screens above are just a tiny fraction of what this package can do. If you are interested, please see the dxprofiler documentation to learn more.