In App Features
ML JupyterLab is an app in the AI/ML Accelerator package. A license is required to use the AI/ML Accelerator package. For more information, please contact DNAnexus Sales.
ML JupyterLab retains the core features of a DXJupyterLab environment (see the DXJupyterLab documentation for details). To support ML work, the app adds several new features, which are listed below.
ML JupyterLab uses a Python 3.10.4 kernel. This version provides a stable, widely used environment for data science and machine learning tasks. It is compatible with a wide range of data processing and machine learning libraries, including NumPy, pandas, scikit-learn, TensorFlow, and PyTorch, enabling efficient model development and experimentation.
Each Docker image for the Ray nodes comes pre-installed with state-of-the-art Python packages for ML development. The pre-installed packages and their versions are listed below.
| Package | Version |
| --- | --- |
| dxpy | >=0.385.0 |
| pandas | 2.0.3 |
| fsspec_dnanexus | 0.2.6 |
| modin | 0.23.1 |
| pycaret | 3.3.2 |
| xgboost | 1.7.6 |
| stabl | 1.0.0 |
| scikit-learn | 1.4.2 |
| lightgbm | 4.5.0 |
| tensorflow | 2.12.1 |
| torch | 2.4.1 |
| torchaudio | 2.4.1 |
| torchvision | 0.19.1 |
| Package | Version |
| --- | --- |
| dxpy | >=0.385.0 |
| pandas | 2.0.3 |
| fsspec_dnanexus | 0.2.6 |
| modin | 0.23.1 |
| pycaret | 3.3.2 |
| xgboost | 1.7.6 |
| stabl | 1.0.0 |
| scikit-learn | 1.4.2 |
| lightgbm | 4.5.0 |
| torch | 2.5.1+cu118 |
| torchaudio | 2.5.1+cu118 |
| torchvision | 0.20.1+cu118 |
| Package | Version |
| --- | --- |
| dxpy | >=0.385.0 |
| pandas | 2.0.3 |
| fsspec_dnanexus | 0.2.6 |
| modin | 0.23.1 |
| pycaret | 3.3.2 |
| xgboost | 1.7.6 |
| stabl | 1.0.0 |
| scikit-learn | 1.4.2 |
| lightgbm | 4.5.0 |
| tensorflow | 2.12.1 |
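To confirm which of the packages listed above are present in the image you launched, a short check from a notebook cell can help. The following is a minimal sketch that uses only the Python standard library; the helper name `installed_versions` is hypothetical, and the package names are copied from the lists above.

```python
from importlib.metadata import PackageNotFoundError, version

# Package names copied from the lists above; trim to match the image you launched.
PACKAGES = [
    "dxpy", "pandas", "fsspec_dnanexus", "modin", "pycaret", "xgboost", "stabl",
    "scikit-learn", "lightgbm", "tensorflow", "torch", "torchaudio", "torchvision",
]


def installed_versions(names):
    """Return {name: installed version string, or None if the package is absent}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Print one line per package so missing ones are easy to spot.
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name:16} {ver or 'not installed'}")
```

A package reported as "not installed" is expected when the image you are running does not ship it, for example torch on an image that only includes tensorflow.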
fsspec-dnanexus is a Python library pre-installed on ML JupyterLab that abstracts file system operations and provides a unified API for interacting with DNAnexus project storage. It lets you access files in a DNAnexus project directly, without first downloading them to the local storage of the JupyterLab environment.
Here is an example that uses fsspec-dnanexus to read a .csv file from a DNAnexus project:
import pandas as pd
# The URL takes the form dnanexus://<project>:<path to file>
df = pd.read_csv("dnanexus://my-dx-project:/folder/data.csv")
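When several files are read from the same project, it can be convenient to build the URL in one place. The sketch below assumes the `dnanexus://<project>:<path>` form shown in the example above; the helper names `dx_url` and `read_project_csv` are hypothetical and not part of fsspec-dnanexus.

```python
def dx_url(project: str, path: str) -> str:
    """Build a URL of the form dnanexus://<project>:<absolute path>."""
    if not path.startswith("/"):
        path = "/" + path  # project paths are written as absolute paths
    return f"dnanexus://{project}:{path}"


def read_project_csv(project: str, path: str, **read_csv_kwargs):
    """Read a CSV from project storage into a pandas DataFrame.

    Requires pandas and fsspec-dnanexus, both pre-installed on ML JupyterLab.
    """
    import pandas as pd  # imported lazily so dx_url() works without pandas

    return pd.read_csv(dx_url(project, path), **read_csv_kwargs)
```

For example, `read_project_csv("my-dx-project", "/folder/data.csv", nrows=100)` would pass `nrows=100` through to `pd.read_csv` to load only the first 100 rows.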
You can save the current session environment and data and reload them later by creating a session snapshot (DNAnexus > Create Snapshot).
An ML JupyterLab session runs in a Docker container, and a session snapshot file is a tarball generated by saving the Docker container state (with the docker commit and docker save commands). Installed packages and locally created files are saved to the snapshot file, except for the directories /home/dnanexus and /mnt/, which are not included. The file is then uploaded to .Notebook_snapshots in the project and can be passed as input the next time the app is started.
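To make the two Docker steps concrete, here is a minimal sketch of the kind of commands involved. It only builds and prints the command lines; nothing is executed. The container name, image tag, and tarball name are hypothetical, and the options the app actually passes are not documented here. In practice the app runs these steps for you when you choose DNAnexus > Create Snapshot.

```python
import shlex


def snapshot_commands(container: str, image_tag: str, tarball: str):
    """Return the two Docker invocations as argument lists (nothing is run)."""
    return [
        # Step 1: freeze the container's current filesystem state as an image.
        ["docker", "commit", container, image_tag],
        # Step 2: write that image to a tar archive on disk.
        ["docker", "save", "-o", tarball, image_tag],
    ]


# Hypothetical names, for illustration only.
for cmd in snapshot_commands("ml-jupyterlab", "ml-jupyterlab:snapshot", "snapshot.tar"):
    print(shlex.join(cmd))
```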
ML JupyterLab provides access to the Ray Dashboard, a powerful tool for monitoring and managing distributed applications built using Ray. The dashboard gives users real-time insights into their Ray clusters and distributed applications.
Key Features of the Ray Dashboard:
Cluster Monitoring: Get an overview of the state of your Ray cluster, including node health, task statuses, and resource utilization.
Task and Actor Management: Track the progress of tasks and actors across the cluster, enabling users to identify bottlenecks or performance issues.
Resource Utilization: Monitor how resources such as CPU, memory, and GPUs are being used by your distributed tasks.
Logs and Debugging: Access logs and other debugging tools to troubleshoot and optimize your workflows.
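The resource figures shown in the dashboard can also be queried from a notebook. The sketch below uses `ray.cluster_resources()` and `ray.available_resources()` to compute the fraction of each resource in use. It assumes a Ray cluster is already running in the session and reachable with `address="auto"`; the helper names `utilization` and `cluster_utilization` are hypothetical.

```python
def utilization(total: dict, available: dict) -> dict:
    """Fraction of each resource in use, given Ray-style resource dictionaries.

    A resource missing from `available` is treated as fully in use.
    """
    used = {}
    for name, capacity in total.items():
        if capacity > 0:
            used[name] = 1.0 - available.get(name, 0.0) / capacity
    return used


def cluster_utilization(address: str = "auto") -> dict:
    """Connect to a running Ray cluster and report per-resource utilization."""
    import ray  # imported lazily; requires Ray and a running cluster

    ray.init(address=address, ignore_reinit_error=True)
    return utilization(ray.cluster_resources(), ray.available_resources())
```

Calling `cluster_utilization()` returns a dictionary keyed by resource name, such as `CPU`, `GPU`, or `memory`, with values between 0 and 1.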
Accessing the Ray Dashboard:
To Use:
Open the Launcher: This is the interface where applications and services are available to launch.
Click the Ray Icon: Clicking the icon opens the Ray Dashboard in a new tab in JupyterLab.
Explore the Dashboard: Once the dashboard is open, you can navigate through the various sections like Overview, Nodes, Actors, Tasks, etc.
Ray Dashboard Tab Overview:
| Tab | What it does | Use case |
| --- | --- | --- |
| Overview | Provides a high-level summary of the cluster: how many nodes are active, the overall resource availability (CPU, GPU, memory), and how much of those resources is currently in use. | Quickly assess whether your Ray cluster has enough resources for your workload or whether any nodes are experiencing bottlenecks. |
| Nodes | Lists all the nodes in the cluster along with their detailed resource usage (CPU, GPU, memory, disk). | Monitor individual nodes to check whether any specific node is under- or over-utilized. |
| Actors | Displays information about Ray actors. Actors are stateful workers in Ray, and this tab shows their state, resource usage, and status. | Track the state of actors, such as which ones are pending, running, or completed. You can also watch for memory leaks or inefficiencies in long-running actors. |
| Tasks | Provides detailed information about Ray tasks (remote function calls submitted to Ray). Tasks can have statuses such as pending, running, or finished. | Track individual tasks in your application, especially when debugging issues related to task performance or failure. |
| Logs | Offers access to the logs for each node in the cluster. | Use the logs for debugging, especially if your jobs are failing or there are issues with Ray workers. |
| Metrics | Displays system and custom metrics related to your application. You can track CPU, memory, network usage, and custom metrics defined by your application. | Analyze metrics to evaluate performance and optimize the resource usage of your application. |
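To have something to look at in the Tasks and Actors tabs, you can submit a small workload from a notebook. The following is a minimal sketch, assuming a Ray cluster is already running in the session and reachable with `address="auto"`. The names `square`, `Counter`, and `run_demo` are hypothetical and exist only for this illustration.

```python
def square(x: int) -> int:
    """Plain function; wrapped as a Ray task in run_demo()."""
    return x * x


class Counter:
    """Plain class; wrapped as a Ray actor in run_demo(), so it keeps state."""

    def __init__(self):
        self.total = 0

    def add(self, value: int) -> int:
        self.total += value
        return self.total


def run_demo(address: str = "auto") -> int:
    """Run ten tasks and one actor, returning the actor's final running total."""
    import ray  # imported lazily; requires Ray and a running cluster

    ray.init(address=address, ignore_reinit_error=True)
    square_task = ray.remote(square)        # invocations are listed under Tasks
    counter = ray.remote(Counter).remote()  # the instance is listed under Actors
    squares = ray.get([square_task.remote(i) for i in range(10)])
    totals = ray.get([counter.add.remote(s) for s in squares])
    return totals[-1]
```

After calling `run_demo()`, the ten `square` invocations and the single `Counter` actor should be visible in the dashboard, along with their states.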
To create a support ticket if there are technical issues:
Go to the Help header (in the same section as Projects and Tools) inside the platform.
Select “Contact Support”
Fill in the Subject and Message to submit a support ticket.
For the detailed usage, please refer to .