In-App Features of ML JupyterLab
ML JupyterLab is an app in the AI/ML Accelerator package. A license is required in order to use the AI/ML Accelerator package. For more information, please contact DNAnexus Sales via [email protected].
ML JupyterLab retains the core features of a DXJupyterLab environment (for detailed information, please see here). To specialize it for ML work, several new features have been added to the app; they are listed below.
Kernel
ML JupyterLab uses a Python 3.10.4 kernel. This version provides a stable and widely used environment for data science and machine learning tasks. It is fully compatible with a wide range of data processing and machine learning libraries, including NumPy, Pandas, scikit-learn, TensorFlow, and PyTorch, enabling efficient model development and experimentation.
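If you want to confirm the interpreter and key libraries available in your session, you can run a quick check in a notebook cell. The snippet below is a minimal sketch; the exact versions reported depend on the Docker image selected for the session.

```python
# Confirm the kernel's Python version and the availability of common ML libraries.
import sys
print(sys.version)  # expected to report Python 3.10.x

import numpy, pandas, sklearn
print("numpy", numpy.__version__)
print("pandas", pandas.__version__)
print("scikit-learn", sklearn.__version__)
```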
Pre-installed ML packages
Each Docker image available in the Docker image for the Ray node option is pre-installed with state-of-the-art Python packages for ML development. Below is the full list of packages and their versions for each image.
Docker Image: buildspec-1.0/ray-2.32.0-py310-cpu

| Package | Version |
| --- | --- |
| dxpy | >=0.385.0 |
| pandas | 2.0.3 |
| fsspec_dnanexus | 0.2.6 |
| modin | 0.23.1 |
| pycaret | 3.3.2 |
| xgboost | 1.7.6 |
| stabl | 1.0.0 |
| scikit-learn | 1.4.2 |
| lightgbm | 4.5.0 |
| tensorflow | 2.12.1 |
| torch | 2.4.1 |
| torchaudio | 2.4.1 |
| torchvision | 0.19.1 |
Docker Image: buildspec-1.0/ray-2.32.0-py310-gpu-pytorch

| Package | Version |
| --- | --- |
| dxpy | >=0.385.0 |
| pandas | 2.0.3 |
| fsspec_dnanexus | 0.2.6 |
| modin | 0.23.1 |
| pycaret | 3.3.2 |
| xgboost | 1.7.6 |
| stabl | 1.0.0 |
| scikit-learn | 1.4.2 |
| lightgbm | 4.5.0 |
| torch | 2.5.1+cu118 |
| torchaudio | 2.5.1+cu118 |
| torchvision | 0.20.1+cu118 |
Docker Image: buildspec-1.0/ray-2.32.0-py310-gpu-tensorflow

| Package | Version |
| --- | --- |
| dxpy | >=0.385.0 |
| pandas | 2.0.3 |
| fsspec_dnanexus | 0.2.6 |
| modin | 0.23.1 |
| pycaret | 3.3.2 |
| xgboost | 1.7.6 |
| stabl | 1.0.0 |
| scikit-learn | 1.4.2 |
| lightgbm | 4.5.0 |
| tensorflow | 2.12.1 |
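To check which of the packages listed above are present in a running session, and at which versions, you can query the installed distributions from a notebook cell. This is a minimal sketch using the standard library; the package list mirrors the tables above and can be trimmed to whatever you need.

```python
# Report the installed version of each pre-installed ML package (names follow the tables above).
from importlib.metadata import version, PackageNotFoundError

packages = ["dxpy", "pandas", "fsspec_dnanexus", "modin", "pycaret",
            "xgboost", "scikit-learn", "lightgbm", "tensorflow", "torch"]

for name in packages:
    try:
        print(f"{name}: {version(name)}")
    except PackageNotFoundError:
        print(f"{name}: not installed")  # e.g. torch is absent in the TensorFlow-only image
```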
fsspec-dnanexus
fsspec-dnanexus is a Python library pre-installed on ML JupyterLab that abstracts file system operations, providing unified APIs for interacting with DNAnexus project storage. It simplifies working with files in a DNAnexus project by giving direct access to them, without the need to download data to the local storage of the JupyterLab environment.
Here is an example of using fsspec-dnanexus to read a .csv file from a DNAnexus project:
```python
import pandas as pd

# Read a CSV file directly from DNAnexus project storage; no local download is needed.
df = pd.read_csv("dnanexus://my-dx-project:/folder/data.csv")
```
For detailed usage, please refer to the official PyPI page.
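Writing back to the project works the same way through pandas, as fsspec-dnanexus also supports writes through the dnanexus:// protocol. The snippet below is a minimal sketch; my-dx-project and the /results folder are placeholder names.

```python
import pandas as pd

# A small example DataFrame; replace the project name and folder with your own.
df = pd.DataFrame({"sample": ["s1", "s2"], "score": [0.91, 0.87]})

# Write the DataFrame directly to DNAnexus project storage via the dnanexus:// protocol.
df.to_csv("dnanexus://my-dx-project:/results/scores.csv", index=False)
```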
Environment Snapshots
It is possible to save the current session environment and data and reload them later by creating a session snapshot (DNAnexus > Create Snapshot).
An ML JupyterLab session runs in a Docker container, and a session snapshot file is a tarball generated by saving the Docker container state (with the docker commit and docker save commands). Any installed packages and files created locally are saved to the snapshot file, with the exception of the /home/dnanexus and /mnt/ directories, which are not included. The file is then uploaded to the .Notebook_snapshots folder of the project and can be passed as an input the next time the app is started.
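If you want to locate existing snapshots programmatically, for example to pick one to pass as input to a new session, you can search the project with dxpy. The sketch below assumes the snapshot folder name described above and uses dxpy's standard search API; adjust the folder path if your snapshots are stored elsewhere.

```python
import dxpy

# List snapshot files stored under /.Notebook_snapshots in the current project.
# dxpy.PROJECT_CONTEXT_ID is populated automatically inside a running session.
project_id = dxpy.PROJECT_CONTEXT_ID

for result in dxpy.find_data_objects(classname="file",
                                     project=project_id,
                                     folder="/.Notebook_snapshots",
                                     describe=True):
    desc = result["describe"]
    print(desc["name"], result["id"])
```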
Ray dashboard
ML JupyterLab provides access to the Ray Dashboard, a powerful tool for monitoring and managing distributed applications built using Ray. The dashboard gives users real-time insights into their Ray clusters and distributed applications.
Key Features of the Ray Dashboard:
Cluster Monitoring: Get an overview of the state of your Ray cluster, including node health, task statuses, and resource utilization.
Task and Actor Management: Track the progress of tasks and actors across the cluster, enabling users to identify bottlenecks or performance issues.
Resource Utilization: Monitor how resources such as CPU, memory, and GPUs are being used by your distributed tasks.
Logs and Debugging: Access logs and other debugging tools to troubleshoot and optimize your workflows.
Accessing the Ray Dashboard:
To Use:
Open the Homepage: This is the interface where applications and services are available to launch.
Click the Ray Dashboard Icon: Clicking the icon should automatically open the Ray Dashboard in a new JupyterLab tab.
Explore the Dashboard: Once the dashboard is open, you can navigate through the various sections like Overview, Nodes, Actors, Tasks, etc.
Ray Dashboard Tab Overview:
Below is the overview of each section of the Ray Dashboard. For more detailed information, please refer to the Ray Dashboard Documentation.
| Section | What it Does | Use Case |
| --- | --- | --- |
| Overview | Summarizes key information about the Ray cluster, including resource usage, number of active jobs, actors, and nodes. | Quickly check the overall health and activity of the Ray cluster directly within ML JupyterLab. Useful as a starting point for deeper diagnostics. |
| Jobs | Displays a list of submitted Ray jobs along with their status (running, succeeded, failed), runtime environment, and timestamps. | Track job execution in real time. Helps users debug failed jobs or confirm successful task completion without leaving JupyterLab. |
| Serve | Shows the status and configuration of Ray Serve deployments, including endpoints, replica counts, and health. | Monitor deployed machine learning models or APIs, check routing logic, and scale deployments to meet demand. Ideal for users serving models via Ray Serve. |
| Cluster | Provides details about each node in the Ray cluster: available resources, current usage, and node status. | View how resources (CPU, memory, GPU) are distributed across nodes. Useful for optimizing workload placement or identifying underperforming nodes. |
| Actors | Lists all active and historical Ray actors, showing their state, creation tasks, resource usage, and ownership. | Useful for debugging stateful components, such as streaming pipelines or long-lived agents, especially if something gets stuck or fails silently. |
| Metrics | Presents system and application-level metrics, including CPU usage, memory, GPU, and custom user-defined metrics. | Visualize performance trends over time. Helps with performance tuning and detecting memory leaks or CPU bottlenecks. |
| Logs | Aggregates and displays logs from Ray drivers, workers, and components in a searchable and filterable view. | Essential for debugging errors, investigating crashes, or understanding unexpected behavior during execution. Users can check logs directly from JupyterLab. |
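To see these sections populate with live data, you can submit a small Ray workload from a notebook cell. The snippet below is a minimal sketch that assumes the session's Ray cluster is already running and reachable via address="auto"; the resulting tasks then show up under the Overview and Jobs views.

```python
import ray

# Attach to the Ray cluster already running in this ML JupyterLab session.
ray.init(address="auto", ignore_reinit_error=True)

@ray.remote
def square(x):
    return x * x

# Submit a batch of tasks; their progress appears in the dashboard while they run.
futures = [square.remote(i) for i in range(100)]
print(sum(ray.get(futures)))
```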
Script Server - Execute a command on all workers
ML JupyterLab provides a built-in Script Server that allows you to interact directly with all compute nodes in your job's cluster via preconfigured scripts or arbitrary commands. This feature streamlines cluster operations such as job submission, resource monitoring, file transfer, package installation, and environment management.
The integrated Script Server enables you to communicate effortlessly with computing clusters, no terminal access or complex commands required. With just a few clicks, you can:
Run parallel shell commands across nodes (via PDSH)
Restart and manage distributed systems like Ray clusters
Monitor or control cluster behavior in real time
How to Use the Script Server
1. Access the Script Server Panel:
Open the Homepage: This is the interface where applications and services are available to launch.
Click the Script Server icon: Clicking the icon should automatically open the Script Server in a new tab of the web browser.
2. Select a Script to Operate:
Choose either "PDSH" or "Restart Ray Cluster" from the list in the left sidebar to execute your desired operation.
PDSH: to run any shell command simultaneously on multiple compute nodes. By default, the head node directory “/scratch” is mounted to all worker nodes.
Restart Ray Cluster: to restart all Ray processes across the cluster. Useful if the cluster becomes unresponsive or needs to be reinitialized after job failures.
Example Use Case - Install additional packages on all workers using PDSH
```bash
# Install additional packages on all workers using PDSH
uv pip install --system 'accelerate>=0.11' 'skorch'
```
Resources
To create a support ticket if there are technical issues:
Go to the Help header (same section where Projects and Tools are) inside the platform
Select “Contact Support”
Fill in the Subject and Message to submit a support ticket.