In App Features

In-App Features of ML JupyterLab

ML JupyterLab is an app in the AI/ML Accelerator package. A license is required in order to use the AI/ML Accelerator package. For more information, please contact DNAnexus Sales via sales@dnanexus.com.

ML JupyterLab retains the core features of a DXJupyterLab environment (for detailed information, please see the DXJupyterLab documentation). To specialize it for ML work, the app adds multiple new features, which are listed below.

Kernel

ML JupyterLab uses a Python 3.10.4 kernel. This version provides a stable, widely used environment for data science and machine learning tasks. It is fully compatible with a broad range of data processing and machine learning libraries, including NumPy, Pandas, scikit-learn, TensorFlow, and PyTorch, enabling efficient model development and experimentation.
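
To confirm what the kernel provides, a quick check from a notebook cell is enough. This is only an illustrative sketch; which imports succeed depends on the Docker image selected for the session (see the package tables below).

```python
import sys
import pandas
import sklearn

print(sys.version)            # expect a 3.10.x interpreter
print(pandas.__version__)     # e.g. 2.0.3
print(sklearn.__version__)    # e.g. 1.4.2
```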

Pre-installed ML packages

Each Docker image available for the Ray nodes comes pre-installed with state-of-the-art Python packages for ML development. Below is the list of packages and their versions for each image.

Docker Image: buildspec-1.0/ray-2.32.0-py310-cpu

| Packages | Version |
| --- | --- |
| dxpy | >=0.385.0 |
| pandas | 2.0.3 |
| fsspec_dnanexus | 0.2.6 |
| modin | 0.23.1 |
| pycaret | 3.3.2 |
| xgboost | 1.7.6 |
| stabl | 1.0.0 |
| scikit-learn | 1.4.2 |
| lightgbm | 4.5.0 |
| tensorflow | 2.12.1 |
| torch | 2.4.1 |
| torchaudio | 2.4.1 |
| torchvision | 0.19.1 |

Docker Image: buildspec-1.0/ray-2.32.0-py310-gpu-pytorch

| Packages | Version |
| --- | --- |
| dxpy | >=0.385.0 |
| pandas | 2.0.3 |
| fsspec_dnanexus | 0.2.6 |
| modin | 0.23.1 |
| pycaret | 3.3.2 |
| xgboost | 1.7.6 |
| stabl | 1.0.0 |
| scikit-learn | 1.4.2 |
| lightgbm | 4.5.0 |
| torch | 2.5.1+cu118 |
| torchaudio | 2.5.1+cu118 |
| torchvision | 0.20.1+cu118 |

Docker Image: buildspec-1.0/ray-2.32.0-py310-gpu-tensorflow

| Packages | Version |
| --- | --- |
| dxpy | >=0.385.0 |
| pandas | 2.0.3 |
| fsspec_dnanexus | 0.2.6 |
| modin | 0.23.1 |
| pycaret | 3.3.2 |
| xgboost | 1.7.6 |
| stabl | 1.0.0 |
| scikit-learn | 1.4.2 |
| lightgbm | 4.5.0 |
| tensorflow | 2.12.1 |
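
When one of the GPU images is selected, it can be useful to verify that the framework actually sees the GPU before starting training. Below is a minimal sketch for the gpu-pytorch image; on the gpu-tensorflow image you would instead call tf.config.list_physical_devices("GPU").

```python
import torch

# True only when running on a GPU instance with the gpu-pytorch image
print(torch.cuda.is_available())
if torch.cuda.is_available():
    # Name of the first visible GPU device
    print(torch.cuda.get_device_name(0))
```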

fsspec-dnanexus

fsspec-dnanexus is a Python library pre-installed in ML JupyterLab (see its official PyPI page for details) that abstracts file system operations, providing a unified API for interacting with DNAnexus project storage. It simplifies working with files in a DNAnexus project by allowing direct access, without the need to download data to the local storage of the JupyterLab environment.

Here is an example use of fsspec-dnanexus to read a .csv file from a DNAnexus project:

```python
import pandas as pd

# Read a CSV directly from DNAnexus project storage via the dnanexus:// scheme
df = pd.read_csv("dnanexus://my-dx-project:/folder/data.csv")
```
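
Writing back through the same scheme follows the familiar pandas pattern. A small sketch, where the project and folder names in the path are placeholders you would replace with your own:

```python
import pandas as pd

df = pd.DataFrame({"sample": ["s1", "s2"], "score": [0.91, 0.87]})

# Write results back to project storage; the path uses hypothetical names
df.to_csv("dnanexus://my-dx-project:/folder/results.csv", index=False)
```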

Environment Snapshots

It is possible to save the current session environment and data and reload it later by creating a session snapshot (DNAnexus > Create Snapshot).

An ML JupyterLab session runs in a Docker container, and a session snapshot file is a tarball generated by saving the Docker container state (with the docker commit and docker save commands). Any installed packages and locally created files are saved to the snapshot file, with the exception of the /home/dnanexus and /mnt/ directories, which are not included. The file is then uploaded to the .Notebook_snapshots folder of the project and can be passed as input the next time the app is started.

Ray dashboard

ML JupyterLab provides access to the Ray Dashboard, a powerful tool for monitoring and managing distributed applications built using Ray. The dashboard gives users real-time insights into their Ray clusters and distributed applications.

Key Features of the Ray Dashboard:

  • Cluster Monitoring: Get an overview of the state of your Ray cluster, including node health, task statuses, and resource utilization.

  • Task and Actor Management: Track the progress of tasks and actors across the cluster, enabling users to identify bottlenecks or performance issues.

  • Resource Utilization: Monitor how resources such as CPU, memory, and GPUs are being used by your distributed tasks.

  • Logs and Debugging: Access logs and other debugging tools to troubleshoot and optimize your workflows.

To access the Ray Dashboard:

  1. Open the Launcher: This is the interface where applications and services are available to launch.

  2. Click the Ray Icon: Clicking the icon should automatically open the Ray Dashboard in a new tab in JupyterLab.

  3. Explore the Dashboard: Once the dashboard is open, you can navigate through the various sections like Overview, Nodes, Actors, Tasks, etc.

Ray Dashboard Tab Overview:

| Tab | What it does | Use case |
| --- | --- | --- |
| Overview | Provides a high-level summary of the cluster. It shows how many nodes are active, the overall resource availability (CPU, GPU, memory), and how much of those resources is currently being utilized. | Quickly assess whether your Ray cluster has enough resources for your workload or if any nodes are experiencing bottlenecks. |
| Nodes | Lists all the nodes in the cluster along with their detailed resource usage (CPU, GPU, memory, disk). | Monitor individual nodes to check if any specific node is under- or over-utilized. |
| Actors | Displays information about Ray actors. Actors are stateful tasks in Ray, and this tab shows their state, resource usage, and status. | Track the state of actors, such as which ones are pending, running, or completed. It also allows you to monitor memory leaks or inefficiencies in long-running actors. |
| Tasks | Provides detailed information about Ray tasks (jobs submitted to Ray). Tasks can have statuses such as pending, running, or finished. | Use this tab to track individual tasks in your application, especially when debugging issues related to task performance or failure. |
| Logs | Offers access to the logs for each node in the cluster. | Use the logs for debugging purposes, especially if your jobs are failing or there are issues with Ray workers. |
| Metrics | Displays various system and custom metrics related to your application. You can track CPU, memory, network usage, and custom metrics defined by your application. | Analyze metrics to evaluate performance and optimize the resource usage of your application. |
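
To see these tabs populate, you can submit a small amount of work to the cluster from a notebook cell. The sketch below is illustrative only; it assumes the session's Ray cluster is discoverable at the default address and defines a trivial remote task whose executions then appear in the Tasks tab, with the worker processes visible under Nodes.

```python
import ray

# Connect to the already-running Ray cluster of the ML JupyterLab session
ray.init(address="auto", ignore_reinit_error=True)

@ray.remote
def square(x):
    return x * x

# Submit a handful of tasks; their progress shows up in the Tasks tab
futures = [square.remote(i) for i in range(8)]
print(ray.get(futures))
```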

Resources

To create a support ticket if there are technical issues:

  1. Go to the Help header (in the same section as Projects and Tools) inside the Platform.

  2. Select “Contact Support”.

  3. Fill in the Subject and Message to submit a support ticket.

For detailed usage, please refer to the full documentation.
