Academy Documentation

Introduction to ML JupyterLab

ML JupyterLab is an app in the AI/ML Accelerator package. A license is required to use the AI/ML Accelerator package. For more information, please contact DNAnexus Sales via sales@dnanexus.com.

What is ML JupyterLab?

ML JupyterLab is a version of JupyterLab, exclusive to DNAnexus, that is designed for machine learning (ML) development on clinical and multi-omics data. It retains the core features of JupyterLab, such as a web-based interactive development environment for notebooks, code, and data, and adds enhancements for ML, data science, and distributed computing. It lets users work efficiently with large datasets through the Ray distributed engine and integrated ML libraries.

Why use ML JupyterLab?

ML JupyterLab is the ideal environment for data scientists, researchers, and engineers working on complex ML workflows, large-scale datasets, and distributed computing tasks. Key benefits include:

  1. Ease of Setup: With the pre-configured ML environment, the app eliminates the need for manual installation of ML libraries and tools, reducing setup time.

  2. Scalability for Large Datasets: ML JupyterLab enables users to process massive datasets across distributed clusters utilizing the Ray engine, making it suitable for high-demand ML workloads.

  3. Resource Monitoring: With the Ray dashboard, users can track job performance and use it to optimize cluster capacity.

  4. Simplified Dependency Installation: Installing and managing libraries is straightforward with automatic detection and resolution of conflicts. This enables users to easily add or update ML libraries without concerns about dependency issues.

  5. Portability: With ML JupyterLab, users can work with data stored in DNAnexus projects without downloading it to the instance. The app also makes it easy to run the same ML workflow in different projects.

  6. Security and Compliance: ML JupyterLab runs on the DNAnexus platform, which provides high levels of security and compliance, making it a trusted solution for industries like healthcare and life sciences.

  7. Efficient Collaboration: Users can save and share environment configurations, allowing easy replication of workspaces across projects and teams. This saves time and ensures consistency in ML workflows.
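
The scaling described in item 2 can be pictured as fanning a function out over data partitions with Ray tasks. The following is a minimal sketch under stated assumptions, not the app's documented usage: `missing_rate` and `missing_rates` are hypothetical names, the partitions are toy lists, and how ML JupyterLab exposes its Ray cluster to a notebook is not stated here, so the plain `ray.init` call is only a guess at the connection step. When Ray is not installed, the sketch falls back to a serial loop so the logic can still be followed.

```python
from typing import List, Optional, Sequence


def missing_rate(values: Sequence[Optional[float]]) -> float:
    """Return the fraction of entries in one partition that are None."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def missing_rates(partitions: Sequence[Sequence[Optional[float]]]) -> List[float]:
    """Apply missing_rate to every partition, in parallel when Ray is available."""
    try:
        import ray
    except ImportError:
        # Serial fallback for environments without Ray.
        return [missing_rate(p) for p in partitions]

    # Assumption: a default init is enough to reach (or start) a cluster.
    # A managed environment may require a specific address instead.
    ray.init(ignore_reinit_error=True)

    # Wrap the plain function as a Ray remote function, then submit one
    # task per partition; each call returns a future immediately.
    remote_fn = ray.remote(missing_rate)
    futures = [remote_fn.remote(p) for p in partitions]

    # Block until all tasks finish and collect results in submission order.
    return ray.get(futures)


if __name__ == "__main__":
    # Toy partitions, for illustration only.
    print(missing_rates([[1.0, None, 3.0, None], [2.0, 2.0]]))
```

Each task is independent, so a scheduler is free to place them on different nodes; the Ray dashboard mentioned in item 3 is where such tasks would be observed while they run.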

Core features of ML JupyterLab

  1. Distributed Engines: ML JupyterLab integrates Ray as the distributed computing engine, allowing users to scale their ML workflows across multiple nodes. This enables efficient handling of large datasets and complex computations, streamlining distributed ML tasks.

  2. Preinstalled Popular ML Packages: ML JupyterLab provides built-in support for popular ML libraries such as scikit-learn, Transformers, XGBoost, LightGBM, TensorFlow, and PyTorch. These preinstalled packages give users the tools for building, training, and deploying ML models without needing to install or manage dependencies manually.

  3. Seamless Package Installation and Dependency Management: In addition to preinstalled ML libraries, users can easily install new packages within ML JupyterLab. The environment automatically detects and resolves any dependency conflicts, providing a smooth experience when adding or updating libraries for specific projects. This feature ensures that users can customize their workspace effortlessly without breaking existing configurations.

  4. Save and Share Environment Configurations: Users can save their environment configuration to a custom environment file and share that file across DNAnexus projects or with team members, so that an environment can be replicated quickly for a new project. This feature helps maintain consistency across teams and reduces setup time.

  5. Integrated Data Profiler: With a Data Profiler license, users can launch Data Profiler inside an ML JupyterLab notebook via the dxprofiler package. The profiler provides essential statistics such as missing and duplication rates, data distributions, and correlations, allowing users to gain insights into their datasets quickly without requiring additional tools.

  6. Large-Scale Data Processing: ML JupyterLab leverages Modin, a parallel dataframe library compatible with pandas, to efficiently process large-scale datasets. It automatically distributes dataframe operations across the Ray cluster, allowing users to handle large datasets without modifying their existing pandas code.

  7. Direct Access to Data in DNAnexus Projects: Users can read and write data stored in DNAnexus projects directly from ML JupyterLab using fsspec-dnanexus. This feature provides a smoother workflow for handling large datasets and ensures seamless integration with the DNAnexus platform.
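
Item 7 can be illustrated with a short sketch of reading a project file straight into a dataframe. It rests on assumptions that should be checked against the fsspec-dnanexus documentation: that the package registers a `dnanexus://` protocol addressed as `dnanexus://<project>:/<path>`, and that it is installed and authenticated in the session. `project_url` and `load_table` are hypothetical helper names, and `project-xxxx` and the file path are placeholders.

```python
def project_url(project: str, path: str) -> str:
    """Build a URL for a file in a DNAnexus project.

    Assumption: the `dnanexus://<project>:/<path>` form; confirm it
    against the fsspec-dnanexus documentation before relying on it.
    """
    if not path.startswith("/"):
        raise ValueError("path must be absolute within the project, e.g. /data/x.csv")
    return f"dnanexus://{project}:{path}"


def load_table(url: str):
    """Read a CSV from the project into a dataframe without a manual download."""
    # Deferred import so project_url can be used where pandas is absent.
    # Modin documents `import modin.pandas as pd` as a drop-in replacement
    # for this line; whether its reader accepts this URL is not stated here.
    import pandas as pd

    # pandas hands non-local URLs to fsspec, which looks up the protocol.
    return pd.read_csv(url)


if __name__ == "__main__":
    # Placeholder project ID and path, for illustration only.
    print(project_url("project-xxxx", "/data/samples.csv"))
```

Keeping the URL construction separate from the read makes the one assumed detail, the URL form, easy to correct in a single place.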

Resources

To create a support ticket for a technical issue:

  1. Go to the Help header (in the same section as Projects and Tools) inside the platform.

  2. Select “Contact Support”.

  3. Fill in the Subject and Message to submit a support ticket.

For licensing questions, contact DNAnexus Sales at sales@dnanexus.com.