Accessing Data Profiler in ML JupyterLab

If you also have access to ML JupyterLab (another solution in the AI/ML Accelerator package), you can open Data Profiler directly in the JupyterLab environment, giving you an intuitive, interactive tool for profiling multiple datasets within a single workspace.

To get started, open an ML JupyterLab notebook, load the dataset, and profile it.

Profiling the Dataset

The integrated version of Data Profiler in ML JupyterLab (dxprofiler) offers four ways to load a dataset for profiling:

  1. Load the dataset by specifying a path to a local folder (in the ML JupyterLab job) that contains the .csv or .parquet files.

    import dxprofiler
    dataset = dxprofiler.profile_files(path_to_csv_or_parquet='/path/to/tables/', data_dictionary=None)
  2. Load the dataset from a list of .csv or .parquet files.

    import dxprofiler
    dataset = dxprofiler.profile_files(path_to_csv_or_parquet=['/path/to/table1.csv', '/path/to/table2.csv'], data_dictionary=None)
  3. Load the dataset from Pandas dataframes ('patient_df' and 'clinical_df').

    import dxprofiler
    dataset = dxprofiler.profile_dfs(dataframes={'patient_df': patient, 'clinical_df': clinical}, data_dictionary=None)
  4. Load the dataset from a record object (DNAnexus Dataset or Cohort), where "project-xxxx:record-yyyy" is the ID of your Apollo Dataset (or Cohort) on the DNAnexus platform.

    import dxprofiler
    dataset = dxprofiler.profile_cohort_record(record_id="project-xxxx:record-yyyy")
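Method 3 assumes the `patient` and `clinical` dataframes already exist in the notebook. Here is a minimal sketch of preparing such inputs with pandas; the table contents and column names are purely illustrative, not part of dxprofiler, and the same dataframes can also be written out as .csv files for methods 1 and 2:

```python
import pandas as pd

# Hypothetical patient-level table (column names are illustrative only)
patient = pd.DataFrame({
    "patient_id": ["P001", "P002", "P003"],
    "age": [54, 61, 47],
    "sex": ["F", "M", "F"],
})

# Hypothetical clinical table keyed to the same patients
clinical = pd.DataFrame({
    "patient_id": ["P001", "P002", "P003"],
    "diagnosis": ["T2D", "HTN", "T2D"],
})

# These dataframes can be passed to dxprofiler.profile_dfs (method 3),
# or saved as .csv files and loaded via methods 1 or 2:
patient.to_csv("patient.csv", index=False)
clinical.to_csv("clinical.csv", index=False)
```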

Open the Data Profiler GUI

Once you have finished profiling the dataset, open the Data Profiler GUI with the following command:

dataset.visualize()

Resources

To create a support ticket for technical issues:

  1. Go to the Help header (in the same section as Projects and Tools) inside the platform.

  2. Select "Contact Support".

  3. Fill in the Subject and Message to submit the ticket.


Last updated 3 months ago
