Accessing Data Profiler in ML JupyterLab

If you also have access to ML JupyterLab (another solution in the AI/ML Accelerator Package), you can open Data Profiler directly in the JupyterLab environment and profile multiple datasets interactively within one workspace.

To get started, simply open an ML JupyterLab notebook, load the dataset, and profile it.

Profiling the Dataset

The version of Data Profiler integrated into ML JupyterLab (dxprofiler) offers four ways to load a dataset for profiling:

  1. Loading the dataset by specifying the path to a local folder (in the ML JupyterLab job) that contains the .csv or .parquet files.

    import dxprofiler
    dataset = dxprofiler.profile_files(path_to_csv_or_parquet='/path/to/tables/', data_dictionary=None)
  2. Loading the dataset from a list of .csv or .parquet files.

    import dxprofiler
    dataset = dxprofiler.profile_files(path_to_csv_or_parquet=['/path/to/table1.csv', '/path/to/table2.csv'], data_dictionary=None)
  3. Loading the dataset from pandas DataFrames, passed as a dictionary keyed by name ('patient_df' and 'clinical_df' here).

    import dxprofiler
    dataset = dxprofiler.profile_dfs(dataframes={'patient_df': patient, 'clinical_df': clinical}, data_dictionary=None)
  4. Loading the dataset by a record object (DNAnexus Dataset or Cohort). "project-xxxx:record-yyyy" is the ID of your Apollo Dataset (or Cohort) on the DNAnexus platform.

    import dxprofiler
    dataset = dxprofiler.profile_cohort_record(record_id="project-xxxx:record-yyyy")
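
For the second method, the file list can be built programmatically rather than typed by hand. The following is a minimal sketch, not part of dxprofiler: `collect_table_files` is a hypothetical helper that gathers the .csv and .parquet files sitting directly inside a folder. It assumes that `profile_files` accepts a list of path strings, as shown in method 2 above.

```python
from pathlib import Path


def collect_table_files(folder, extensions=(".csv", ".parquet")):
    """Return the table files directly inside `folder` as a sorted list of path strings.

    Hypothetical helper for illustration; it is not provided by dxprofiler.
    """
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(f"{folder!r} is not a directory")
    # Keep regular files only, matching the extension case-insensitively.
    return sorted(
        str(path)
        for path in root.iterdir()
        if path.is_file() and path.suffix.lower() in extensions
    )


# Usage inside an ML JupyterLab notebook (requires dxprofiler, so left commented out):
# import dxprofiler
# files = collect_table_files('/path/to/tables/')
# dataset = dxprofiler.profile_files(path_to_csv_or_parquet=files, data_dictionary=None)
```

Building the list yourself is mainly useful when a folder holds files that you do not want to profile. Otherwise, passing the folder path as in method 1 is simpler.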

Open the Data Profiler GUI

Once the dataset is profiled, open the Data Profiler GUI with:

    dataset.visualize()

Resources

Full Documentation

If you run into technical issues, create a support ticket:

  1. In the platform, go to the Help header (in the same section as Projects and Tools).

  2. Select "Contact Support"

  3. Fill in the Subject and Message to submit a support ticket.
