Academy Documentation

Utilizing Data Profiler Navigator

Last updated 2 months ago

A license is required to access the Data Profiler on the DNAnexus Platform. For more information, please contact DNAnexus Sales (via sales@dnanexus.com).

A Note on Data:

The data used in this section of the Academy documentation is available for download at https://synthea.mitre.org/downloads

The citation for this synthetic dataset is:

Walonoski J, Klaus S, Granger E, Hall D, Gregorowicz A, Neyarapally G, Watson A, Eastman J. Synthea™ Novel coronavirus (COVID-19) model and synthetic data set. Intelligence-Based Medicine. 2020 Nov;1:100007. https://doi.org/10.1016/j.ibmed.2020.100007

How to Navigate in Data Profiler

Navigation Overview

Data Profiler helps the user explore a dataset at 3 levels:

  • Dataset level: Shows the relationships between tables in the dataset and an overview of all tables and columns in the dataset.

  • Table level: Shows statistics for one particular table. The table can also be joined with another table to create a joint profile.

  • Column level: Shows statistics for one particular column of a table. The column can also be combined with other columns in the same table to create a joint profile.

To navigate between these 3 levels, the user selects an option in the navigator on the left side of the application. Once an option is selected, the content of the main interface changes accordingly.

The user interface of Data Profiler consists of a navigator (left, highlighted in blue), which controls the content of the main section (right, highlighted in green).

Navigator

The Navigator controls the content of the main section of Data Profiler. Its main component is a hierarchical view of the dataset, called the Data Hierarchy.

All Tables

The top level of a Data Hierarchy is All Tables, indicating the dataset level. This level is selected by default.

Under All Tables are individual tables in the dataset. Each table has a number on the far right indicating the number of columns in the table.

Data Hierarchy

Once a table is selected, the Data Hierarchy will show all columns from that table. Each column has a colored tag indicating the column type.

Searching for Columns

Above the Data Hierarchy, the user can search for one or more columns. The Data Hierarchy will show tables that have at least one of the column names in the search list (OR logic).
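The OR logic of the column search can be illustrated with a minimal Python sketch. This is not Data Profiler's implementation: the `hierarchy` dictionary, the table and column names, the `search_tables` helper, and the choice of exact, case-insensitive matching are all assumptions made for illustration.

```python
# Hypothetical stand-in for the Data Hierarchy: table name -> column names.
# The table and column names are illustrative only.
hierarchy = {
    "patients": ["patient_id", "birthdate", "gender"],
    "encounters": ["encounter_id", "patient_id", "start"],
    "conditions": ["patient_id", "code", "description"],
}


def search_tables(hierarchy, search_terms):
    """Return tables that contain at least one searched column (OR logic)."""
    # Assumption: names are matched exactly, ignoring case and surrounding spaces.
    wanted = {term.strip().lower() for term in search_terms if term.strip()}
    matches = {}
    for table, columns in hierarchy.items():
        hits = [col for col in columns if col.lower() in wanted]
        if hits:  # a single matching column is enough to keep the table
            matches[table] = hits
    return matches


print(search_tables(hierarchy, ["birthdate", "code"]))
# → {'patients': ['birthdate'], 'conditions': ['code']}
```

In this sketch, a table is kept as soon as any one search term matches one of its columns, so searching for a column shared by every table (here `patient_id`) returns all tables.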

Explorer Mode

At the bottom of the Navigator, the user can switch to Explorer Mode to create their own charts. The functionality of this mode is discussed in the Explorer Mode section of this documentation.

The 📜 button opens the Inference Logs Screen, which shows details of the profiling process. This feature is in development.

Column Types

The type of a column in Data Profiler can be specified in a data_dictionary. If that information is not available, Data Profiler will infer the column type based on the content of the column.

| Column type | Description | Example |
| --- | --- | --- |
| string | A string column has free-text values. This is the default fallback type when Data Profiler fails to cast a column to another type. | Patient’s name; Patient’s ID |
| integer | An integer column has integer values. | Number of children |
| float | A float column has float values. | Weight; Height |
| datetime | A datetime column has date and time values. The default time zone is UTC. | Date of birth |
| unknown | The column is empty. | |

Null (or empty) values are allowed in all column types and they do not affect how a column type is determined.
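The rules above (nulls are ignored, an empty column is "unknown", and "string" is the fallback) can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not Data Profiler's actual inference code: the function names, the order in which types are tried, and the use of ISO 8601 parsing for datetime values are assumptions.

```python
from datetime import datetime


def _all_cast(values, cast):
    """Return True if every value can be converted by `cast` without error."""
    try:
        for value in values:
            cast(value)
    except (ValueError, TypeError):
        return False
    return True


def infer_column_type(values):
    """Illustrative inference: try stricter types first, fall back to string."""
    # Nulls (None or empty strings) are dropped, so they never affect the result.
    non_null = [str(v).strip() for v in values if v is not None and str(v).strip() != ""]
    if not non_null:
        return "unknown"  # the column is empty
    if _all_cast(non_null, int):
        return "integer"
    if _all_cast(non_null, float):
        return "float"
    # Assumption: datetime values are written in ISO 8601 format.
    if _all_cast(non_null, datetime.fromisoformat):
        return "datetime"
    return "string"  # default fallback when no other cast succeeds


print(infer_column_type(["1", "2", None]))      # → integer
print(infer_column_type(["70.5", "82"]))        # → float
print(infer_column_type(["1990-05-17", ""]))    # → datetime
print(infer_column_type(["Alice", "42"]))       # → string
print(infer_column_type([None, ""]))            # → unknown
```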

FAQs about Columns

  • Q: In my data_dictionary, the type of column A is “integer”. After loading the data, Data Profiler reports column A as a “string” column. What happened?

    A: Column A contains at least one non-null value that cannot be cast to an integer, so Data Profiler falls back to “string”.
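To track down the values behind such a fallback before loading the data, a small check like the following may help. It is a hedged sketch in plain Python rather than a Data Profiler feature: `find_non_integer_values` and the sample `column_a` values are hypothetical.

```python
def find_non_integer_values(values):
    """Return (row_index, value) pairs that are non-null and cannot be cast to int."""
    offenders = []
    for index, value in enumerate(values):
        if value is None or str(value).strip() == "":
            continue  # nulls do not influence the column type
        try:
            int(str(value).strip())
        except ValueError:
            offenders.append((index, value))
    return offenders


# Hypothetical contents of column A, declared as "integer" in the data_dictionary.
column_a = ["12", "7", None, "N/A", "3", "4.0"]
print(find_non_integer_values(column_a))
# → [(3, 'N/A'), (5, '4.0')]
```

Either of the reported values would be enough to prevent an integer cast in this sketch; cleaning or nulling them out is one way to make the column consistent with the declared type.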

Resources

If you run into technical issues, create a support ticket:

  1. Go to the Help header (in the same section as Projects and Tools) on the platform.

  2. Select “Contact Support”.

  3. Fill in the Subject and Message, then submit the support ticket.

Note on column types: In Data Profiler, there are 4 column types. These types are consistent with the data types used in the data ingestion step via Data Model Loader on the DNAnexus Platform.

Links referenced on this page:

  • Full Documentation
  • DNAnexus Sales: sales@dnanexus.com
  • Synthea downloads: https://synthea.mitre.org/downloads
  • Dataset citation DOI: https://doi.org/10.1016/j.ibmed.2020.100007