Table Level Screen


Last updated 2 months ago

A license is required to access the Data Profiler on the DNAnexus Platform. For more information, please contact DNAnexus Sales (via sales@dnanexus.com).

A Note on Data:

The data used in this section of the Academy documentation can be downloaded from https://synthea.mitre.org/downloads.

The citation for this synthetic dataset is:

Walonoski J, Klaus S, Granger E, Hall D, Gregorowicz A, Neyarapally G, Watson A, Eastman J. Synthea™ Novel coronavirus (COVID-19) model and synthetic data set. Intelligence-Based Medicine. 2020 Nov;1:100007.

Table Level Screen

The Table-level screen appears when the user selects one particular table in the Navigator.

Table-level Screen of a table in Data Profiler

Table Overview

Overview details on the header of the Table-level screen

The header of the Table-level screen shows summary statistics for the selected table, including:

  • Table size: number of rows and columns of the table

  • Missing rate: the rate of empty cells in the table

  • Duplicate rate: the rate at which entire rows are duplicated in the table
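The three header statistics can be reproduced outside the tool as a rough cross-check. The sketch below is illustrative only: the sample rows and the function name table_overview are hypothetical, and the exact formulas Data Profiler uses are not stated in this documentation. Here a row counts as a duplicate when it repeats an earlier row in every column, which is an assumption.

```python
# Hypothetical sample table: a list of rows, where None marks an empty cell.
rows = [
    {"patient_id": "p1", "sex": "F", "age": 34},
    {"patient_id": "p2", "sex": None, "age": 51},
    {"patient_id": "p1", "sex": "F", "age": 34},  # exact copy of the first row
    {"patient_id": "p3", "sex": "M", "age": None},
]
columns = ["patient_id", "sex", "age"]


def table_overview(rows, columns):
    """Return table size, missing rate, and duplicate rate for a table."""
    n_rows, n_cols = len(rows), len(columns)
    total_cells = n_rows * n_cols

    # Missing rate: empty cells divided by all cells.
    empty_cells = sum(1 for row in rows for col in columns if row.get(col) is None)

    # Duplicate rate (assumed definition): rows that repeat an earlier row
    # in every column, divided by all rows.
    seen = set()
    duplicate_rows = 0
    for row in rows:
        key = tuple(row.get(col) for col in columns)
        if key in seen:
            duplicate_rows += 1
        else:
            seen.add(key)

    return {
        "rows": n_rows,
        "columns": n_cols,
        "missing_rate": empty_cells / total_cells if total_cells else 0.0,
        "duplicate_rate": duplicate_rows / n_rows if n_rows else 0.0,
    }


overview = table_overview(rows, columns)
print(overview)
```

If the numbers shown in the header differ from a calculation like this, the most likely cause is a different definition of "duplicate" or "empty", so treat the sketch as a sanity check, not as a specification.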

Composition of Column Types

Pie chart of Column types on the header of the Table-level screen

The pie chart shows the composition of column types in the table. The size of each slice is determined by the number of columns of that type. The user can also hover over the chart to see the column count for each type.
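As a minimal sketch of what the pie chart summarizes, the snippet below counts columns per type in a small hypothetical table. The type labels ("string", "float", "datetime", "other") and the inference rule are assumptions made for illustration. Data Profiler's own type detection is not described here.

```python
from collections import Counter
from datetime import date

# Hypothetical sample: column name -> list of values (None marks an empty cell).
table = {
    "patient_id": ["p1", "p2", "p3"],
    "race": ["white", "asian", None],
    "birthdate": [date(1980, 5, 1), date(1975, 1, 20), None],
    "height_cm": [170.2, None, 158.0],
}


def infer_column_type(values):
    """Assign an assumed type label based on the non-null values of a column."""
    non_null = [v for v in values if v is not None]
    if non_null and all(isinstance(v, str) for v in non_null):
        return "string"
    if non_null and all(isinstance(v, float) for v in non_null):
        return "float"
    if non_null and all(isinstance(v, date) for v in non_null):
        return "datetime"
    return "other"


# Each slice of the pie corresponds to the number of columns of one type.
type_counts = Counter(infer_column_type(values) for values in table.values())
total_columns = sum(type_counts.values())
for column_type, count in type_counts.items():
    print(f"{column_type}: {count} column(s), {count / total_columns:.0%} of the pie")
```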

Table-level charts

Table-level screen has a Controller section that configures the visualization in the Chart area

The main component of the Table-level screen is the Chart area, which is configured through the Controller in the top right corner of the screen. There are two main types of visualization: Completeness and Column Profiles.

Completeness

Completeness is the default mode of the Table-level screen. It provides an overview of the count or rate of non-null values in a table. Completeness has two options: One-way view and Two-way view.

One-way View: Bar chart

One-way view in Table-level screen

One-way view is a stacked bar chart that displays the percentage of missing values, non-duplicates, and duplicates for each column in the table. You can click on the Legend/Key to show or hide specific statistics on the chart. Hover over each column to view detailed statistics.
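To make the three stacked segments concrete, here is a rough sketch for a single column. It assumes that "non-duplicates" are the first occurrences of each distinct value and "duplicates" are the repeated occurrences, so that the three percentages add up to 100. This split is an assumption. The function name and sample values are hypothetical.

```python
def one_way_completeness(values):
    """Split one column into missing / non-duplicate / duplicate percentages."""
    total = len(values)
    if total == 0:
        return {"missing": 0.0, "non_duplicate": 0.0, "duplicate": 0.0}

    missing = sum(1 for v in values if v is None)
    non_null = [v for v in values if v is not None]

    # Assumed split: the first occurrence of a value is a non-duplicate,
    # every further occurrence of the same value is a duplicate.
    non_duplicates = len(set(non_null))
    duplicates = len(non_null) - non_duplicates

    return {
        "missing": 100 * missing / total,
        "non_duplicate": 100 * non_duplicates / total,
        "duplicate": 100 * duplicates / total,
    }


# Hypothetical column values, where None marks a missing value.
race_column = ["white", "white", "asian", None, "white"]
race_stats = one_way_completeness(race_column)
print(race_stats)
```

Running the function once per column gives one stacked bar per column, which matches the layout described above.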

Two-way View: Heat map

Two-way view in Table-level screen

Two-way view is a heat map showing data completeness for all columns in the table. The Y-axis lists the columns of the table, and the X-axis lists the unique values of the Group-by column. Each cell shows how many entities of the table (as a count in Raw count mode, or as a percentage in Percentage mode) have non-null values in the column on the Y-axis, for the Group-by value on the X-axis. The user can choose another column as the grouping factor. Each label in the Group-by column becomes a column in the heat map. Only categorical columns with at most 30 unique values appear as options.

The Controller of Two-way view

The numbers in the heat map can be configured in two ways:

  • Raw count displays the exact number of non-null values available in each column.

  • Percentage shows the completeness statistic as a percentage. The statistic ranges from 0 to 100, where 0 means the data is completely missing and 100 means the data is fully complete.
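The heat map values can be approximated with a small grouped count. The following is a sketch under stated assumptions, not the tool's implementation: the sample rows, the function names, and the choice to skip rows whose Group-by value is empty are all hypothetical. The limit of 30 unique values comes from the description above, and treating string columns as "categorical" is an assumption.

```python
from collections import defaultdict

# Hypothetical sample rows, where None marks an empty cell.
rows = [
    {"race": "white", "height_cm": 170.2, "weight_kg": 65.0},
    {"race": "white", "height_cm": None, "weight_kg": 80.5},
    {"race": "asian", "height_cm": 158.0, "weight_kg": None},
    {"race": "asian", "height_cm": 162.3, "weight_kg": 55.1},
]


def group_by_candidates(rows, columns, max_unique=30):
    """Columns with string values and at most max_unique distinct values."""
    candidates = []
    for col in columns:
        values = {row.get(col) for row in rows if row.get(col) is not None}
        if values and all(isinstance(v, str) for v in values) and len(values) <= max_unique:
            candidates.append(col)
    return candidates


def completeness_heatmap(rows, group_by, columns, mode="raw"):
    """Return {column: {group value: count or percentage of non-null cells}}."""
    group_sizes = defaultdict(int)
    non_null = defaultdict(int)  # (column, group value) -> non-null count
    for row in rows:
        group = row.get(group_by)
        if group is None:
            continue  # assumption: rows without a Group-by value are skipped
        group_sizes[group] += 1
        for col in columns:
            if row.get(col) is not None:
                non_null[(col, group)] += 1

    heatmap = {}
    for col in columns:  # Y-axis: columns of the table
        heatmap[col] = {}
        for group, size in group_sizes.items():  # X-axis: Group-by values
            count = non_null[(col, group)]
            heatmap[col][group] = count if mode == "raw" else 100 * count / size
    return heatmap


candidates = group_by_candidates(rows, ["race", "height_cm", "weight_kg"])
raw = completeness_heatmap(rows, "race", ["height_cm", "weight_kg"], mode="raw")
pct = completeness_heatmap(rows, "race", ["height_cm", "weight_kg"], mode="percentage")
print(candidates, raw, pct, sep="\n")
```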

Two-way View: Heat map, cross-table analysis

The user can also join the current table with another table using the Join with table options. After joining, the user can use a column from the joined table as the Group-by column.

FAQs

Question: Can I use the Two-way View to check how many female patients have sequencing data?

Answer: Yes, assuming that your question involves two metadata fields, patient_sex (from the patient table) and sequencing_run_id (from the sequencing table), and that the patient and sequencing tables can be joined on patient_id. In that case, open the patient table in the Two-way View, join it with the sequencing table, and choose patient_sex as the Group-by column. The sequencing.sequencing_run_id row then shows the completeness rate broken down by each value of patient_sex.
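The same question can be expressed as a small join outside the tool. This is a minimal sketch with made-up records. The table and field names follow the FAQ, but counting each patient once when they have at least one non-null sequencing_run_id is an assumption about how a one-to-many join should be summarized.

```python
from collections import Counter

# Hypothetical records for the two tables named in the FAQ.
patients = [
    {"patient_id": "p1", "patient_sex": "F"},
    {"patient_id": "p2", "patient_sex": "F"},
    {"patient_id": "p3", "patient_sex": "M"},
    {"patient_id": "p4", "patient_sex": "F"},
]
sequencing = [
    {"patient_id": "p1", "sequencing_run_id": "run-01"},
    {"patient_id": "p1", "sequencing_run_id": "run-02"},
    {"patient_id": "p3", "sequencing_run_id": "run-03"},
    {"patient_id": "p4", "sequencing_run_id": None},
]

# Join on patient_id: collect patients with at least one non-null run ID.
sequenced_ids = {
    record["patient_id"]
    for record in sequencing
    if record["sequencing_run_id"] is not None
}

# Group by patient_sex and count patients with and without sequencing data.
totals = Counter()
with_data = Counter()
for patient in patients:
    sex = patient["patient_sex"]
    totals[sex] += 1
    if patient["patient_id"] in sequenced_ids:
        with_data[sex] += 1

completeness = {
    sex: {
        "raw_count": with_data[sex],
        "percentage": 100 * with_data[sex] / totals[sex],
    }
    for sex in totals
}
print(completeness)
```

The "raw_count" and "percentage" entries correspond to the two display modes of the heat map.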

The heatmap options controller during cross-table analysis. In this example, the "patients" table is joined with the "observations" table.

Completeness heatmap for a cross-table analysis. In this example, the main table is "patients" and the joined table is "observations". The heatmap shows how many patients have available data (non-null values) in each field, broken down by patient race: white, black, asian, native, or other.

Column Profiles

Column Profiles mode shows each column as a tile. The chart type depends on the type of the column.

This screen provides detailed statistics and distribution charts for the columns in the table. For all column types, it displays the missing rate and the duplication rate.

For columns containing string data, it shows the number of unique values and the value frequency, which is represented in a distribution chart.
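For illustration, the string-column statistics can be sketched with the standard library. The sample values are hypothetical, and ignoring empty cells when counting unique values is an assumption.

```python
from collections import Counter

# Hypothetical string column, where None marks an empty cell.
values = ["white", "asian", "white", None, "black", "white"]

# Assumption: empty cells are excluded from the unique count and frequencies.
non_null = [v for v in values if v is not None]
frequency = Counter(non_null)
unique_count = len(frequency)

print(f"unique values: {unique_count}")
# Most frequent values first, as they might appear in a distribution chart.
for value, count in frequency.most_common():
    print(f"{value}: {count}")
```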

For columns containing float data, the screen provides information about the variance, standard deviation, and the value range frequency, which is displayed in a distribution chart. Additionally, a box plot is shown, illustrating the maximum value, Q3 (upper quartile), median, Q1 (lower quartile), and the minimum value.
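The float-column statistics can be approximated with Python's statistics module. This is a sketch under assumptions: the sample values are made up, sample variance is used, quartiles use the "inclusive" method, and the value range frequency is modeled as four equal-width bins. The documentation does not say which conventions Data Profiler follows.

```python
import statistics

# Hypothetical float column, where None marks an empty cell.
values = [150.0, 158.0, 162.0, 170.0, 175.0, 180.0, 191.0, None]
non_null = sorted(v for v in values if v is not None)

# Spread statistics (sample variance and sample standard deviation).
variance = statistics.variance(non_null)
std_dev = statistics.stdev(non_null)

# Box plot values: minimum, Q1, median, Q3, maximum.
q1, median, q3 = statistics.quantiles(non_null, n=4, method="inclusive")
box_plot = {
    "min": non_null[0],
    "q1": q1,
    "median": median,
    "q3": q3,
    "max": non_null[-1],
}


def value_range_frequency(sorted_values, bins=4):
    """Count values per equal-width range, returned as (start, end, count)."""
    low, high = sorted_values[0], sorted_values[-1]
    width = (high - low) / bins or 1.0  # avoid zero width for constant columns
    counts = [0] * bins
    for v in sorted_values:
        index = min(int((v - low) / width), bins - 1)  # max value goes in last bin
        counts[index] += 1
    return [(low + i * width, low + (i + 1) * width, counts[i]) for i in range(bins)]


histogram = value_range_frequency(non_null)
print(variance, std_dev, box_plot, histogram, sep="\n")
```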

For columns containing datetime data, the screen displays the variance, standard deviation, and value range frequency on a distribution chart. A box plot is also provided, showing the maximum value, Q3 (upper quartile), median, Q1 (lower quartile), and the minimum value.

Resources

If you run into technical issues, create a support ticket:

  1. Go to the Help header (same section where Projects and Tools are) inside the platform

  2. Select "Contact Support"

  3. Fill in the Subject and Message to submit a support ticket.

Full Documentation
DNAnexus Sales: sales@dnanexus.com
Synthea dataset downloads: https://synthea.mitre.org/downloads
Dataset citation (DOI): https://doi.org/10.1016/j.ibmed.2020.100007