Utilizing Data Profiler Navigator

A license is required to access the Data Profiler on the DNAnexus Platform. For more information, please contact DNAnexus Sales (via [email protected]).

A Note on Data:

The data used in this section of the Academy documentation is available for download at https://synthea.mitre.org/downloads

The citation for this synthetic dataset is:

Walonoski J, Klaus S, Granger E, Hall D, Gregorowicz A, Neyarapally G, Watson A, Eastman J. Synthea™ Novel coronavirus (COVID-19) model and synthetic data set. Intelligence-Based Medicine. 2020 Nov;1:100007. https://doi.org/10.1016/j.ibmed.2020.100007

How to Navigate in Data Profiler

Data Profiler lets the user explore a dataset at three levels:

Dataset level: Shows the relationships between tables in the dataset and an overview of all tables and columns in the dataset.
Table level: Shows statistics for one particular table. A table can also be joined with another table to create a joint profile.
Column level: Shows statistics for one particular column of a table. A column can also be combined with other columns in the same table to create a joint profile.

To move between these three levels, the user selects an item in the navigator on the left side of the application. Once an item is selected, the content of the main interface changes accordingly.

The user interface of Data Profiler consists of a navigator (left, highlighted in blue), which controls the content of the main section (right, highlighted in green).

Navigator

The Navigator controls the content of the main section of Data Profiler. Its main component is a hierarchical view of the dataset, called the Data Hierarchy.

All Tables

The top level of a Data Hierarchy is All Tables, indicating the dataset level. This level is selected by default.

Under All Tables are individual tables in the dataset. Each table has a number on the far right indicating the number of columns in the table.

Data Hierarchy

Once a table is selected, the Data Hierarchy will show all columns from that table. Each column has a colored tag indicating the column type.
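The structure described above can be pictured with a small sketch. It is only an illustration, not how Data Profiler is implemented: the `hierarchy` dictionary, the table and column names, and the `render_hierarchy` helper are all hypothetical.

```python
# Hypothetical in-memory model of the Data Hierarchy.
# Table names, column names, and types are illustrative only.
hierarchy = {
    "patients": {"id": "string", "birthdate": "datetime", "healthcare_expenses": "float"},
    "encounters": {"id": "string", "patient": "string", "start": "datetime"},
}


def render_hierarchy(hierarchy, selected_table=None):
    """Return the navigator lines: All Tables, then each table with its column count.

    Columns (with their type tag) are listed only under the selected table.
    """
    lines = ["All Tables"]
    for table, columns in hierarchy.items():
        lines.append(f"  {table} ({len(columns)})")
        if table == selected_table:
            for column, column_type in columns.items():
                lines.append(f"    {column} [{column_type}]")
    return lines


for line in render_hierarchy(hierarchy, selected_table="patients"):
    print(line)
```

With no table selected, only the All Tables entry and the per-table column counts are listed, which mirrors the default dataset-level view.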

Searching for Columns

Above the Data Hierarchy, the user can search for one or more columns. The Data Hierarchy will show tables that have at least one of the column names in the search list (OR logic).
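The OR logic of the search can be sketched as a simple filter. This is a minimal illustration under assumptions: the `search_tables` function and the sample tables are hypothetical, and exact, case-insensitive name matching is assumed because the documentation does not say how names are compared.

```python
def search_tables(tables, search_terms):
    """Return tables that contain at least one of the searched column names (OR logic).

    Assumption: names are compared exactly and case-insensitively.
    """
    wanted = {term.lower() for term in search_terms}
    return [
        table
        for table, columns in tables.items()
        if wanted & {column.lower() for column in columns}
    ]


# Illustrative tables and columns, not taken from the application.
tables = {
    "patients": ["id", "birthdate", "gender"],
    "encounters": ["id", "patient", "start"],
    "conditions": ["patient", "code", "description"],
}

print(search_tables(tables, ["birthdate", "start"]))
```

A table is kept as soon as one searched name matches, so adding more names to the search list can only widen the result.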

Explorer Mode

At the bottom of the Navigator, the user can switch to an Explorer Mode to create charts on their own. The functionality of this mode is discussed in another section of this document.

The 📜 button opens the Inference Logs Screen, which shows details of the profiling process. This feature is in development.

Column Types

The type of a column in Data Profiler can be specified in a data_dictionary. If that information is not available, Data Profiler will infer the column type based on the content of the column.

In Data Profiler, there are 4 column types, plus an unknown type for empty columns. These types are consistent with the data types used in the data ingestion step via Data Model Loader on the DNAnexus Platform:

| Column type | Description | Example |
| --- | --- | --- |
| string | A string column has free-text values. This is the default fallback type when Data Profiler fails to cast a column type. | Patient’s name; Patient’s ID |
| integer | An integer column has integer values. | Number of children |
| float | A float column has float values. | Weight; Height |
| datetime | A datetime column has date and time values. The default time zone is UTC. | Date of birth |
| unknown | The column is empty. | |

Null (or empty) values are allowed in all column types and they do not affect how a column type is determined.
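The rules above can be approximated with a short sketch. It is not Data Profiler's actual inference code: the `infer_column_type` function, the order in which casts are tried, and the use of ISO 8601 parsing for datetimes are all assumptions made for illustration.

```python
from datetime import datetime


def _is_null(value):
    # Treat None and blank strings as null (assumption for this sketch).
    return value is None or str(value).strip() == ""


def _all_cast(values, caster):
    # True if every value can be converted by `caster` without error.
    try:
        for value in values:
            caster(value)
    except (ValueError, TypeError):
        return False
    return True


def infer_column_type(values):
    """Infer a column type from its values, ignoring nulls.

    Assumed order: integer, float, datetime, then fall back to string.
    An entirely empty column is reported as unknown.
    """
    non_null = [str(v).strip() for v in values if not _is_null(v)]
    if not non_null:
        return "unknown"
    if _all_cast(non_null, int):
        return "integer"
    if _all_cast(non_null, float):
        return "float"
    if _all_cast(non_null, datetime.fromisoformat):
        return "datetime"
    return "string"


print(infer_column_type(["1", "2", None, ""]))
print(infer_column_type(["1990-05-17", None]))
print(infer_column_type(["Alice", "3"]))
```

Nulls are dropped before any cast is attempted, which is why they do not influence the result unless the column contains nothing else.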

FAQs about Columns

In my data_dictionary, the type of column A is “integer”. After loading the data with Data Profiler, the application says column A is a “string” column. What happened?
At least one non-null value in column A cannot be cast to an integer, so Data Profiler falls back to “string”.
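To track down which values cause the fallback, a check like the following can be run on the source data before loading. This is a hedged sketch rather than a Data Profiler feature: the `check_declared_integer` helper is hypothetical, and it assumes blank strings count as null.

```python
def check_declared_integer(values):
    """Return the resolved type and the non-null values that fail an integer cast.

    Assumption: None and blank strings are treated as null and skipped.
    """
    offending = []
    for value in values:
        if value is None or str(value).strip() == "":
            continue
        try:
            int(str(value).strip())
        except ValueError:
            offending.append(value)
    resolved = "string" if offending else "integer"
    return resolved, offending


# Illustrative column: a single stray "N/A" forces the whole column to string.
print(check_declared_integer(["1", "2", "N/A", None]))
```

Cleaning or nulling the offending values in the source data should let the column be cast to the declared type.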

Resources

Full Documentation

To create a support ticket if there are technical issues:

Go to the Help header (same section where Projects and Tools are) inside the platform
Select “Contact Support”
Fill in the Subject and Message to submit a support ticket.
