Utilizing Data Profiler Navigator
A license is required to access the Data Profiler on the DNAnexus Platform. For more information, please contact DNAnexus Sales.
The data used in this section of Academy documentation can be found here to download:
The citation for this synthetic dataset is:
Walonoski J, Klaus S, Granger E, Hall D, Gregorowicz A, Neyarapally G, Watson A, Eastman J. Synthea™ Novel coronavirus (COVID-19) model and synthetic data set. Intelligence-Based Medicine. 2020 Nov;1:100007.
Data Profiler helps the user explore different levels of a dataset. There are 3 levels of a dataset in Data Profiler:
Dataset level: Shows the relationships between tables in the dataset and an overview of all tables and columns in the dataset.
Table level: Shows statistics for one particular table. A table can also be joined with another table to create a joint profile.
Column level: Shows statistics for one particular column of a table. A column can also be combined with other columns in the same table to create a joint profile.
To navigate between these 3 levels, the user can select from a navigator on the left side of the application. Once an option of the navigator is selected, the content of the main interface will change accordingly.
The user interface of Data Profiler consists of a navigator (left, highlighted in blue), which controls the content of the main section (right, highlighted in green).
The Navigator controls the content of the main section of Data Profiler. Its main component is a hierarchical view of the dataset, called the Data Hierarchy.
The top level of a Data Hierarchy is All Tables, indicating the dataset level. This level is selected by default.
Under All Tables are individual tables in the dataset. Each table has a number on the far right indicating the number of columns in the table.
Once a table is selected, the Data Hierarchy will show all columns from that table. Each column has a colored tag indicating the column type.
Above the Data Hierarchy, the user can search for one or more columns. The Data Hierarchy will show tables that have at least one of the column names in the search list (OR logic).
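The OR logic of the column search can be illustrated with a minimal sketch. This is not Data Profiler's implementation: the function name `tables_matching_any`, the sample table and column names, and the choice of case-insensitive exact matching are all assumptions made for illustration.

```python
def tables_matching_any(tables, search_terms):
    """Return the names of tables that contain at least one searched column.

    `tables` maps a table name to its list of column names.
    Matching is case-insensitive and exact (an assumption, not documented behavior).
    """
    wanted = {term.strip().lower() for term in search_terms if term.strip()}
    if not wanted:
        # An empty search leaves the hierarchy unfiltered.
        return list(tables)
    return [
        name
        for name, columns in tables.items()
        if wanted & {col.lower() for col in columns}  # OR logic: any overlap keeps the table
    ]


# Hypothetical dataset layout, used only for this example.
tables = {
    "patients": ["patient_id", "birthdate", "gender"],
    "conditions": ["patient_id", "code", "start"],
    "providers": ["provider_id", "name"],
}

print(tables_matching_any(tables, ["birthdate", "code"]))
# ['patients', 'conditions']
```

A table is kept as soon as one searched column is present, so adding more column names to the search can only widen the result, never narrow it.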
At the bottom of the Navigator, the user can switch to an Explorer Mode to create charts on their own. The functionality of this mode is discussed in another section of this document.
The 📜 button opens the Inference Logs screen, which shows details of the profiling process. This feature is in development.
The type of a column in Data Profiler can be specified in a data_dictionary. If that information is not available, Data Profiler will infer the column type based on the content of the column.
| Column type | Description | Example |
| --- | --- | --- |
| string | A string column has free-text values. This is the default fallback type when Data Profiler fails to cast a column to another type. | Patient’s name; Patient’s ID |
| integer | An integer column has integer values. | Number of children |
| float | A float column has float values. | Weight; Height |
| datetime | A datetime column has date or date-time values. The default time zone is UTC. | Date of birth |
| unknown | The column is empty. | |
Null (or empty) values are allowed in all column types and they do not affect how a column type is determined.
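The inference rules described above (nulls are ignored, an empty column is unknown, and string is the fallback) can be sketched as follows. This is an illustrative approximation, not Data Profiler's actual algorithm: the function name `infer_column_type`, the order in which types are tried, and the use of ISO 8601 parsing for datetime values are assumptions.

```python
from datetime import datetime


def _all_cast(texts, cast):
    """Return True if every value can be converted with `cast`."""
    try:
        for text in texts:
            cast(text)
    except (ValueError, TypeError):
        return False
    return True


def infer_column_type(values):
    """Guess a column type from raw values (hypothetical helper)."""
    # Null and empty values are skipped, so they never influence the result.
    texts = [str(v).strip() for v in values if v is not None and str(v).strip() != ""]
    if not texts:
        return "unknown"  # the column is empty
    if _all_cast(texts, int):
        return "integer"
    if _all_cast(texts, float):
        return "float"
    if _all_cast(texts, datetime.fromisoformat):  # assumes ISO 8601 strings
        return "datetime"
    return "string"  # fallback when no other type fits every value


print(infer_column_type(["1", "2", None, ""]))    # integer
print(infer_column_type(["70.5", "82"]))          # float
print(infer_column_type(["1990-05-17", None]))    # datetime
print(infer_column_type(["Alice", "42"]))         # string
print(infer_column_type([None, ""]))              # unknown
```

The types are tried from most to least restrictive, so a single value that fits no stricter type is enough to push the whole column to string.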
In my data_dictionary, the type of column A is “integer”. After loading with Data Profiler, the application says column A is a “string” column. What happened?
Column A contains at least one non-null value that cannot be cast to an integer. Data Profiler therefore falls back to “string” for that column.
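To find the values responsible for such a fallback before loading, a check along the following lines can be run on the raw column. This is a minimal sketch under stated assumptions: `resolve_declared_type` and the `CASTS` table are hypothetical helpers, not part of Data Profiler, and datetime values are assumed to be ISO 8601 strings.

```python
from datetime import datetime

# Hypothetical mapping from a declared type to a Python conversion function.
CASTS = {
    "integer": int,
    "float": float,
    "datetime": datetime.fromisoformat,
    "string": str,
}


def resolve_declared_type(declared, values):
    """Return (final_type, offending_values) for a column with a declared type."""
    cast = CASTS[declared]
    offending = []
    for value in values:
        if value is None or str(value).strip() == "":
            continue  # null or empty values never cause a fallback
        try:
            cast(str(value).strip())
        except ValueError:
            offending.append(value)
    # One value that cannot be cast is enough to fall back to "string".
    return (declared if not offending else "string"), offending


column_a = ["12", "7", None, "N/A", "3"]
print(resolve_declared_type("integer", column_a))
# ('string', ['N/A'])
```

In this example the placeholder text `N/A` is the value that prevents the column from being treated as an integer; replacing such placeholders with true nulls would let the declared type stand.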
To create a support ticket if there are technical issues:
Go to the Help header (same section where Projects and Tools are) inside the platform
Select “Contact Support”
Fill in the Subject and Message to submit a support ticket.
In Data Profiler, there are 4 column types: string, integer, float, and datetime (an empty column is reported as unknown). These types are consistent with the data types used by Data Model Loader on the DNAnexus Platform.