Column Level Screen

A license is required to access the Data Profiler on the DNAnexus Platform. For more information, contact DNAnexus Sales by email.

A Note on Data:

The data used in this section of the Academy documentation can be downloaded from: https://synthea.mitre.org/downloads

The citation for this synthetic dataset is:

Walonoski J, Klaus S, Granger E, Hall D, Gregorowicz A, Neyarapally G, Watson A, Eastman J. Synthea™ Novel coronavirus (COVID-19) model and synthetic data set. Intelligence-Based Medicine. 2020 Nov;1:100007. https://doi.org/10.1016/j.ibmed.2020.100007

String Column

Column-level screen shows a string column

For columns containing string data, the Data Profiler displays the following statistics and charts.

The statistics include:

The missing rate, expressed as the percentage of values in the column that are missing.
The number of unique values present in the column.
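The two string-column statistics can be illustrated with a minimal Python sketch. This is not the Data Profiler's implementation; it assumes that only `None` counts as missing and that missing entries are not counted as a unique value. The function names and the sample column are hypothetical.

```python
from typing import Optional, Sequence


def missing_rate(values: Sequence[Optional[str]]) -> float:
    """Percentage of entries that are missing (assumption: only None counts as missing)."""
    if not values:
        return 0.0
    missing = sum(1 for v in values if v is None)
    return 100.0 * missing / len(values)


def unique_count(values: Sequence[Optional[str]]) -> int:
    """Number of distinct non-missing values (assumption: missing is not a value)."""
    return len({v for v in values if v is not None})


# Hypothetical sample of a GENDER-like string column with two missing entries.
genders = ["F", "M", None, "F", "M", "F", None, "M"]
print(missing_rate(genders))  # 25.0
print(unique_count(genders))  # 2
```

Whether empty strings should also count as missing is a design choice; this sketch treats them as ordinary values.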

The charts provided include:

Top Records Bar Chart: This chart displays the top values that occur most frequently in the column. You can select how many top records to display using a dropdown list. By hovering over the bars, you can see the exact count of records for each value.
Character Length Distribution Chart: This chart shows how string lengths are distributed. Hover over the chart to view each range of character lengths and how often it occurs. The chart also reports the average string length and its standard deviation, which measures how much the string lengths vary.
Boxplot: The boxplot summarizes the distribution of the data, showing the maximum value, Q3 (upper quartile), median, Q1 (lower quartile), and the minimum value.
Grouping Frequency Chart: This chart displays how often unique values in the current column occur when grouped with values from another column. You can choose the column to group by using a dropdown list.
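The data behind the Top Records Bar Chart and the average and standard deviation reported with the Character Length Distribution Chart can be sketched in Python as follows. This is an illustration only: it assumes missing values (`None`) are excluded and that the population standard deviation is used; the Data Profiler may make different choices. The function names and sample values are hypothetical.

```python
from collections import Counter
from statistics import fmean, pstdev
from typing import List, Optional, Sequence, Tuple


def top_records(values: Sequence[Optional[str]], k: int) -> List[Tuple[str, int]]:
    """The k most frequent non-missing values with their counts (bar chart data)."""
    return Counter(v for v in values if v is not None).most_common(k)


def length_stats(values: Sequence[Optional[str]]) -> Tuple[float, float]:
    """Mean and population standard deviation of the non-missing string lengths."""
    lengths = [len(v) for v in values if v is not None]
    return fmean(lengths), pstdev(lengths)


# Hypothetical CITY-like column; lengths of the present values are 6, 5, 6, 7.
cities = ["Boston", None, "Salem", "Boston", "Needham"]
print(top_records(cities, 1))  # [('Boston', 2)]
avg, sd = length_stats(cities)
print(avg)           # 6.0
print(round(sd, 3))  # 0.707
```

Changing `k` plays the role of the dropdown that selects how many top records to display.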

Float & Integer

Column-level screen shows a float column

For columns containing float or integer data, the Data Profiler provides the following statistics and charts.

The statistics include:

The missing rate, displayed as a percentage of missing values.
The standard deviation, which measures the spread of the data values.
The interquartile range (IQR), which is the difference between the 75th percentile (Q3) and the 25th percentile (Q1) of the data.
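As a rough illustration, the numeric statistics can be computed with Python's standard library. This sketch is not how the Data Profiler computes them: it assumes `None` and NaN are missing, uses the population standard deviation, and uses the "inclusive" (linear interpolation) quartile method, any of which the Data Profiler may do differently. The same quartiles give the five-number summary drawn by the boxplot. The function name is hypothetical.

```python
import math
from statistics import pstdev, quantiles
from typing import Dict, Optional, Sequence


def numeric_summary(values: Sequence[Optional[float]]) -> Dict[str, float]:
    """Missing rate, spread statistics, and five-number summary of a numeric column."""
    # Assumption: None and NaN are both treated as missing.
    present = sorted(v for v in values if v is not None and not math.isnan(v))
    # Needs at least two present values; "inclusive" interpolates between data points.
    q1, median, q3 = quantiles(present, n=4, method="inclusive")
    return {
        "missing_rate": 100.0 * (len(values) - len(present)) / len(values),
        "std": pstdev(present),  # population standard deviation
        "iqr": q3 - q1,          # 75th percentile minus 25th percentile
        "min": present[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": present[-1],
    }


summary = numeric_summary([10, 20, None, 30, 40])
print(summary["missing_rate"], summary["q1"], summary["median"], summary["q3"], summary["iqr"])
# 20.0 17.5 25.0 32.5 15.0
```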

The charts provided include:

Distribution Chart: This chart displays the distribution of values in the column. You can hover over the chart to view the range of values and their frequencies.
Boxplot: The boxplot visually represents the distribution of the data, showing the maximum value, Q3 (upper quartile), median, Q1 (lower quartile), and the minimum value.
Grouping Frequency Chart (Two Way Plot): This chart shows the frequency of unique values in the current column, grouped with values from another column. You can select the column for grouping from a dropdown list.
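The counts behind a grouping frequency chart amount to a two-way frequency table: each row contributes one (current value, other value) pair. Below is a minimal Python sketch, assuming rows with a missing value in either column are skipped; the function name and sample columns are hypothetical and the Data Profiler's handling may differ.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def grouping_frequency(
    current: Sequence[Optional[Hashable]], other: Sequence[Optional[Hashable]]
) -> Counter:
    """Count each (current value, other value) pair, row by row."""
    # Assumption: rows where either column is missing are skipped.
    return Counter(
        (a, b) for a, b in zip(current, other) if a is not None and b is not None
    )


# Hypothetical integer column grouped by a GENDER-like column.
visits = [1, 2, 1, 1, 2, None]
gender = ["F", "M", "M", "F", "F", "M"]
freq = grouping_frequency(visits, gender)
print(freq[(1, "F")], freq[(1, "M")], freq[(2, "M")], freq[(2, "F")])  # 2 1 1 1
```

Each distinct pair corresponds to one bar segment, so selecting a different grouping column in the dropdown simply changes the `other` sequence.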

Datetime

Column-level screen shows a datetime column

For columns containing datetime data, the Data Profiler provides the following statistics and charts.

The statistics include:

The missing rate, displayed as a percentage of missing values.
The standard deviation, measuring the dispersion of the datetime values.
The mode, showing the most frequent value (or format) of the datetime data in the column.
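A standard deviation of datetime values needs a numeric scale. A minimal Python sketch, under stated assumptions, measures each value in seconds from the first one (the standard deviation does not depend on that reference point) and reports the result in days. It also takes the mode as the most frequent value, ignoring the "format" sense of mode. The population standard deviation, the day unit, and the function name are all assumptions, not the Data Profiler's documented behavior.

```python
from collections import Counter
from datetime import datetime
from statistics import pstdev
from typing import Dict, Optional, Sequence


def datetime_summary(values: Sequence[Optional[datetime]]) -> Dict[str, object]:
    """Missing rate, standard deviation in days, and most frequent value."""
    present = [v for v in values if v is not None]  # assumes at least one present value
    # Standard deviation is shift-invariant, so offsets from any fixed point work.
    offsets = [(v - present[0]).total_seconds() for v in present]
    mode_value, _ = Counter(present).most_common(1)[0]
    return {
        "missing_rate": 100.0 * (len(values) - len(present)) / len(values),
        "std_days": pstdev(offsets) / 86400,  # 86400 seconds per day
        "mode": mode_value,
    }


dates = [datetime(2020, 1, 1), datetime(2020, 1, 1), None,
         datetime(2020, 1, 3), datetime(2020, 1, 5)]
s = datetime_summary(dates)
print(s["missing_rate"])         # 20.0
print(s["mode"])                 # 2020-01-01 00:00:00
print(round(s["std_days"], 3))   # 1.658
```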

The charts provided include:

Distribution Chart: This chart shows the distribution of datetime values in the column. You can hover over the chart to view the range of values and their frequencies.
Boxplot: The boxplot visually represents the distribution of the datetime data, displaying the maximum value, Q3 (upper quartile), median, Q1 (lower quartile), and the minimum value.
Radar Chart: This chart displays the frequency of values, grouped by year, month, or day. You can change the grouping option using the dropdown at the top.
Grouping Frequency Chart (Two Way Plot): This chart shows the frequency of unique datetime values in the current column, grouped with values from another column. You can select the column for grouping from a dropdown list.
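The radar chart's grouping can be sketched by counting values per calendar unit. This Python example assumes "day" means day of the month (the Data Profiler might instead use day of the week) and skips missing values; the function name and sample dates are hypothetical.

```python
from collections import Counter
from datetime import date
from typing import Optional, Sequence

UNITS = ("year", "month", "day")


def radar_counts(values: Sequence[Optional[date]], unit: str) -> Counter:
    """Frequency of values grouped by the chosen calendar unit."""
    if unit not in UNITS:
        raise ValueError(f"unit must be one of {UNITS}, got {unit!r}")
    # Assumption: "day" is the day of the month; missing values are skipped.
    return Counter(getattr(v, unit) for v in values if v is not None)


visits = [date(2020, 3, 15), date(2021, 3, 2), None, date(2020, 7, 15)]
print(radar_counts(visits, "year"))   # Counter({2020: 2, 2021: 1})
print(radar_counts(visits, "month"))  # Counter({3: 2, 7: 1})
print(radar_counts(visits, "day"))    # Counter({15: 2, 2: 1})
```

Because `datetime` is a subclass of `date`, the same function also works on full datetime values. The `unit` argument stands in for the grouping dropdown.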

Pairwise plot between columns

Although each column type has its own layout on the Column-level Screen, the pairwise plot between columns is a component common to all of them.

You can create a plot between the current column and any other column in the same table, but not every column is available for this feature. The Data Profiler lists only columns that meet one of the following conditions:

Not a string column, or
A string column that meets both of these conditions:
- Not a primary key
- Has no more than 30 unique values
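The column-eligibility rule for the pairwise plot can be expressed as a small filter. This Python sketch encodes the conditions listed above; the `ColumnInfo` type, its `dtype` labels, and the unique counts in the example are hypothetical, chosen only for illustration (the column names resemble Synthea's patients table, but the counts are made up).

```python
from dataclasses import dataclass
from typing import List, Sequence

MAX_STRING_UNIQUES = 30  # threshold stated in the rule above


@dataclass
class ColumnInfo:
    name: str
    dtype: str  # hypothetical labels: "string", "float", "integer", "datetime"
    is_primary_key: bool = False
    unique_count: int = 0


def pairwise_candidates(current: str, columns: Sequence[ColumnInfo]) -> List[str]:
    """Names of columns that may be paired with `current` in a pairwise plot."""
    eligible = []
    for col in columns:
        if col.name == current:
            continue  # the current column is not paired with itself
        if col.dtype != "string":
            eligible.append(col.name)
        elif not col.is_primary_key and col.unique_count <= MAX_STRING_UNIQUES:
            eligible.append(col.name)
    return eligible


patients = [
    ColumnInfo("Id", "string", is_primary_key=True, unique_count=1000),
    ColumnInfo("GENDER", "string", unique_count=2),
    ColumnInfo("CITY", "string", unique_count=250),
    ColumnInfo("BIRTHDATE", "datetime"),
    ColumnInfo("HEALTHCARE_EXPENSES", "float"),
]
print(pairwise_candidates("HEALTHCARE_EXPENSES", patients))
# ['GENDER', 'BIRTHDATE']
```

A plausible reason for the cap on string columns is that a grouped chart with many distinct categories becomes unreadable, but the source does not state the rationale.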

Resources

Full Documentation

To create a support ticket for technical issues:

Inside the platform, go to the Help menu in the header (the same area as Projects and Tools).
Select "Contact Support"
Fill in the Subject and Message to submit a support ticket.
