# Column Level Screen

A license is required to access the Data Profiler on the DNAnexus Platform. For more information, contact DNAnexus Sales at <sales@dnanexus.com>.

**A Note on Data:**

The data used in this section of the Academy documentation can be downloaded from <https://synthea.mitre.org/downloads>.

The citation for this synthetic dataset is:

Walonoski J, Klaus S, Granger E, Hall D, Gregorowicz A, Neyarapally G, Watson A, Eastman J. Synthea™ Novel coronavirus (COVID-19) model and synthetic data set. Intelligence-Based Medicine. 2020 Nov;1:100007. <https://doi.org/10.1016/j.ibmed.2020.100007>

## **String Column**

*Column-level screen for a string column*

For columns containing string data, the Data Profiler displays several statistics and charts to help you analyze the data.

The **statistics** include:

* The **missing rate**, expressed as the percentage of values in the column that are missing.
* The **number of unique values** present in the column.

![](/files/G8ciYILB6fdt6Vf8uNlY)
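
For readers who want to reproduce these two statistics outside the Data Profiler, here is a minimal Python sketch over a plain list. It assumes that `None` marks a missing value and that missing entries are not counted as a unique value; how the Data Profiler itself treats empty strings or other placeholders is not documented here. The function names and the sample column are hypothetical.

```python
from typing import Optional, Sequence


def missing_rate(values: Sequence[Optional[str]]) -> float:
    """Percentage of entries that are missing (assumption: None means missing)."""
    if not values:
        return 0.0
    missing = sum(1 for v in values if v is None)
    return 100.0 * missing / len(values)


def unique_count(values: Sequence[Optional[str]]) -> int:
    """Number of distinct non-missing values (assumption: None is not counted)."""
    return len({v for v in values if v is not None})


# Hypothetical string column with two missing entries
gender = ["M", "F", None, "F", "M", None, "F", "F"]
print(f"{missing_rate(gender):.1f}%")  # 25.0%
print(unique_count(gender))  # 2
```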

The **charts** provided include:

* **Top Records Bar Chart**: This chart displays the top values that occur most frequently in the column. You can select how many top records to display using a dropdown list. By hovering over the bars, you can see the exact count of records for each value.
* **Character Length Distribution Chart**: This chart shows how string lengths are distributed. Hover over different parts of the chart to see each range of character lengths and how often it occurs. The chart also reports the **average** string length and its **standard deviation**, which measures how much the string lengths vary.
* **Boxplot**: The boxplot provides a visual summary of the data's distribution, showing the **maximum value**, **Q3 (upper quartile)**, **median**, **Q1 (lower quartile)**, and the **minimum value**.
* **Grouping Frequency Chart**: This chart displays how often unique values in the current column occur when grouped with values from another column. You can choose the column to group by using a dropdown list.
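
The sketch below approximates three of these charts with the Python standard library: the top-N value counts, the average and standard deviation of string lengths, and the pair counts behind a grouping frequency chart. It is an illustration under assumptions, not the Data Profiler's implementation: it uses the population standard deviation (`statistics.pstdev`), breaks ties in the top-N list by first occurrence, and works on hypothetical toy columns of equal length.

```python
import statistics
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def top_records(values: Sequence[str], n: int) -> List[Tuple[str, int]]:
    """The n most frequent values with their counts (Top Records Bar Chart)."""
    return Counter(values).most_common(n)


def length_stats(values: Sequence[str]) -> Tuple[float, float]:
    """Average and population standard deviation of string lengths."""
    lengths = [len(v) for v in values]
    return statistics.fmean(lengths), statistics.pstdev(lengths)


def grouping_frequency(
    values: Sequence[str], other: Sequence[str]
) -> Dict[Tuple[str, str], int]:
    """Count each (current value, other-column value) pair in the same row."""
    return dict(Counter(zip(values, other)))


# Hypothetical toy columns from the same table
city = ["Boston", "Salem", "Boston", "Lowell", "Boston", "Salem"]
gender = ["F", "M", "M", "F", "F", "F"]

print(top_records(city, 2))  # [('Boston', 3), ('Salem', 2)]
mean_len, sd_len = length_stats(city)
print(round(mean_len, 2), round(sd_len, 2))  # 5.67 0.47
print(grouping_frequency(city, gender)[("Boston", "F")])  # 2
```

Counting `(value, other_value)` pairs is one straightforward way to build a two-way frequency table; a chart can then group the bars by either column.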

## **Float & Integer**

*Column-level screen for a float column*

For columns containing float or integer data, the Data Profiler provides several statistics and charts to help you analyze the data.

The **statistics** include:

* The **missing rate**, displayed as a percentage of missing values.
* The **standard deviation**, which measures the spread of the data values.
* The **interquartile range** (IQR), which is the difference between the 75th percentile (Q3) and the 25th percentile (Q1) of the data.

![](/files/eAEat7GvpQHQL0VPACIJ)
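
A minimal sketch of how these three statistics could be computed for a numeric column, using only the Python standard library. The conventions are assumptions, not documented Data Profiler behavior: `None` or NaN marks a missing value, the standard deviation is the sample (n - 1) version, and quartiles come from `statistics.quantiles` with its default `"exclusive"` method. Other quartile methods can give a slightly different IQR for the same data. The function name and sample values are hypothetical.

```python
import math
import statistics
from typing import Dict, Optional, Sequence


def numeric_column_stats(values: Sequence[Optional[float]]) -> Dict[str, float]:
    """Missing rate (%), sample standard deviation, and IQR of a numeric column."""
    # Assumption: None or NaN marks a missing value.
    present = [v for v in values if v is not None and not math.isnan(v)]
    missing_pct = 100.0 * (len(values) - len(present)) / len(values)
    # Assumption: quartiles use the default "exclusive" method of statistics.quantiles.
    q1, _median, q3 = statistics.quantiles(present, n=4)
    return {
        "missing_rate": missing_pct,
        "std_dev": statistics.stdev(present),  # sample (n - 1) standard deviation
        "iqr": q3 - q1,
    }


# Hypothetical BMI column with one None and one NaN
bmi = [18.5, 22.0, None, 24.5, 27.0, float("nan"), 30.0, 21.0]
stats = numeric_column_stats(bmi)
print(stats["missing_rate"], stats["iqr"])  # 25.0 7.375
```

The same function works for integer columns, since `math.isnan` and the `statistics` functions accept integers.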

The **charts** provided include:

* **Distribution Chart**: This chart displays the distribution of values in the column. You can hover over the chart to view the range of values and their frequencies.
* **Boxplot**: The boxplot visually represents the distribution of the data, showing the **maximum value**, **Q3 (upper quartile)**, **median**, **Q1 (lower quartile)**, and the **minimum value**.
* **Grouping Frequency Chart (Two-Way Plot)**: This chart shows the frequency of unique values in the current column, grouped with values from another column. You can select the column for grouping from a dropdown list.
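
As a rough sketch of what the distribution chart and boxplot summarize, the code below bins numeric values into equal-width intervals and computes the five values a boxplot shows. The bin count, the equal-width binning, and the quartile method (`statistics.quantiles` with its default `"exclusive"` method) are assumptions; the Data Profiler may bin and interpolate differently, so its Q1 and Q3 can differ slightly for the same data. The function names and sample ages are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def histogram(values: Sequence[float], bins: int) -> List[Tuple[float, float, int]]:
    """Split [min, max] into equal-width bins and count the values in each."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to width 1 if all values are equal
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value lands in the last bin instead of overflowing.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]


def five_number_summary(values: Sequence[float]) -> Dict[str, float]:
    """Minimum, Q1, median, Q3, and maximum: the values a boxplot draws."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "min": min(values),
        "q1": q1,
        "median": statistics.median(values),
        "q3": q3,
        "max": max(values),
    }


# Hypothetical patient ages
ages = [2, 15, 23, 31, 38, 44, 52, 67, 71, 88]
for left, right, count in histogram(ages, bins=4):
    print(f"{left:.1f} to {right:.1f}: {count}")
print(five_number_summary(ages))
# {'min': 2, 'q1': 21.0, 'median': 41.0, 'q3': 68.0, 'max': 88}
```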

## **Datetime**

*Column-level screen for a datetime column*

For columns containing datetime data, the Data Profiler provides several statistics and charts for in-depth analysis.

The **statistics** include:

* The **missing rate**, displayed as a percentage of missing values.
* The **standard deviation**, measuring the dispersion of the datetime values.
* The **mode**, showing the mode/format of the datetime data in the column.

![](/files/cHXLs46wF356UGF4K8yU)
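
A standard deviation of datetime values is naturally expressed as a duration. The sketch below shows one way to compute it: convert each value to seconds relative to the earliest value, take the sample standard deviation, and return a `timedelta`. It assumes `None` marks a missing value and that all datetimes are naive (no time zone); whether the Data Profiler uses the sample or population formula, or which time unit it reports, is not documented here. The function name and dates are hypothetical.

```python
import statistics
from datetime import datetime, timedelta
from typing import Optional, Sequence


def datetime_std(values: Sequence[Optional[datetime]]) -> timedelta:
    """Sample standard deviation of datetime values, returned as a timedelta."""
    # Assumption: None marks a missing value; all datetimes are naive (no tz).
    present = [v for v in values if v is not None]
    origin = min(present)
    # Express each value as seconds since the earliest one; the spread does not
    # depend on which origin is chosen.
    seconds = [(v - origin).total_seconds() for v in present]
    return timedelta(seconds=statistics.stdev(seconds))


# Hypothetical encounter dates, two days apart, with one missing entry
dates = [datetime(2020, 1, 1), datetime(2020, 1, 3), None, datetime(2020, 1, 5)]
print(datetime_std(dates))  # 2 days, 0:00:00
```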

The **charts** provided include:

* **Distribution Chart**: This chart shows the distribution of datetime values in the column. You can hover over the chart to view the range of values and their frequencies.
* **Boxplot**: The boxplot visually represents the distribution of the datetime data, displaying the **maximum value**, **Q3 (upper quartile)**, **median**, **Q1 (lower quartile)**, and the **minimum value**.
* **Radar Chart**: This chart displays the frequency of values, grouped by **year**, **month**, or **day**. You can change the grouping option using the dropdown at the top.
* **Grouping Frequency Chart (Two-Way Plot)**: This chart shows the frequency of unique datetime values in the current column, grouped with values from another column. You can select the column for grouping from a dropdown list.
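
The radar chart's grouping can be approximated by counting values per calendar component. In the minimal sketch below, "day" is assumed to mean the day of the month (the `datetime.day` attribute); the Data Profiler may instead group by day of the week, which this page does not specify. The function name and visit dates are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Sequence


def radar_counts(values: Sequence[datetime], by: str) -> Dict[int, int]:
    """Count values per year, month (1-12), or day of the month (1-31)."""
    # Assumption: "day" means day of the month, not day of the week.
    if by not in ("year", "month", "day"):
        raise ValueError(f"unsupported grouping: {by!r}")
    counts = Counter(getattr(v, by) for v in values)
    return dict(sorted(counts.items()))  # sort so the radar axes appear in order


# Hypothetical visit dates
visits = [
    datetime(2020, 3, 14),
    datetime(2020, 3, 2),
    datetime(2021, 7, 14),
    datetime(2021, 3, 30),
]
print(radar_counts(visits, "month"))  # {3: 3, 7: 1}
print(radar_counts(visits, "year"))  # {2020: 2, 2021: 2}
```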

## **Pairwise plot between columns**

Although each column type has its own layout on the Column-level Screen, the **pairwise plot between columns** is a component shared by all of them.

You can plot the current column against any other column in the same table, but not every column is eligible. The Data Profiler lists only columns that meet one of the following conditions:

* The column is not a string column.
* The column is a string column that is not a primary key and has no more than 30 unique values.
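
The eligibility rules above translate directly into a small predicate. The sketch below applies them to hypothetical column metadata; the `ColumnInfo` class, its field names, the type labels, and the unique counts are all illustrative assumptions rather than part of any Data Profiler API. Only the 30-unique-value threshold and the primary-key rule come from this page.

```python
from dataclasses import dataclass
from typing import List

# Threshold taken from the documented rule above.
MAX_STRING_UNIQUE_VALUES = 30


@dataclass
class ColumnInfo:
    """Hypothetical column metadata; field names are illustrative only."""

    name: str
    dtype: str  # e.g. "string", "float", "integer", "datetime"
    is_primary_key: bool
    unique_count: int


def is_pairwise_eligible(col: ColumnInfo) -> bool:
    """Apply the documented eligibility rules to a single column."""
    if col.dtype != "string":
        return True  # non-string columns are always eligible
    return not col.is_primary_key and col.unique_count <= MAX_STRING_UNIQUE_VALUES


def pairwise_candidates(current: ColumnInfo, table: List[ColumnInfo]) -> List[str]:
    """Names of the other columns in the table that can be paired with `current`."""
    return [c.name for c in table if c.name != current.name and is_pairwise_eligible(c)]


# Hypothetical table metadata (unique counts are made up for illustration)
table = [
    ColumnInfo("Id", "string", True, 1000),
    ColumnInfo("GENDER", "string", False, 2),
    ColumnInfo("CITY", "string", False, 250),
    ColumnInfo("HEALTHCARE_EXPENSES", "float", False, 980),
    ColumnInfo("BIRTHDATE", "datetime", False, 900),
]
print(pairwise_candidates(table[3], table))  # ['GENDER', 'BIRTHDATE']
```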

## Resources

[Full Documentation](https://documentation.dnanexus.com/)

To open a support ticket for technical issues:

1. On the Platform, open the **Help** menu in the header (the same bar that contains **Projects** and **Tools**).
2. Select **Contact Support**.
3. Fill in the **Subject** and **Message** fields, then submit the ticket.
