# Dataset Level Screen

*A license is required to access the Data Profiler on the DNAnexus Platform. For more information, please contact DNAnexus Sales (via* [*sales@dnanexus.com*](mailto:sales@dnanexus.com)*).*

### A Note on Data

The data used in this section of the Academy documentation can be downloaded here: <https://synthea.mitre.org/downloads>

The citation for this synthetic dataset is:

Walonoski J, Klaus S, Granger E, Hall D, Gregorowicz A, Neyarapally G, Watson A, Eastman J. Synthea™ Novel coronavirus (COVID-19) model and synthetic data set. Intelligence-Based Medicine. 2020 Nov;1:100007. <https://doi.org/10.1016/j.ibmed.2020.100007>

### Dataset Level Screen

The Dataset Level screen is the default screen when you open Data Profiler. It has two pages: Table Relationships and Summary. This section describes each component of the screen and its key values.

<figure><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfwzBWsh7AXJZDTQsR2CII1P2DlkXHfke-Pn78OwBLU09BtmlgulXzaY_fXUvTCr9amgqVFr57TEJhXlHR-SeefAa61B4NDYrR0Ff2_czgYIg-t-qlhpRaOQdsqtSPEmbJcv24Wkj5pwp3OfvEkhAQ?key=a0OQZIDTsGvq8nPWvgEF9iVc" alt=""><figcaption></figcaption></figure>

The default screen of Data Profiler: the Table Relationships page at the Dataset level

### Manage Tables

The Manage Tables controller lets you hide or show tables in the data profile. Tables that are hidden from the entity relationship diagram (ERD) are also hidden from the Data Hierarchy. To manage which tables are displayed, click the ‘Manage’ button in the bottom right corner of the screen, use the toggles to hide or show tables, and then click ‘Apply’ to apply the changes.

<figure><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXcYf0Tq-GPQs81XfaUnBzJEkZ6_Gy-BdU93kbmYJqIcIo1cOqycnVB6eGmjdiCkVd6DzawiPb7AcPjP5zLx1odqF4y0UxqFHHH5IxnArh3DLT9Wtphn5TMwq9b6_7p3oFOtX9iPElBag4hBOcQIwkY?key=a0OQZIDTsGvq8nPWvgEF9iVc" alt=""><figcaption></figcaption></figure>

Open the ‘Manage Tables’ controller to show/hide the table(s)

<figure><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXeRvCpu1M4YYIN6S3VLb7_k4wjtU5qzXvh2466X4HwJzxUvZqY2RyfOjy5AqQkoyLdBYpx2y-v0UMPEFYEECiAoH-ahORFzPeBByFpn99hbFH0LlHbBGTTHhKr4bAiTf25jXOVf4hi3t2haa9mWso4?key=a0OQZIDTsGvq8nPWvgEF9iVc" alt=""><figcaption></figcaption></figure>

The data profile is updated after the ‘patients’ table is hidden

### Table Relationships

<figure><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXenrK4QWywXthZ0spa0C8Q3Yyn_2XA1Zzdc-xGckxIqivqw10NGjOzi6C8suBPlEt3mONzG4lSGFem6zDlFy887RqqpC9RqlZJL93ctnSm3KOwa4yxaP4SImzr0tMlzNE5v6nDB3BbJxy16pCyX09Y?key=a0OQZIDTsGvq8nPWvgEF9iVc" alt=""><figcaption></figcaption></figure>

A Relationship Diagram (left) with some selected edges highlighted in blue. The selected edges create a Diagram of Overlaps (right)

The Table Relationships page shows a simplified Entity Relationship Diagram (ERD) displayed as a graph. Each node represents a table in your dataset, and each edge represents a column that links two tables. The linked columns are those specified in the referenced\_entity\_field of the data\_dictionary. Each edge points from a foreign-key column to the primary-key column it references.

| FAQs |
| ---- |
| <p>Question: Some tables are supposed to be linked to each other. Why do they appear unlinked in Data Profiler?</p><p>Answer: The linkage between any two tables is determined by the data\_dictionary. Data Profiler does not add or remove linkages in a dataset. Check your data\_dictionary and make sure that the linkage is specified correctly.</p> |
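The following is a minimal Python sketch of how edges could be derived from a data dictionary. It is not Data Profiler's own code. It assumes that each data dictionary row has `entity`, `name` and `referenced_entity_field` keys, that `referenced_entity_field` is written as `entity:field`, and that it is left empty for columns that reference nothing. The `Edge` class, the `build_edges` function and the example rows are hypothetical. The sketch shows why an empty or missing reference leaves two tables unlinked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """A directed link from a foreign-key column to the column it references."""
    from_table: str
    from_column: str
    to_table: str
    to_column: str


def build_edges(data_dictionary: list[dict[str, str]]) -> list[Edge]:
    # Assumption: referenced_entity_field is "entity:field", or empty when
    # the column does not reference another table.
    edges = []
    for row in data_dictionary:
        ref = row.get("referenced_entity_field", "").strip()
        if not ref:
            continue  # no reference, so no edge is drawn for this column
        to_table, to_column = ref.split(":", 1)
        edges.append(Edge(row["entity"], row["name"], to_table, to_column))
    return edges


# Hypothetical data dictionary rows, for illustration only.
rows = [
    {"entity": "patients", "name": "patient_id", "referenced_entity_field": ""},
    {"entity": "measurements", "name": "measurement_id", "referenced_entity_field": ""},
    {"entity": "measurements", "name": "patient_id",
     "referenced_entity_field": "patients:patient_id"},
]

for edge in build_edges(rows):
    print(f"{edge.from_table}.{edge.from_column} -> {edge.to_table}.{edge.to_column}")
# measurements.patient_id -> patients.patient_id
```

If the `referenced_entity_field` entry in the third row were empty, `build_edges` would return no edges and the two tables would appear unlinked.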

Click one or more edges to view a Diagram of Overlaps. This diagram shows how many values the linked columns share across the tables. A Diagram of Overlaps can use one of the following chart types:

#### Venn Diagram

The Venn diagram is the default chart type for the Diagram of Overlaps. Each set represents a table in the selection. Each number is the count of values from the selected column that fall in that region.

<figure><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXdsi7_Iorjbvz3uG85TcwxVdoo9XS3djOuu-Qyv_QYWuPEMNWEI8HiO-riEmKKi0Q2kyJzlw5JQJitSlHUfNkiURnWNkr7-PUov87fvYBEt0PQE1pAvqm7dUQVylsXOFvMRgvtOScKb-6RchxKHo1M?key=a0OQZIDTsGvq8nPWvgEF9iVc" alt=""><figcaption></figcaption></figure>

Question: How should I interpret a Venn diagram of two tables, patients and measurements, linked on the patient\_id column, where the value of their intersection is 90?

Answer: The intersection counts the patient\_id values that appear in both tables. This means that 90 patients have measurement data.
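The following short Python sketch shows how the regions of a two-table Venn diagram could be counted from the linked column. It assumes that each region counts distinct values, so a patient\_id that is repeated in the measurements table is counted once. The `venn_counts` function and the example IDs are hypothetical. This is not necessarily how Data Profiler computes its numbers.

```python
def venn_counts(left: list[str], right: list[str]) -> dict[str, int]:
    """Count distinct values in each region of a two-set Venn diagram."""
    a, b = set(left), set(right)
    return {
        "left_only": len(a - b),   # e.g. patients with no measurements
        "both": len(a & b),        # the intersection shown in the diagram
        "right_only": len(b - a),  # e.g. measurement IDs with no patient row
    }


# Hypothetical patient_id values; the measurements table repeats p2.
patients = ["p1", "p2", "p3", "p4"]
measurements = ["p2", "p2", "p3", "p5"]
print(venn_counts(patients, measurements))
# {'left_only': 2, 'both': 2, 'right_only': 1}
```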

#### Euler Diagram

Euler diagrams follow the same concept as Venn diagrams. The only difference is that the size of each overlap section is proportional to its overlap value.
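The following Python sketch shows what "proportional" means for two sets. It uses a common layout technique that is not necessarily the one Data Profiler uses. Each circle's area is set equal to its set size, including the overlap. The distance between the two centers is then found by bisection so that the area where the circles overlap equals the shared count. The function names and the counts 100, 120 and 90 are hypothetical.

```python
import math


def lens_area(r1: float, r2: float, d: float) -> float:
    """Area shared by two circles with radii r1, r2 whose centers are d apart."""
    if d >= r1 + r2:
        return 0.0  # the circles do not overlap
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2  # one circle lies inside the other
    # Clamp to [-1, 1] to guard against floating-point rounding.
    c1 = max(-1.0, min(1.0, (d * d + r1 * r1 - r2 * r2) / (2 * d * r1)))
    c2 = max(-1.0, min(1.0, (d * d + r2 * r2 - r1 * r1) / (2 * d * r2)))
    kite = 0.5 * math.sqrt(
        max(0.0, (-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2))
    )
    return r1 * r1 * math.acos(c1) + r2 * r2 * math.acos(c2) - kite


def euler_layout(n_left: int, n_right: int, n_both: int) -> tuple[float, float, float]:
    """Return (left radius, right radius, center distance) so that each circle's
    area equals its set size and the overlap area equals n_both."""
    r1 = math.sqrt(n_left / math.pi)
    r2 = math.sqrt(n_right / math.pi)
    lo, hi = abs(r1 - r2), r1 + r2
    # The overlap shrinks as the centers move apart, so bisect on the distance.
    for _ in range(100):
        mid = (lo + hi) / 2
        if lens_area(r1, r2, mid) > n_both:
            lo = mid
        else:
            hi = mid
    return r1, r2, (lo + hi) / 2


# Hypothetical counts: 100 patients, 120 measurement patient_ids, 90 shared.
r1, r2, d = euler_layout(100, 120, 90)
print(round(lens_area(r1, r2, d), 6))  # overlap area matches the shared count
```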

<figure><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXcZKmYymWfPDMibLCgLWhWtIEIek9fJUQtkMJ-v4uUw674a6IfyPMRAGGM5NJhjgGeUJq6JxyOcBJaACQYLNCNwAXXm2tBjTPLKToF03t3bqyDApudEhdn_hwr0W_nyVxztaXDL2roQBcTJOYjxpw?key=a0OQZIDTsGvq8nPWvgEF9iVc" alt=""><figcaption></figcaption></figure>

#### Upset Plot

An Upset plot counts the values in every non-empty combination of the selected tables. It scales better than a Venn or Euler diagram when many tables are selected.

A common use case for an Upset plot is answering questions such as “How many patients have complete information across tables?” To answer this question, create an Upset plot between the “patients” table and other tables (e.g., diagnosis, measurement, sequence\_run) and look at the number of patient IDs that are shared across all of the tables.
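The following Python sketch shows how the bars of an Upset plot could be counted. It assumes exclusive intersections, so each value is counted once, under the exact combination of tables it appears in. Grouping values by this membership signature produces only non-empty combinations. The `upset_counts` function and the example table contents are hypothetical. This is not Data Profiler's implementation.

```python
from collections import Counter


def upset_counts(tables: dict[str, set[str]]) -> dict[tuple[str, ...], int]:
    """Count values by the exact combination of tables they appear in."""
    names = sorted(tables)
    all_values = set().union(*tables.values())
    # Each value belongs to exactly one combination (its membership signature),
    # so only non-empty combinations ever appear in the result.
    signatures = Counter(
        tuple(name for name in names if value in tables[name]) for value in all_values
    )
    return dict(signatures)


# Hypothetical patient_id values per table.
tables = {
    "patients": {"p1", "p2", "p3", "p4"},
    "diagnosis": {"p1", "p2", "p3"},
    "measurement": {"p1", "p2"},
    "sequence_run": {"p1", "p4"},
}
counts = upset_counts(tables)
# Patients with complete information: present in all four tables.
print(counts.get(tuple(sorted(tables)), 0))  # 1
```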

<figure><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXef4MV4tx9TsmI4bb6fWz7K2jsa5e7fDgk657_7ZTXIawVWOWLUv-DM43HlmsWXctyF-YZ3Qjw2J7_Ugn7HJ8ex6y378szX-U8auZRu2mFR3-sqsS_qt2A80cuLeutBTZw3LdombCQwnxNsgFCSxv8?key=a0OQZIDTsGvq8nPWvgEF9iVc" alt=""><figcaption></figcaption></figure>

### Summary Page

The Summary page summarizes both the tables and the columns in the dataset. Each section is described below.

<figure><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfP1s4iAfmR9BBekeWGgjAa8gKZvlpGtHOjL-Wg1WiRyyWE3k2GyD4Bh7FCjoJ1rw8ivN0fvkD5kuFNo5xNG2MGkZFBVBUWeURE8T7f20UUBVo2SosOGnJLylFElIl7lKenvgKMJl0SGRMmLWlkL5Y?key=a0OQZIDTsGvq8nPWvgEF9iVc" alt=""><figcaption></figcaption></figure>

The summary of all Tables and Columns in the Dataset

#### Table Summary

The Table Summary shows information about all tables in the dataset. Each row displays statistics for one table, including:

* \# Columns, # Rows: the number of columns and the number of rows in the table
* Column types: the data types of the columns in the table
* Duplication Rate: the rate at which entire rows are duplicated in the table
* Missing Rate: the proportion of empty cells in the table
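The following Python sketch computes these table statistics under assumed definitions, because the exact formulas Data Profiler uses are not stated here. A row counts as a duplicate when an identical row appeared earlier in the table. A cell counts as missing when it is `None` or an empty string. The `table_stats` function and the example rows are hypothetical.

```python
def table_stats(rows: list[tuple]) -> dict[str, float]:
    """Row/column counts, whole-row duplication rate and cell missing rate."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    # Assumption: a row is a duplicate if an identical row was seen earlier.
    seen = set()
    duplicates = 0
    for row in rows:
        if row in seen:
            duplicates += 1
        else:
            seen.add(row)
    # Assumption: a cell is missing if it is None or an empty string.
    missing = sum(1 for row in rows for cell in row if cell is None or cell == "")
    total_cells = n_rows * n_cols
    return {
        "n_rows": n_rows,
        "n_columns": n_cols,
        "duplication_rate": duplicates / n_rows if n_rows else 0.0,
        "missing_rate": missing / total_cells if total_cells else 0.0,
    }


# Hypothetical rows: (patient_id, date, value); the third row repeats the first.
rows = [
    ("p1", "2020-01-01", "72"),
    ("p2", "", "80"),
    ("p1", "2020-01-01", "72"),
    ("p3", None, "65"),
]
print(table_stats(rows))
```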

You can click on the hamburger button at the header of each column to sort or filter the data as needed.

<figure><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXcj40Wqmsvl0xQ5VolKYGdog_HpNvKH86Lb7vzXUSlwbMCo8HHqNIzOC0YpKHspji4HjI5cKhbNo0k7tul_VIXxTS6gGSL0Jziryv79ke3UX3ImwgJbYCGXYvh0jvqYk1aD5ZfBv7avANEMcjb5UQ?key=a0OQZIDTsGvq8nPWvgEF9iVc" alt=""><figcaption></figcaption></figure>

Clicking on the hamburger button to sort or filter the data

#### Column Summary

The Column Summary provides details about every column in the dataset. Each row presents the following information for one column:

* Column name: the name of the column
* Key type: whether the column is a key (for example, a primary or foreign key) used to define relationships between tables
* Description: the title of the column, if provided in the data dictionary file
* Provided type: the data type of the column as specified in the data dictionary file; ‘unknown’ if no data dictionary is provided
* Inferred types: the data type of the column as inferred by Data Profiler when no data dictionary is provided; if a data dictionary is provided, this matches the Provided type
* Missing Rate: the proportion of empty cells in the column
* Duplication Rate: the rate of duplicated values in the column
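The following Python sketch computes per-column statistics, again under assumed definitions. A cell is missing when it is `None` or an empty string. The duplication rate is the share of non-missing values that repeat an earlier value. The inferred type is the first of a few hypothetical labels (`integer`, `float`, `date`, falling back to `string`) whose parser accepts every non-missing value. Data Profiler's actual inference rules may differ, and all names are illustrative.

```python
from datetime import date
from typing import Optional


def infer_type(present: list[str]) -> str:
    """Return the first hypothetical type label whose parser accepts every value."""
    if not present:
        return "unknown"
    candidates = (("integer", int), ("float", float), ("date", date.fromisoformat))
    for label, parse in candidates:
        try:
            for value in present:
                parse(value)
            return label
        except ValueError:
            continue  # at least one value failed to parse; try the next type
    return "string"


def column_summary(values: list[Optional[str]]) -> dict[str, object]:
    # Assumption: a cell is missing if it is None or an empty string.
    present = [v for v in values if v is not None and v != ""]
    missing_rate = (len(values) - len(present)) / len(values) if values else 0.0
    # Assumption: duplicates are non-missing values that repeat an earlier value.
    duplicates = len(present) - len(set(present))
    return {
        "inferred_type": infer_type(present),
        "missing_rate": missing_rate,
        "duplication_rate": duplicates / len(present) if present else 0.0,
    }


print(column_summary(["p1", "p2", "p1", None, "p3", "p1"]))
print(column_summary(["72", "80", "", "65"]))
```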

You can also click on the hamburger button at the header of each column to sort or filter the data as needed.

### Resources

[Full Documentation](https://documentation.dnanexus.com/)

To create a support ticket for technical issues:

1. Open the Help menu in the platform header (the same header that contains Projects and Tools)
2. Select “Contact Support”
3. Fill in the Subject and Message fields, then submit the ticket
