Open Targets

Necessary Disclaimers and Legal

The user is responsible for reviewing and complying with the license requirements of the software, notebooks, and data referenced in this documentation.

Users are responsible for the costs associated with analyzing the Open Targets dataset and its storage in their project spaces.

Instance type availability and pricing are subject to the contract between the user or the user’s organization and DNAnexus.

Citations for the Open Targets

The latest publication about Open Targets is “Open Targets Platform: facilitating therapeutic hypotheses building in drug discovery” (2025), which describes recent updates to the Open Targets Platform. More information about Open Targets is available in previous publications.

Open Targets is a public-private initiative led by the European Bioinformatics Institute (EBI) that comprehensively aggregates public data sources for drug discovery. The official version of the dataset is hosted on the Open Targets Platform.

Overview of the Open Targets Dataset

Open Targets is an integrated data resource that enables the systematic identification and prioritization of therapeutic targets. It combines diverse publicly available datasets with resources generated by the Open Targets consortium to compute and score target–disease associations, helping drive more informed decisions in early drug discovery. By aggregating evidence across genetics, molecular QTLs, somatic variation, expression, pathways, chemical biology, pharmacology, and literature, it provides comprehensive annotation of targets, diseases and drugs within a unified framework.

Dataset information can be found in the official Open Targets documentation. The schema for each dataset can be viewed online in the Open Targets Data Download section.

On DNAnexus, we provide the complete Open Targets Platform release (version 25.09), comprising 38 datasets across seven major categories (target–disease associations, targets, ontology, genetics, diseases, drugs, and literature). For an overview of this release, users can refer to the official Release blog and Release note. See the “Where to Access Open Targets” section below to start accessing the dataset.

Where to Access Open Targets

The following files are available for the Open Targets datasets:

All 38 datasets were downloaded directly from the official Open Targets FTP repository (version 25.09) and are stored in Parquet format. Users can query and analyze them with big data analytics tools such as Spark. These files are found here on the platform.
A notebook showcasing how to query and integrate datasets from Open Targets can be found here on the platform. Notebook files end in .ipynb.

To use the dataset and notebooks, please copy them into your own project space. Details on how to copy the data are in the section titled "Copying Data and Notebooks into a Project".
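Because each dataset is stored as Parquet, loading one into Spark comes down to pointing the reader at that dataset's folder. The following is a minimal sketch, not the notebook's code: `DATA_ROOT` is a hypothetical placeholder for wherever the copied data ends up in your session, `"target"` is used only as an example dataset name, and the helper names are ours.

```python
from pathlib import PurePosixPath

# Hypothetical location of the copied data folder; adjust to your own project.
DATA_ROOT = "/mnt/project/Open-Targets/data"


def dataset_path(name, root=DATA_ROOT):
    """Return the folder that is assumed to hold one dataset's Parquet files."""
    if not name or "/" in name:
        raise ValueError(f"not a dataset name: {name!r}")
    return str(PurePosixPath(root) / name)


def load_dataset(spark, name, root=DATA_ROOT):
    """Read one dataset into a Spark DataFrame using an existing SparkSession."""
    return spark.read.parquet(dataset_path(name, root))


# Usage inside a Spark JupyterLab session (requires pyspark, so left commented):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   load_dataset(spark, "target").printSchema()
```

Keeping the path logic separate from the Spark call makes it easy to change the root once, for example if the data is first moved to a different file system for the cluster.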

Running analyses on Open Targets

Copying Data and Notebooks into a Project

To utilize the dataset, please copy the data from this project into your own project.

Here are the steps to copy the Open Targets data into a Project Space:

1. Create a project for your Open Targets dataset, billed to your own organization. Tutorials on how to set up a project can be found on this page.
2. Go to the Resources tab, find the project titled “Public Datasets AWS US (East)”, and select the folder "Open-Targets".
3. Select the data folder and the notebooks.
4. Select "Copy" in the top-right menu, then select the project that you created in Step 1.
5. Go to the project space you created in Step 1 to start exploring the Open Targets dataset and notebooks.
6. To run the JupyterLab notebooks, please see the JupyterLab section of the Academy Documentation, including the pages on a JupyterLab Notebook and Running a Spark JupyterLab Notebook.
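For users who prefer the command line, the copy can also be sketched with the `dx cp` command of the DNAnexus CLI, which copies objects and folders between projects. The snippet below only builds and prints the command. It assumes that the "Open-Targets" folder sits at the root of the source project and that the project name resolves in your environment (a project ID can be substituted). The destination project name is a placeholder.

```python
import shlex

# Names taken from the steps above; the root-level folder location is an assumption.
SOURCE_PROJECT = "Public Datasets AWS US (East)"
SOURCE_FOLDER = "/Open-Targets"


def build_copy_command(dest_project, dest_folder="/"):
    """Build the argument list for copying the Open-Targets folder with `dx cp`."""
    if not dest_project:
        raise ValueError("a destination project name or ID is required")
    source = f"{SOURCE_PROJECT}:{SOURCE_FOLDER}"
    destination = f"{dest_project}:{dest_folder}"
    return ["dx", "cp", source, destination]


# "my-open-targets-project" is a placeholder for the project created in Step 1.
command = build_copy_command("my-open-targets-project")
print(shlex.join(command))

# To actually run it (requires an installed and logged-in dx toolkit):
#   import subprocess
#   subprocess.run(command, check=True)
```

Printing the command first gives you a chance to review the source and destination before anything is copied and billed to your project.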

Example notebook

We prepared an example notebook that shows how to extract colocalizations for GWAS credible sets associated with autoimmune diseases. The notebook is named “autoimmune_colocalisations_spark.ipynb” and is optimized for JupyterLab with a Spark cluster.

Instance type: mem1_ssd1_v2_x16
Before running the notebook, please run the command-line instructions provided in the example notebook in the terminal.
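To give a feel for the kind of join such an analysis involves, here is a toy illustration in plain Python. It is not the notebook's code: the records are made up, the field names (`studyId`, `diseaseIds`, `studyLocusId`, `leftStudyLocusId`, `rightStudyLocusId`, `h4`) are assumptions that should be checked against the published schema, and the 0.8 threshold is purely illustrative. In the notebook, the equivalent steps would be Spark DataFrame operations.

```python
# Made-up records standing in for three datasets; all identifiers are hypothetical.
studies = [
    {"studyId": "GWAS_1", "diseaseIds": ["DISEASE_A"]},
    {"studyId": "GWAS_2", "diseaseIds": ["DISEASE_B"]},
]
credible_sets = [
    {"studyLocusId": "CS_1", "studyId": "GWAS_1"},
    {"studyLocusId": "CS_2", "studyId": "GWAS_2"},
]
colocalisations = [
    {"leftStudyLocusId": "CS_1", "rightStudyLocusId": "QTL_9", "h4": 0.93},
    {"leftStudyLocusId": "CS_1", "rightStudyLocusId": "QTL_4", "h4": 0.41},
    {"leftStudyLocusId": "CS_2", "rightStudyLocusId": "QTL_7", "h4": 0.88},
]


def colocs_for_diseases(disease_ids, min_h4=0.8):
    """Return colocalisation rows whose credible set belongs to a study of the given diseases."""
    wanted = set(disease_ids)
    # Studies annotated with at least one disease of interest.
    study_ids = {s["studyId"] for s in studies if wanted & set(s["diseaseIds"])}
    # Credible sets derived from those studies.
    locus_ids = {c["studyLocusId"] for c in credible_sets if c["studyId"] in study_ids}
    # Colocalisations involving those credible sets, above the score threshold.
    return [
        row for row in colocalisations
        if row["leftStudyLocusId"] in locus_ids and row["h4"] >= min_h4
    ]


for row in colocs_for_diseases(["DISEASE_A"]):
    print(row["leftStudyLocusId"], row["rightStudyLocusId"], row["h4"])
```

The same three-step shape (filter studies by disease, map to credible sets, then filter colocalisations) carries over to larger data; only the engine changes.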

Video: Utilizing the Open Targets Dataset on the DNAnexus Platform
