Open Targets

The user is responsible for reviewing and complying with the license requirements of the software, notebooks, and data referenced in this documentation.

Users are responsible for the costs associated with analyzing the Open Targets dataset and its storage in their project spaces.

Instance type availability and pricing are subject to the contract between the user or the user’s organization and DNAnexus.

Citations for Open Targets

The latest publication about Open Targets is Open Targets Platform: facilitating therapeutic hypotheses building in drug discovery (2025), which describes recent updates to the Open Targets Platform. Users can also find more information about Open Targets in previous publications:

Open Targets is a public-private initiative led by the European Bioinformatics Institute (EBI) that comprehensively aggregates public data sources for drug discovery. The official version of the dataset is hosted on the Open Targets Platform.

Overview of the Open Targets Dataset

Open Targets is an integrated data resource that enables the systematic identification and prioritization of therapeutic targets. It combines diverse publicly available datasets with resources generated by the Open Targets consortium to compute and score target–disease associations, helping drive more informed decisions in early drug discovery. By aggregating evidence across genetics, molecular QTLs, somatic variation, expression, pathways, chemical biology, pharmacology, and literature, it provides comprehensive annotation of targets, diseases and drugs within a unified framework.

Further information about each dataset can be found in the official Open Targets documentation. The schema for each dataset can be viewed online in the Open Targets Data Download section.

On DNAnexus, we provide the complete Open Targets Platform release (version 25.09), comprising 38 datasets across seven major categories (target–disease associations, targets, ontology, genetics, diseases, drugs, and literature). For an overview of this release, users can refer to the official release blog and release notes. See the “Where to Access Open Targets” section below to start accessing the dataset.

Where to Access Open Targets

The following files are available for the Open Targets datasets:

  • All 38 datasets were downloaded directly from the official Open Targets FTP repository (version 25.09) and are stored in Parquet format, so they can be queried and analyzed with big data analytics tools such as Spark. These files are found here on the platform.

  • A notebook showcasing how to query and integrate datasets from Open Targets can be found here on the platform. Notebook files have the .ipynb extension.
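As a rough illustration of querying the Parquet files with Spark, the PySpark sketch below ranks targets for one disease by their overall direct association score. It is not taken from the provided notebook. The dataset folder names (`association_overall_direct`, `target`), their column names (`diseaseId`, `targetId`, `score`, `id`, `approvedSymbol`) and the `OT_ROOT` location are assumptions to check against the 25.09 schemas. On a multi-node Spark cluster, a `file://` path must be readable by every worker, so adjust `OT_ROOT` to wherever the data actually lives in your setup.

```python
"""Sketch: query Open Targets Parquet datasets with PySpark (assumed schemas)."""
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # type hints only; pyspark is imported when a query runs
    from pyspark.sql import DataFrame, SparkSession

# Hypothetical location of the copied Parquet folders; adjust to your project.
OT_ROOT = "file:///mnt/project/Open-Targets/data"


def dataset_path(name: str, root: str = OT_ROOT) -> str:
    """Return the Parquet folder path for one Open Targets dataset."""
    return f"{root.rstrip('/')}/{name}"


def top_targets_for_disease(
    spark: SparkSession, disease_id: str, limit: int = 10, root: str = OT_ROOT
) -> DataFrame:
    """Rank targets for one disease by overall direct association score.

    Assumed schemas: association_overall_direct(diseaseId, targetId, score)
    and target(id, approvedSymbol).
    """
    from pyspark.sql import functions as F

    assoc = spark.read.parquet(dataset_path("association_overall_direct", root))
    # Rename the target table's `id` so it joins on the association's `targetId`.
    targets = spark.read.parquet(dataset_path("target", root)).select(
        F.col("id").alias("targetId"), "approvedSymbol"
    )
    return (
        assoc.filter(F.col("diseaseId") == disease_id)
        .join(targets, on="targetId", how="left")
        .select("targetId", "approvedSymbol", "score")
        .orderBy(F.col("score").desc())
        .limit(limit)
    )
```

In a Spark JupyterLab notebook this might be called as `top_targets_for_disease(SparkSession.builder.getOrCreate(), "EFO_0000685").show()` (the EFO ID, assumed here to be rheumatoid arthritis, is only an example). `spark.read.parquet(dataset_path("target")).printSchema()` is a quick way to compare a dataset's actual columns with the online schema before relying on the assumed names.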

To use the dataset and notebooks, please copy the data and notebooks into your own project space. Details on how to copy the data are present under the section titled "Copying Data and Notebooks into a Project".
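After copying, a short standard-library check can confirm that the dataset folders arrived, for example from a terminal in a JupyterLab session where the project is mounted. This sketch assumes each dataset is a folder of `.parquet` part files directly under `DATA_ROOT`, which is a hypothetical path to adjust to your project layout.

```python
"""Sketch: count the Parquet dataset folders in a copied Open Targets release."""
from __future__ import annotations

from pathlib import Path

# Hypothetical location of the copied data folder; adjust to your project.
DATA_ROOT = Path("/mnt/project/Open-Targets/data")


def list_parquet_datasets(root: Path) -> list[str]:
    """Return sorted names of sub-folders that contain at least one .parquet file."""
    return sorted(
        d.name
        for d in root.iterdir()
        if d.is_dir() and any(d.glob("*.parquet"))
    )


if __name__ == "__main__":
    names = list_parquet_datasets(DATA_ROOT)
    # A complete 25.09 copy is described as containing 38 datasets.
    print(f"{len(names)} datasets found")
    for name in names:
        print(name)
```

Folders that hold no `.parquet` files directly (for example, if a dataset were partitioned into nested sub-folders) would not be counted, so a lower-than-expected count is a prompt to inspect the layout rather than proof that data is missing.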

Running analyses on Open Targets

Copying Data and Notebooks into a Project

To use the dataset, copy the data from this project into your own project.

Here are the steps to copy the Open Targets data into a Project Space:

  1. Create a project for your Open Targets dataset, billed to your own organization. Tutorials on how to set up a project can be found on this page.

  2. Go to the Resources tab, find the project titled “Public Datasets AWS US (East)”, and select the folder “Open-Targets”.

  3. Select the data folder and the notebooks.

  4. Select "Copy" in the top-right menu, then select the project you created in Step 1.

  5. Then, go to the project space you created in Step 1 to start exploring the Open Targets dataset and notebooks.

  6. To run the JupyterLab notebooks, see the JupyterLab section of the Academy Documentation, including the pages on running a JupyterLab notebook and running a Spark JupyterLab notebook.
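The steps above use the web interface. For scripted setups, a roughly equivalent copy can likely be done with the dx-toolkit `dx cp -r` command. The sketch below only builds that command so it can be inspected before running. It assumes dx-toolkit is installed, `dx login` has been run, the folder sits at `/Open-Targets` in the source project, and the destination project is in the same cloud region as the source (AWS US East). `SOURCE_PROJECT` and `DEST_PROJECT` are hypothetical placeholders, not real project IDs.

```python
"""Sketch: build (and optionally run) a recursive `dx cp` between two projects."""
from __future__ import annotations

import shlex
import subprocess

# Hypothetical placeholders: substitute the real source and destination project IDs.
SOURCE_PROJECT = "project-SOURCE_ID"
DEST_PROJECT = "project-DEST_ID"


def build_copy_command(
    source_project: str, folder: str, dest_project: str, dest_folder: str = "/"
) -> list[str]:
    """Return argv that recursively copies `folder` into `dest_folder`."""
    return [
        "dx", "cp", "-r",
        f"{source_project}:{folder}",
        f"{dest_project}:{dest_folder}",
    ]


def copy_folder(cmd: list[str]) -> None:
    """Run the command; requires dx-toolkit on PATH and an active `dx login`."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_copy_command(SOURCE_PROJECT, "/Open-Targets", DEST_PROJECT)
    # Print the command for review; call copy_folder(cmd) to actually execute it.
    print(shlex.join(cmd))
```

Copying the whole `/Open-Targets` folder brings over both the data folder and the notebooks in one step; pass a narrower `folder` to copy only part of it.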

Example notebook

We prepared an example notebook that extracts colocalizations for GWAS credible sets associated with autoimmune diseases. The notebook is named “autoimmune_colocalisations_spark.ipynb” and is optimized for JupyterLab with Spark Cluster.

  • Instance type: mem1_ssd1_v2_x16

  • Before running the notebook, run the command-line instructions provided in the example notebook in a terminal.
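The page does not reproduce the notebook's code. As a hedged illustration of the kind of query it performs, this PySpark sketch walks from an autoimmune disease term to colocalization results: disease term and ontology descendants, then GWAS studies mapped to those diseases, then their credible sets, then COLOC pairs whose left locus is one of those credible sets. It is not the notebook's implementation. The dataset names (`disease`, `study`, `credible_set`, `colocalisation_coloc`), every column name, the `AUTOIMMUNE_ID` EFO term, the `"gwas"` study-type value, the 0.8 H4 threshold and the `OT_ROOT` path are all assumptions to verify against the 25.09 schemas.

```python
"""Sketch: colocalizations for credible sets from autoimmune GWAS (assumed schemas)."""
from __future__ import annotations

from typing import TYPE_CHECKING, Iterable, Optional

if TYPE_CHECKING:  # type hints only; pyspark is imported when the query runs
    from pyspark.sql import DataFrame, SparkSession

OT_ROOT = "file:///mnt/project/Open-Targets/data"  # hypothetical location
AUTOIMMUNE_ID = "EFO_0005140"  # assumed EFO ID for "autoimmune disease"; verify it


def disease_scope(root_id: str, descendants: Optional[Iterable[str]]) -> list[str]:
    """Return the root disease ID plus its descendants, de-duplicated and sorted."""
    return sorted({root_id, *(descendants or [])})


def autoimmune_colocalisations(
    spark: SparkSession, root: str = OT_ROOT, min_h4: float = 0.8
) -> DataFrame:
    """Join COLOC results to credible sets from autoimmune GWAS studies."""
    from pyspark.sql import functions as F

    def read(name: str) -> DataFrame:
        return spark.read.parquet(f"{root.rstrip('/')}/{name}")

    # 1. The autoimmune term plus every ontology descendant (assumed `descendants` array).
    row = (
        read("disease")
        .filter(F.col("id") == AUTOIMMUNE_ID)
        .select("descendants")
        .first()
    )
    scope = disease_scope(AUTOIMMUNE_ID, row["descendants"] if row else None)

    # 2. GWAS studies mapped to any disease in scope (assumed `studyType`, `diseaseIds`).
    studies = (
        read("study")
        .filter(F.col("studyType") == "gwas")
        .select("studyId", F.explode("diseaseIds").alias("diseaseId"))
        .filter(F.col("diseaseId").isin(scope))
        .select("studyId")
        .distinct()
    )

    # 3. Credible sets (study loci) from those studies, renamed to match the COLOC key.
    loci = (
        read("credible_set")
        .join(studies, on="studyId")
        .select(F.col("studyLocusId").alias("leftStudyLocusId"), "studyId")
    )

    # 4. COLOC pairs whose left locus is in scope and whose H4 passes the threshold.
    return (
        read("colocalisation_coloc")
        .filter(F.col("h4") >= min_h4)
        .join(loci, on="leftStudyLocusId")
    )
```

Expanding the disease term to its descendants before filtering studies means that GWAS mapped to a specific autoimmune condition are kept even when they are not annotated with the broad term itself.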
