AlphaFold2
Necessary Disclaimers and Legal
Users are responsible for reviewing and complying with the license requirements of the software, notebooks, and data referenced in this documentation.
Users are responsible for compute and storage costs incurred within their DNAnexus project spaces.
Instance type availability and pricing are subject to the agreement between the user (or their organization) and DNAnexus.
Citations and Acknowledgments
This documentation references data and tools from the following resources:
For AlphaFold2 predictions, please cite the original AlphaFold publication and the nf-core/proteinfold pipeline (see CITATIONS.md).
Overview of the post-folding analysis notebook
AlphaFold2, as implemented in the nf-core/proteinfold workflow, generates two primary outputs:
a predicted 3D structure (PDB file)
an associated per-residue confidence metric (pLDDT)
The PDB file contains the atomic coordinates of the predicted model. Confidence information is encoded in the B-factor field as the predicted Local Distance Difference Test (pLDDT) score. The score ranges from 0 to 100 and reflects residue-level confidence in the predicted local structure. Higher pLDDT values indicate greater confidence in local structural accuracy, while lower values often correspond to flexible or disordered regions. For more background on AlphaFold2 outputs and confidence metrics, refer to:
the DeepMind article Enabling high-accuracy protein structure prediction at the proteome scale
the training material from EMBL-EBI
the nf-core/proteinfold documentation
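The following is a minimal sketch of how pLDDT values can be read back out of the PDB file. It is not the notebook's own code, and the function name `read_plddt` is hypothetical. The function parses the fixed-width B-factor field (columns 61-66) of ATOM records. It assumes a standard single-model PDB file in which AlphaFold2 writes the same pLDDT value on every atom of a residue, so only the C-alpha atom is read. Insertion codes are ignored.

```python
from typing import Dict, Tuple


def read_plddt(pdb_text: str) -> Dict[Tuple[str, int], float]:
    """Map (chain_id, residue_number) to pLDDT, read from the B-factor column."""
    scores: Dict[Tuple[str, int], float] = {}
    for line in pdb_text.splitlines():
        # AlphaFold2 writes the same pLDDT on every atom of a residue,
        # so reading the C-alpha ("CA") atom of each residue is enough.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]            # column 22: chain identifier
            resnum = int(line[22:26])   # columns 23-26: residue number
            plddt = float(line[60:66])  # columns 61-66: B-factor, i.e. pLDDT
            scores[(chain, resnum)] = plddt
    return scores


# Hypothetical usage ("model.pdb" is a placeholder for your own model file):
# with open("model.pdb") as fh:
#     scores = read_plddt(fh.read())
#     low_confidence = [res for res, s in scores.items() if s < 50]
```

A fixed-column parser keeps the sketch dependency-free. In practice, a structure library such as Biopython's `PDBParser` reads the same field and handles edge cases more robustly.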
The notebook, alphafold2_plddt_p2rank_analysis-2026-04-08.ipynb, is available on the Platform in the following regions:
AWS US (East)
AWS Europe (Frankfurt)
AWS Europe (London)
Azure Amsterdam
Azure US (West)
Workflow description:
This notebook evaluates the structural reliability of an AlphaFold2-predicted model and identifies confidence-supported binding regions. The workflow consists of the following steps:
Extract residue-level pLDDT scores from the PDB file
Identify low-confidence regions (pLDDT < 50)
Predict potential binding pockets using P2Rank
Retain pockets enriched in high-confidence residues (e.g., pLDDT ≥ 70)
The final output provides a confidence-aware overview of predicted binding sites within the structure.
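The workflow steps can be sketched in plain Python as below. This is an illustrative reimplementation under stated assumptions, not the notebook's code, and all function names are hypothetical.

The sketch assumes the following:
pLDDT values are already available as a dictionary keyed by (chain, residue number).
Pocket residues come from the `residue_ids` column of P2Rank's `*_predictions.csv` output, written as space-separated `chain_number` tokens such as `A_12 A_13`.
"Enriched in high-confidence residues" means that at least half of a pocket's residues have pLDDT ≥ 70. The notebook may apply a different criterion.

```python
from typing import Dict, List, Tuple

Residue = Tuple[str, int]  # (chain_id, residue_number)

LOW_CONFIDENCE = 50.0   # pLDDT below this marks a low-confidence residue
HIGH_CONFIDENCE = 70.0  # example threshold for a high-confidence residue


def low_confidence_segments(plddt: Dict[Residue, float],
                            cutoff: float = LOW_CONFIDENCE) -> List[Tuple[str, int, int]]:
    """Merge consecutive residues with pLDDT < cutoff into (chain, start, end) runs."""
    segments: List[Tuple[str, int, int]] = []
    for chain, resnum in sorted(plddt):
        if plddt[(chain, resnum)] >= cutoff:
            continue
        if segments and segments[-1][0] == chain and segments[-1][2] == resnum - 1:
            segments[-1] = (chain, segments[-1][1], resnum)  # extend current run
        else:
            segments.append((chain, resnum, resnum))         # start a new run
    return segments


def parse_residue_ids(field: str) -> List[Residue]:
    """Parse space-separated 'A_12 A_13' tokens (assumed P2Rank residue_ids format)."""
    residues: List[Residue] = []
    for token in field.split():
        chain, number = token.rsplit("_", 1)
        residues.append((chain, int(number)))
    return residues


def filter_pockets(pockets: Dict[str, List[Residue]],
                   plddt: Dict[Residue, float],
                   cutoff: float = HIGH_CONFIDENCE,
                   min_fraction: float = 0.5) -> Dict[str, float]:
    """Keep pockets whose share of residues with pLDDT >= cutoff is at least min_fraction."""
    kept: Dict[str, float] = {}
    for name, residues in pockets.items():
        scores = [plddt[r] for r in residues if r in plddt]
        if not scores:
            continue  # no pocket residue could be matched to a pLDDT value
        fraction = sum(s >= cutoff for s in scores) / len(scores)
        if fraction >= min_fraction:
            kept[name] = fraction
    return kept


if __name__ == "__main__":
    # Toy values for illustration only, not real AlphaFold2 or P2Rank output.
    plddt = {("A", 1): 92.0, ("A", 2): 41.0, ("A", 3): 35.5, ("A", 4): 81.0}
    pockets = {"pocket1": parse_residue_ids("A_1 A_4"),
               "pocket2": parse_residue_ids("A_2 A_3 A_4")}
    print(low_confidence_segments(plddt))  # [('A', 2, 3)]
    print(filter_pockets(pockets, plddt))  # {'pocket1': 1.0}
```

The filter uses the fraction of high-confidence residues rather than the mean pLDDT. A single very flexible loop therefore cannot pull a mostly well-predicted pocket below the threshold. Both `cutoff` and `min_fraction` are exposed so the criterion can be matched to whatever the notebook actually uses.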
Running notebooks on the DNAnexus platform
Copying notebooks and snapshot into a Project
To use the notebooks, copy them into your own project space:
Create a project for your analysis, billed to your own organization. Tutorials on how to set up a project can be found on this page.
Go to the Resources tab, find the project titled “Public Datasets AWS US (East)”, and open the following folders:
“Post_folding_analysis” (post-folding analysis notebook)
“Notebook_snapshot”
Select the notebooks and files you want to copy from these folders. For environment setup, use the snapshot snapshot-molecular_modeling-jupyterlab-2026-04-08.tar.gz.
Select “Copy” from the top-right menu, and choose the project you created in Step 1.
Go to the project you created in Step 1 to start exploring the notebooks.
To run the JupyterLab Notebooks, please see the JupyterLab section of the Academy Documentation.
Instance Type Selection
Instance wait times are subject to queue availability. Less common instance types may result in longer wait times due to their limited availability.
Instances started with snapshots may take longer to initialize due to environment setup.
Instance type availability and pricing are subject to the contract between the user or the user’s organization and DNAnexus.
The notebooks are optimized for JupyterLab with Python, R, Stata, ML, Image Processing (version 2.11). If you do not have access, contact the Success Team at [email protected] or the Sales Team at [email protected].
Recommended instance type for this demo: mem1_ssd1_v2_x16.
A note on notebooks
Use the snapshot when starting the job (e.g., snapshot-molecular_modeling-jupyterlab-2026-04-08.tar.gz). Snapshots are located in the “Notebook_snapshot” folder of the “Public Datasets AWS US (East)” project.
Before running the notebooks, follow the instructions in the notebook markdown to select the correct kernel. If the required kernel is not available, activate the corresponding conda environment and register the kernel as described in the provided instructions.
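You may need to register the kernel manually. One common approach is to run `python -m ipykernel install --user --name <env> --display-name <label>` with the environment's own interpreter. The sketch below wraps that command in Python.

It assumes that `ipykernel` is installed in the conda environment. The environment name and display label in the usage comment are hypothetical placeholders; substitute the ones given in the notebook's markdown instructions.

```python
import subprocess
import sys
from typing import List


def kernel_install_command(env_name: str, display_name: str) -> List[str]:
    """Build the ipykernel command that registers the running interpreter as a kernel."""
    # sys.executable is the interpreter running this script, so run the
    # script from a terminal after activating the target conda environment.
    return [sys.executable, "-m", "ipykernel", "install", "--user",
            "--name", env_name, "--display-name", display_name]


def register_kernel(env_name: str, display_name: str) -> None:
    """Run the registration; requires ipykernel in the active environment."""
    subprocess.run(kernel_install_command(env_name, display_name), check=True)


# Hypothetical usage (replace with the environment named in the notebook):
# register_kernel("my_env", "Python (my_env)")
```

Run this from a terminal after `conda activate <env>`, not from an already running notebook. Inside a notebook, `sys.executable` points to that notebook's current kernel interpreter rather than the environment you want to register.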
The post-folding analysis notebook uses an example output file (T1024) generated by AlphaFold2 within the nf-core/proteinfold pipeline. This file is available in the Results folder of the Public Datasets project in each region.
To use this dataset in your own project, follow the steps in “Copying notebooks and snapshot into a Project” and update the data path in the notebook accordingly. Alternatively, use the provided script to download the data directly from the Public Datasets AWS US (East) project.