Academy Documentation

Example 2: fastq_quality_trimmer

In this exercise, we'll demonstrate a native DNAnexus Python applet that runs the fastq_quality_trimmer binary.

You will learn:

  • How to use a DXFile object to get file metadata

  • How to use Python functions to choose an output filename using the input file's name

  • How to add debugging output to your Python program

Getting Started

The inputs and outputs are the same as in the bash version of this applet. You can start from scratch using dx-app-wizard with the following input specs:

Input Name       Type    Optional    Default Value
input_file       file    No          NA
quality_score    int     Yes         30

The output specs are as follows:

Output Name      Type
output_file      file

Alternatively, reuse the dxapp.json from the bash version, changing the runSpec file to the name of your Python script and the interpreter to python3:

    "runSpec": {
        "timeoutPolicy": {
            "*": {
                "hours": 1
            }
        },
        "interpreter": "python3",
        "file": "src/python_fastq_trimmer.py",
        "distribution": "Ubuntu",
        "release": "20.04",
        "version": "0"
    },
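If you prefer to script this change rather than edit the file by hand, the following is a minimal sketch of the idea. The helper name use_python_runspec and the bash-style starting values are hypothetical and only illustrate the two keys that change; it works on an in-memory dictionary rather than a real dxapp.json file.

```python
import json


def use_python_runspec(dxapp: dict, script: str) -> dict:
    """Return a copy of a dxapp.json dictionary whose runSpec runs a Python script."""
    updated = json.loads(json.dumps(dxapp))  # deep copy via a JSON round trip
    run_spec = updated.setdefault("runSpec", {})
    run_spec["interpreter"] = "python3"
    run_spec["file"] = script
    return updated


# Hypothetical bash-style runSpec, for illustration only.
bash_app = {
    "name": "fastq_trimmer",
    "runSpec": {"interpreter": "bash", "file": "src/fastq_trimmer.sh"},
}

python_app = use_python_runspec(bash_app, "src/python_fastq_trimmer.py")
print(json.dumps(python_app["runSpec"], indent=4))
```

All other runSpec keys (timeoutPolicy, distribution, release, version) are left untouched by this sketch.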

Inside your applet's source directory, create resources/usr/local/bin and copy the fastq_quality_trimmer binary to this location. At runtime, the binary will be available at /usr/local/bin/fastq_quality_trimmer, which is in the standard $PATH.
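One way to catch a missing or misplaced binary early is to look it up on the $PATH before running anything. The sketch below uses the standard library's shutil.which; the helper name require_tool is hypothetical and is not part of the applet shown on this page.

```python
import shutil


def require_tool(name: str) -> str:
    """Return the full path of an executable on $PATH, or raise if it is missing."""
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(f"{name} not found on PATH")
    return path


# On a worker with the bundled resources unpacked, this should print the
# binary's location; elsewhere it reports the missing dependency.
try:
    print(require_tool("fastq_quality_trimmer"))
except FileNotFoundError as err:
    print(f"Missing dependency: {err}")
```

Calling a check like this at the top of main would turn a confusing "command not found" failure into an explicit message in the job log.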

Python Code

Update the Python code to the following:

python_fastq_trimmer.py
#!/usr/bin/env python3

import dxpy
import os
import sys
from subprocess import getstatusoutput


@dxpy.entry_point("main")
def main(input_file, quality_score): # 1
    input_file = dxpy.DXFile(input_file)
    desc = input_file.describe() # 2
    local_file = desc.get("name", input_file.get_id()) # 3
    dxpy.download_dxfile(input_file.get_id(), local_file)  # 4

    basename, ext = os.path.splitext(local_file) # 5
    outfile = f"{basename}.filtered{ext}" # 6
    cmd = ( # 7
        f"fastq_quality_trimmer -Q 33 -t {quality_score} "
        f"-i {local_file} -o {outfile}"
    )
    print(cmd) # 8
    rv, out = getstatusoutput(cmd) # 9

    if rv != 0:
        sys.exit(out)

    dx_output_file = dxpy.upload_local_file(outfile) # 10
    return {"output_file": dxpy.dxlink(dx_output_file)}


dxpy.run()

  1. The input_file will be a DNAnexus link to the file (e.g., {"$dnanexus_link": "file-FvQGZb00bvyQXzG3250XGbgz"}), and the quality_score will be an integer value.

  2. Use DXFile.describe to get a Python dictionary of metadata.

  3. Choose a local filename by using either the file's name from the metadata or the file ID.

  4. Download the input file to the chosen local filename.

  5. Split the filename into a basename and extension.

  6. Create an output filename using the input basename and a new extension to indicate that the data has been filtered.

  7. Format a command string.

  8. Print the command for debugging purposes.

  9. Execute the command and check the return value. A nonzero value causes the program to exit with the command's output as the error message.

  10. If the code makes it to this point, upload the output file and return a link to it to be attached to the job's output.
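The filename and command steps can be tried locally without dxpy. The following is a minimal sketch, assuming a plain dictionary stands in for the describe output; the helper names are hypothetical. Unlike the applet code above, it builds the command with shlex.join (Python 3.8+), which quotes filenames containing spaces or other shell-special characters.

```python
import os
import shlex


def choose_local_name(desc: dict, file_id: str) -> str:
    """Prefer the file's name from its metadata, falling back to the file ID."""
    return desc.get("name", file_id)


def filtered_name(local_file: str) -> str:
    """Insert '.filtered' between the basename and the extension."""
    basename, ext = os.path.splitext(local_file)
    return f"{basename}.filtered{ext}"


def trimmer_command(local_file: str, quality_score: int) -> str:
    """Format the fastq_quality_trimmer command with shell-safe quoting."""
    args = [
        "fastq_quality_trimmer", "-Q", "33", "-t", str(quality_score),
        "-i", local_file, "-o", filtered_name(local_file),
    ]
    return shlex.join(args)


# A plain dict standing in for the metadata returned by describe.
desc = {"name": "small-celegans-sample.fastq"}
local_file = choose_local_name(desc, "file-FvQGZb00bvyQXzG3250XGbgz")
print(trimmer_command(local_file, 28))
```

Note that os.path.splitext only splits off the last extension, so a name such as reads.fastq.gz would become reads.fastq.filtered.gz.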

Build and Run

Run dx build in your source directory to create the new applet. Use the new applet ID to execute the applet with a small FASTQ file:

$ dx run applet-GgKQ5qQ071x5yX7fgbq3PkXB \
> -f python_fastq_trimmer/job_input.json -y --watch \
> --destination project-GXY0PK0071xJpG156BFyXpJF:/output/python_fastq_trimmer/

Using input JSON:
{
    "input_file": {
        "$dnanexus_link": "file-FvQGZb00bvyQXzG3250XGbgz"
    },
    "quality_score": 28
}

Calling applet-GgKQ5qQ071x5yX7fgbq3PkXB with output destination
  project-GXY0PK0071xJpG156BFyXpJF:/output/python_fastq_trimmer

Job ID: job-GgKQ6x0071x6kf34P5xy2q2b

Job Log
-------
Watching job job-GgKQ6x0071x6kf34P5xy2q2b. Press Ctrl+C to stop watching.
* Python version of fastq_trimmer (python_fastq_trimmer:main) (running)
* job-GgKQ6x0071x6kf34P5xy2q2b
  kyclark 2024-02-26 14:32:36 (running for 0:00:21)
2024-02-26 14:33:17 Python version of fastq_trimmer INFO Logging initialized
(priority)
2024-02-26 14:33:17 Python version of fastq_trimmer INFO Logging initialized
(bulk)
2024-02-26 14:33:21 Python version of fastq_trimmer INFO Downloading bundled
file resources.tar.gz
2024-02-26 14:33:22 Python version of fastq_trimmer STDOUT >>> Unpacking
resources.tar.gz to /
2024-02-26 14:33:22 Python version of fastq_trimmer STDERR tar: Removing
leading `/' from member names
2024-02-26 14:33:22 Python version of fastq_trimmer INFO Setting SSH public key
2024-02-26 14:33:23 Python version of fastq_trimmer STDOUT dxpy/0.369.0
(Linux-5.15.0-1053-aws-x86_64-with-glibc2.29) Python/3.8.10
2024-02-26 14:33:23 Python version of fastq_trimmer STDOUT Invoking main with
{'input_file': {'$dnanexus_link': 'file-FvQGZb00bvyQXzG3250XGbgz'},
'quality_score': 28}
2024-02-26 14:33:24 Python version of fastq_trimmer STDOUT
fastq_quality_trimmer -Q 33 -t 28 -i small-celegans-sample.fastq -o
small-celegans-sample.filtered.fastq
* Python version of fastq_trimmer (python_fastq_trimmer:main) (done)
* job-GgKQ6x0071x6kf34P5xy2q2b
  kyclark 2024-02-26 14:32:36 (runtime 0:00:20)
  Output: output_file = file-GgKQ79j0B2FQjGbk0qX6j64B
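The input JSON passed with -f in the session above can also be generated from Python, which may be convenient when launching the applet on many files. This is a sketch only; the helper name make_job_input is hypothetical, and the file ID is the one used in this example.

```python
import json


def make_job_input(file_id: str, quality_score: int) -> dict:
    """Build the applet's input, wrapping the file ID in a DNAnexus link."""
    return {
        "input_file": {"$dnanexus_link": file_id},
        "quality_score": quality_score,
    }


job_input = make_job_input("file-FvQGZb00bvyQXzG3250XGbgz", 28)
print(json.dumps(job_input, indent=4))
```

Redirecting the printed output to a file such as job_input.json produces a document with the same structure as the "Using input JSON" section of the log.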

Verify Output

Use dx head to verify the output looks like a FASTQ file:

$ dx head file-GgKQ79j0B2FQjGbk0qX6j64B
@SRR070372.1 FV5358E02GLGSF length=78
TTTTTTTTTTTTTTTTTTTTTTTTTTTNTTTNTTTNTTTNTTTATTTATTTATTTATTATTATATATATATA
+SRR070372.1 FV5358E02GLGSF length=78
...000//////999999<<<=<<666!602!777!922!688:669A9=<=122569AAA?>@BBBBAA?=
@SRR070372.2 FV5358E02FQJUJ length=177
TTTCTTGTAATTTGTTGGAATACGAGAACATCGTCAATAATATATCGTATGAATTGAACCACACGGCACATATTTGAACTTGTTCGTGAAATTTAGCGAACCTGGCAGGACTCGAACCTCCAATCTTCGGATCCGAAGTCCGACGCCCCCGCGTCGGATGCGTTGTTACCACTGCTT
+SRR070372.2 FV5358E02FQJUJ length=177
222@99912088>C<?7779@<GIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIC;6666IIIIIIIIIIII;;;HHIIE>944=>=;22499;CIIIIIIIIIIIIHHHIIIIIIIIIIIIIIIH?;;;?IIEEEEEEEEIIII77777I7EEIIEEHHHHHIIIIIIIIIIIIII
@SRR070372.3 FV5358E02GYL4S length=70
TTGGTATCATTGATATTCATTCTGGAGAACGATGGAACATACAAGAATTGTGTTAAGACCTGCAT
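The same sanity check can be approximated in code. The sketch below assumes the common four-line FASTQ layout (header, sequence, separator, qualities) and only inspects complete records, since a preview may end partway through a record. The function name and the sample records are hypothetical.

```python
def looks_like_fastq(lines, max_records=2):
    """Check that the first complete four-line records follow the FASTQ layout."""
    lines = [line.rstrip("\n") for line in lines]
    complete = len(lines) - len(lines) % 4  # ignore a trailing partial record
    if complete == 0:
        return False
    for start in range(0, min(complete, max_records * 4), 4):
        header, seq, plus, qual = lines[start:start + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            return False
        if len(seq) != len(qual):  # one quality character per base
            return False
    return True


# Hypothetical preview: one complete record followed by a partial one.
preview = ["@read1", "ACGT", "+read1", "IIII", "@read2", "AC"]
print(looks_like_fastq(preview))
```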

To confirm that the applet reduced the amount of data, pipe the output of dx cat to wc -l and check that the output file has fewer lines than the input file:

$ dx cat file-GgKQ79j0B2FQjGbk0qX6j64B | wc -l
   99952

$ dx cat file-FvQGZb00bvyQXzG3250XGbgz | wc -l
  100000
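Line counts can be converted to read counts, assuming each FASTQ record occupies exactly four lines, as in the dx head output above. The helper name reads_from_lines is hypothetical.

```python
def reads_from_lines(line_count: int) -> int:
    """Convert a FASTQ line count to a read count, assuming four lines per record."""
    reads, remainder = divmod(line_count, 4)
    if remainder:
        raise ValueError(f"{line_count} is not a multiple of 4")
    return reads


input_reads = reads_from_lines(100000)
output_reads = reads_from_lines(99952)
print(f"input: {input_reads}, output: {output_reads}, "
      f"removed: {input_reads - output_reads}")
```

Under that assumption, the input holds 25,000 reads and the output 24,988, so 12 reads are absent from the filtered file.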

Review

  • You used DXFile to get the input file's name.

  • Your output filename is based on the input file's name rather than a static name like output.txt.

  • You can call Python's print function to write your own messages to the job log, which can help when debugging your program.
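By default, print writes to STDOUT, and passing file=sys.stderr sends the message to STDERR instead; the job log above labels the two streams separately. A small helper along these lines can keep debugging messages easy to spot. The name debug and the "[debug]" prefix are hypothetical choices, not part of the applet.

```python
import sys


def debug(message: str) -> None:
    """Write a prefixed debugging message to STDERR and flush it immediately."""
    print(f"[debug] {message}", file=sys.stderr, flush=True)


debug("about to run fastq_quality_trimmer")
```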

Resources

If you run into technical issues, create a support ticket:

  1. Go to the Help header (in the same section as Projects and Tools) inside the platform.

  2. Select "Contact Support".

  3. Fill in the Subject and Message to submit a support ticket.

