Example 2: fastq_quality_trimmer
In this exercise, we'll demonstrate a native DNAnexus Python applet that runs the fastq_quality_trimmer binary.
You will learn:
How to use a DXFile object to get file metadata
How to use Python functions to choose an output filename using the input file's name
How to add debugging output to your Python program
Getting Started
The inputs and outputs are the same as in the bash version of this applet. You can start from scratch using dx-app-wizard with the following input specs:
| Input name | Class | Optional | Default |
| --- | --- | --- | --- |
| input_file | file | No | NA |
| quality_score | int | Yes | 30 |
The output specs are as follows:
| Output name | Class |
| --- | --- |
| output_file | file |
Or you can use the dxapp.json from the bash version and change the runSpec file to the name of your Python script and the interpreter to python3 as follows:
"runSpec": {
  "timeoutPolicy": {
    "*": {
      "hours": 1
    }
  },
  "interpreter": "python3",
  "file": "src/python_fastq_trimmer.py",
  "distribution": "Ubuntu",
  "release": "20.04",
  "version": "0"
},
Inside your applet's source directory, create resources/usr/local/bin and copy the fastq_quality_trimmer binary to this location. At runtime, the binary will be available at /usr/local/bin/fastq_quality_trimmer, which is in the standard $PATH.
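Because the applet shells out to the bundled binary, it can help to confirm that the binary is actually on the PATH before running it. The following is a minimal sketch using the standard library's shutil.which; the helper name require_binary is hypothetical and not part of dxpy.

```python
import shutil


def require_binary(name):
    """Return the full path to `name`, or raise if it is not on the PATH."""
    path = shutil.which(name)
    if path is None:
        # A clear message here is easier to debug than a failed command later.
        raise FileNotFoundError(
            f"{name} not found on PATH; was it copied into resources/usr/local/bin?"
        )
    return path
```

Inside the applet, calling require_binary("fastq_quality_trimmer") before formatting the command would fail early if the resources directory was not packaged as expected.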
Python Code
Update the Python code so that it performs the following steps:
The input_file will be the DNAnexus file ID (e.g., file-FvQGZb00bvyQXzG3250XGbgz), and the quality_score will be an integer value.
Use DXFile.describe to get a Python dictionary of metadata.
Choose a local filename by using either the file's name from the metadata or the file ID.
Download the input file to the chosen local filename.
Split the filename into a basename and extension.
Create an output filename using the input basename and a new extension to indicate that the data has been filtered.
Format a command string.
Print the command for debugging purposes.
Execute the command and check the return value.
If the code makes it to this point, upload the output file and return the file ID to be attached to the job's output.
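The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the applet's exact source: the helper names (choose_local_name, make_output_name, build_command) are hypothetical, the ".filtered" extension is one possible choice, and the -Q 33 flag assumes Phred+33 quality encoding. The sketch also builds an argument list instead of a single command string, which avoids shell quoting problems with unusual filenames.

```python
import os
import shlex
import subprocess
import sys


def choose_local_name(desc, file_id):
    """Prefer the file's name from its metadata; fall back to the file ID."""
    return desc.get("name") or file_id


def make_output_name(local_name):
    """Insert '.filtered' between the basename and the original extension."""
    basename, ext = os.path.splitext(local_name)
    return f"{basename}.filtered{ext}"  # assumption: '.filtered' marks trimmed data


def build_command(in_name, out_name, quality_score):
    """Build the trimmer invocation as an argument list."""
    return [
        "fastq_quality_trimmer",
        "-Q", "33",  # assumption: Phred+33 quality encoding
        "-t", str(quality_score),
        "-i", in_name,
        "-o", out_name,
    ]


def main(input_file, quality_score):
    # Imported here so the helpers above can be tried without dxpy installed.
    import dxpy

    dxfile = dxpy.DXFile(input_file)
    desc = dxfile.describe()  # dictionary of file metadata
    local_name = choose_local_name(desc, dxfile.get_id())
    dxpy.download_dxfile(dxfile.get_id(), local_name)

    out_name = make_output_name(local_name)
    cmd = build_command(local_name, out_name, quality_score)
    print("Running:", shlex.join(cmd))  # debugging output in the job log

    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        sys.exit(result.stderr or f"command failed with exit code {result.returncode}")

    uploaded = dxpy.upload_local_file(out_name)
    return {"output_file": dxpy.dxlink(uploaded)}
```

In the applet script itself, main would be registered with the dxpy.entry_point("main") decorator and the script would end with a call to dxpy.run(). With an input named reads.fastq, make_output_name produces reads.filtered.fastq.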
Build and Run
Run dx build in your source directory to create the new applet. Use the new applet ID to execute the applet with a small FASTQ file:
Verify Output
Use dx head to verify the output looks like a FASTQ file:
To confirm that the applet did winnow the number of reads, you can pipe the output of dx cat to wc and check that the output file has fewer lines than the input file:
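If you prefer to compare read counts rather than raw line counts, a small Python sketch can do the division for you. It assumes both files have been downloaded locally and that each record occupies exactly four lines (unwrapped FASTQ); the function name count_fastq_reads is hypothetical.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming four lines per record."""
    # Transparently handle gzip-compressed files based on the extension.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

Calling count_fastq_reads on the input and on the filtered output gives two numbers that can be compared directly.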
Review
You used DXFile to get the input file's name.
Your output filename is based on the input file's name rather than a static name like output.txt.
You can call Python's print function to add your own STDOUT/STDERR output to the applet, which can aid in debugging your program.
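As a small illustration of the debugging point, the sketch below wraps print so that messages go to STDERR and are flushed immediately. The helper name debug and the "[debug]" prefix are hypothetical conventions, not part of dxpy.

```python
import sys


def debug(message, stream=sys.stderr):
    """Write a prefixed debugging message and flush it right away."""
    # flush=True keeps messages in order with output from child processes.
    print(f"[debug] {message}", file=stream, flush=True)
```

For example, debug(f"local_name = {local_name}") placed after the download step would record the chosen filename alongside the applet's other output.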
Resources
To create a support ticket if there are technical issues:
Go to the Help header (same section where Projects and Tools are) inside the platform
Select "Contact Support"
Fill in the Subject and Message to submit a support ticket.