In this chapter, you'll learn to create an applet that uses an executable from the FASTX-Toolkit, a collection of command-line tools for processing short-read FASTA and FASTQ files. You'll use the applet to run fastq_quality_trimmer on a FASTQ file, creating a trimmed reads file that you can then use for further analysis.
You will learn the following:
How to accept an optional integer argument from the user
How to add resource files, such as a binary executable, to an applet for use in your applet code
Starting the Applet
Run dx-app-wizard mytrimmer to create the mytrimmer applet. Because you supplied the app name on the command line, you can press Enter when prompted for it. You can also add a title, summary, and version if you like.
Start the input specification with the input FASTQ:
Input Specification
You will now be prompted for each input parameter to your app. Each parameter
should have a unique name that uses only the underscore "_" and alphanumeric
characters, and does not start with a number.
1st input name (<ENTER> to finish): input_file
Label (optional human-readable name) []: Input file
Your input parameter must be of one of the following classes:
applet array:file array:record file int
array:applet array:float array:string float record
array:boolean array:int boolean hash string
Choose a class (<TAB> twice for choices): file
This is an optional parameter [y/n]: n
Next, indicate an optional integer for the quality score:
2nd input name (<ENTER> to finish): quality_score
Label (optional human-readable name) []: Quality score
Choose a class (<TAB> twice for choices): int
This is an optional parameter [y/n]: y
A default value should be provided [y/n]: y
Default value: 30
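Inside a bash applet, each input is exposed as a shell variable named after the input, so this optional value arrives as $quality_score. The platform is expected to fill in the default of 30 when the user omits the value. The sketch below adds a defensive fallback and an integer check anyway. The helper name resolve_quality_score is hypothetical and is not part of the generated applet.

```shell
# Hypothetical helper: print the quality score to use, falling back to 30.
resolve_quality_score() {
    # Use the first argument if it is non-empty, otherwise the default of 30
    score="${1:-30}"
    # Accept only non-negative integers
    case "$score" in
        ''|*[!0-9]*)
            echo "quality_score must be a non-negative integer, got '$score'" >&2
            return 1
            ;;
    esac
    echo "$score"
}

resolve_quality_score ""     # prints 30
resolve_quality_score "20"   # prints 20
```

In applet code you could call it as quality_score=$(resolve_quality_score "$quality_score") before using the value.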
Press Enter to skip a third input and move to the output specification, which should define a single output file:
Output Specification
You will now be prompted for each output parameter of your app. Each
parameter should have a unique name that uses only the underscore "_" and
alphanumeric characters, and does not start with a number.
1st output name (<ENTER> to finish): output_file
Label (optional human-readable name) []: Output file
Choose a class (<TAB> twice for choices): file
Press Enter to exit the output section.
Set a timeout policy if you would like.
Answer the remaining questions to create a bash applet. The applet does not need access to the internet or parent project, and you can choose the default instance type.
Open the mytrimmer/dxapp.json in a text editor to view the inputSpec:
To make input file selection more convenient for the user, edit the patterns for the file extensions of the input_file to be those commonly used for FASTQ files:
These patterns are used in the web interface to filter files for the user, but it's not a requirement that the input files match these patterns. The file filter can be turned off by the user, so these patterns are merely suggestions.
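To illustrate how such patterns behave, the sketch below matches filenames against shell-style globs. The patterns *.fastq and *.fq are assumptions chosen for illustration, and the function name matches_fastq_pattern is hypothetical. The web interface does its own filtering, and this only mimics it locally.

```shell
# Hypothetical helper: succeed if a filename matches an assumed FASTQ pattern.
matches_fastq_pattern() {
    case "$1" in
        *.fastq|*.fq) return 0 ;;
        *) return 1 ;;
    esac
}

for name in small-celegans-sample.fastq reads.fq notes.txt; do
    if matches_fastq_pattern "$name"; then
        echo "$name: shown by the filter"
    else
        echo "$name: hidden unless the filter is turned off"
    fi
done
```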
Adding a Binary Resource
Next, you will add a binary executable file from the FASTX-Toolkit. Download and unpack the FASTX-Toolkit binaries:
x ./bin/fasta_clipping_histogram.pl
x ./bin/fasta_formatter
x ./bin/fasta_nucleotide_changer
x ./bin/fastq_masker
x ./bin/fastq_quality_boxplot_graph.sh
x ./bin/fastq_quality_converter
x ./bin/fastq_quality_filter
x ./bin/fastq_quality_trimmer
x ./bin/fastq_to_fasta
x ./bin/fastx_artifacts_filter
x ./bin/fastx_barcode_splitter.pl
x ./bin/fastx_clipper
x ./bin/fastx_collapser
x ./bin/fastx_nucleotide_distribution_graph.sh
x ./bin/fastx_nucleotide_distribution_line_graph.sh
x ./bin/fastx_quality_stats
x ./bin/fastx_renamer
x ./bin/fastx_reverse_complement
x ./bin/fastx_trimmer
x ./bin/fastx_uncollapser
Then build the executable using the makefile.
The files are also available here to download and unpack:
Create the directory resources/usr/bin inside the mytrimmer directory:
mkdir -p mytrimmer/resources/usr/bin/
When the app is bundled, the directory structure inside the resources directory is compressed and then unpacked as-is at the root of the instance's file system, so you should create a directory that is in the standard $PATH, such as /usr/bin or /usr/local/bin.
This applet only requires the fastq_quality_trimmer binary, so copy it to the preceding directory:
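A minimal sketch of the copy, assuming the toolkit was unpacked so that the binaries sit in ./bin relative to the current directory. Adjust the source path if yours differs.

```shell
# Assumed location of the unpacked binary; change this if you unpacked elsewhere
src="bin/fastq_quality_trimmer"
dest="mytrimmer/resources/usr/bin"

# Make sure the target directory exists (harmless if it already does)
mkdir -p "$dest"

if [ -x "$src" ]; then
    cp "$src" "$dest/"
else
    echo "Binary not found at $src; adjust the path to where you unpacked it" >&2
fi
```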
The variables $input_file and $input_file_name are based on the inputSpec name input_file. The first is a record-like string {"$dnanexus_link": "file-GJ2k2V80vx88z3zyJbVXZj3G"}, while the latter is the filename small-celegans-sample.fastq.
Run fastq_quality_trimmer using the given $quality_score and write to the output filename. The undocumented -Q option indicates that the quality scores use Phred+33 encoding.
Upload the output file, which returns another record-like string describing the newly created file.
Add the newly uploaded record as a file output of the job.
You don't need to give the full path to fastq_quality_trimmer because it will exist in the directory /usr/bin, which is in the standard $PATH.
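Putting the steps above together, the applet's bash script might look like the following sketch. It assumes that the platform calls main with the input variables already set, that -t is the trimmer's quality-threshold option, and that -Q takes the offset 33. Verify the flags against the help output of your fastq_quality_trimmer build before relying on them.

```shell
#!/bin/bash
# Sketch of the applet entry point; the platform is assumed to call main().

main() {
    # Download the input file to the worker under its original filename
    dx download "$input_file" -o "$input_file_name"

    # Name the output after the input, e.g. small-celegans-sample.filtered.fastq
    outfile="${input_file_prefix}.filtered.fastq"

    # Trim low-quality bases; -Q 33 marks the scores as Phred+33 (assumed flags)
    fastq_quality_trimmer -Q 33 -t "$quality_score" \
        -i "$input_file_name" -o "$outfile"

    # Upload the result; --brief prints only the new file ID
    output_file=$(dx upload "$outfile" --brief)

    # Register the uploaded file as the job's output_file
    dx-jobutil-add-output output_file "$output_file" --class=file
}
```

The block only defines main and does not call it, so sourcing it locally does nothing until the function is invoked.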
Creating a Project for the Data and Applet
Add the sample FASTQ file to the project either by using the URL importer, as shown in Figure 6, or by downloading the file to your computer and uploading it through the web interface or with dx upload:
[===========================================================>]
Uploaded 16,801,690 of 16,801,690 bytes (100%) small-celegans-sample.fastq
ID file-GJ2k2V80vx88z3zyJbVXZj3G
Class file
Project project-GJ2k24j0vx804FPyBbxqpQBk
Folder /
Name small-celegans-sample.fastq
State closing
Visibility visible
Types -
Properties -
Tags -
Outgoing links -
Created Tue Oct 11 08:52:37 2022
Created by kyclark
Last modified Tue Oct 11 08:52:53 2022
Media type
archivalState "live"
cloudAccount "cloudaccount-dnanexus"
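The upload could also be scripted. The sketch below wraps dx upload in a small function that checks the local file first and prints only the new file ID. The function name upload_fastq is hypothetical, and the sketch assumes you are logged in with a project already selected.

```shell
# Hypothetical helper: upload a local FASTQ file and print only the new file ID.
upload_fastq() {
    path="$1"
    if [ ! -f "$path" ]; then
        echo "No such file: $path" >&2
        return 1
    fi
    # --brief asks dx to print just the file ID instead of the full description
    dx upload "$path" --brief
}

# Example usage (requires the dx command-line client):
# upload_fastq small-celegans-sample.fastq
```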
You can select the output file and view the results.
You can download the output file and check that the filtering actually removed some of the input sequences by using wc to count the lines in the original file and in the result:
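One way to make that comparison is to convert line counts into read counts. The sketch below assumes uncompressed FASTQ files in which every record occupies exactly four lines. The helper name count_reads is hypothetical.

```shell
# Hypothetical helper: print the number of reads in an uncompressed FASTQ file,
# assuming each record occupies exactly four lines.
count_reads() {
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}

# Example usage with the filenames from this chapter (assumed to be in the
# current directory):
# count_reads small-celegans-sample.fastq
# count_reads small-celegans-sample.filtered.fastq
```

If the second number is smaller than the first, some reads were dropped entirely rather than only shortened.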
The variable $input_file_prefix is the name of the input file without the file extension, so small-celegans-sample, which is used to create the output filename small-celegans-sample.filtered.fastq.
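To see how the output filename is assembled, you can approximate the prefix locally with bash parameter expansion. This is only an illustration: on the platform, $input_file_name and $input_file_prefix are already set for you, and the platform's own rule for stripping extensions may differ from the simple expansion assumed here.

```shell
# Local illustration only; on the platform these variables are provided.
input_file_name="small-celegans-sample.fastq"

# Strip the shortest trailing ".something" to approximate the prefix
input_file_prefix="${input_file_name%.*}"

outfile="${input_file_prefix}.filtered.fastq"
echo "$outfile"   # prints small-celegans-sample.filtered.fastq
```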