Example 2: fastq_quality_trimmer

In this chapter, you'll create an applet that uses the fastq_quality_trimmer executable from the FASTX-Toolkit, a collection of command-line tools for processing short-read FASTA and FASTQ files. You'll use the applet to run fastq_quality_trimmer on a FASTQ file, creating a trimmed reads file that you can then use for further analysis.

You will learn the following:

  • How to accept an optional integer argument from the user

  • How to add resource files, such as a binary executable, to an applet for use in your applet code

Starting the Applet

Run dx-app-wizard mytrimmer to create the mytrimmer applet. Because the command line already supplies the app name, press Enter when prompted for it. You can also add a title, summary, and version if you like.

Start the input specification with the input FASTQ:

Input Specification

You will now be prompted for each input parameter to your app.  Each parameter
should have a unique name that uses only the underscore "_" and alphanumeric
characters, and does not start with a number.

1st input name (<ENTER> to finish): input_file
Label (optional human-readable name) []: Input file
Your input parameter must be of one of the following classes:
applet         array:file     array:record   file           int
array:applet   array:float    array:string   float          record
array:boolean  array:int      boolean        hash           string

Choose a class (<TAB> twice for choices): file
This is an optional parameter [y/n]: n

Next, indicate an optional integer for the quality score:

Press Enter to skip a third input and move to the output specification, which should define a single output file:

Press Enter to exit the output section.

Set a timeout policy if you would like.

Answer the remaining questions to create a bash applet. The applet does not need access to the internet or parent project, and you can choose the default instance type.

Open the mytrimmer/dxapp.json in a text editor to view the inputSpec:

To make input file selection more convenient for the user, edit the patterns for the file extensions of the input_file to be those commonly used for FASTQ files:

These patterns are used in the web interface to filter files for the user, but it's not a requirement that the input files match these patterns. The file filter can be turned off by the user, so these patterns are merely suggestions.
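
As a rough illustration of how such glob-style patterns select filenames, the sketch below checks a few names against two assumed patterns, `*.fastq` and `*.fq`. Both the patterns and the helper name `matches_patterns` are hypothetical; substitute whatever your dxapp.json lists.

```shell
# Return success if the filename matches any of the assumed patterns.
matches_patterns() {
    name="$1"
    # Assumed patterns for illustration only; quoted so the shell does not expand them here.
    for pattern in "*.fastq" "*.fq"; do
        case "$name" in
            # Unquoted, so $pattern is interpreted as a glob.
            $pattern) return 0 ;;
        esac
    done
    return 1
}

for f in sample.fastq sample.fq notes.txt; do
    if matches_patterns "$f"; then
        echo "$f: shown by the filter"
    else
        echo "$f: hidden unless the filter is turned off"
    fi
done
```

A file such as notes.txt would be hidden by the filter in this sketch, but as noted above, the user can still select it by turning the filter off.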

Adding a Binary Resource

Next, you will add a binary executable file from the FASTX toolkit. Download and unpack the FASTX toolkit binaries:

Then build the tools with the included Makefile, which produces the executables.

The same files are also available as a downloadable archive (about 5 MB) that you can unpack yourself.

Create the directory resources/usr/bin inside the mytrimmer directory:

When the app is bundled, the directory structure in the resources directory will be compressed and unpacked as is on the instance, so you should create a directory that is in the standard $PATH such as /usr/bin or /usr/local/bin.
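
A minimal sketch of that mapping, assuming you run it from the directory that contains mytrimmer: it creates the resource tree and derives the path the file will have on the instance by stripping the resources prefix.

```shell
# Create the resource tree inside the applet directory.
mkdir -p mytrimmer/resources/usr/bin

# Everything under resources/ is unpacked relative to / on the instance,
# so removing the "mytrimmer/resources" prefix gives the instance path.
local_path="mytrimmer/resources/usr/bin/fastq_quality_trimmer"
instance_path="/${local_path#mytrimmer/resources/}"
echo "$instance_path"   # prints /usr/bin/fastq_quality_trimmer
```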

This applet only requires the fastq_quality_trimmer binary, so copy it into the resources/usr/bin directory:
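
The copy might look like the sketch below. The source path bin/fastq_quality_trimmer is an assumption about where the unpacked binary ends up; adjust it to match your download.

```shell
# Hypothetical location of the unpacked binary; change this to your own path.
src="bin/fastq_quality_trimmer"
dest="mytrimmer/resources/usr/bin"

mkdir -p "$dest"
if [ -f "$src" ]; then
    # Copy the binary and make sure it is executable on the instance.
    cp "$src" "$dest/" && chmod +x "$dest/fastq_quality_trimmer"
else
    echo "No binary found at $src; set src to the unpacked location"
fi
```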

You should remove the downloaded binary artefacts as they are no longer needed.

Writing the Applet

Update mytrimmer/src/mytrimmer.sh with the following code:

  • The variables $input_file and $input_file_name are based on the inputSpec name input_file. The first is a record-like string {"$dnanexus_link": "file-GJ2k2V80vx88z3zyJbVXZj3G"}, while the latter is the filename small-celegans-sample.fastq.

  • The variable $input_file_prefix is the name of the input file without the file extension, so small-celegans-sample, which is used to create the output filename small-celegans-sample.filtered.fastq. See the documentation.

  • Run fastq_quality_trimmer using the given $quality_score and write to the output filename. The -Q option is undocumented; here it indicates that the quality scores use the phred 33 encoding.

  • Upload the output file, which returns another record-like string describing the newly created file.

  • Add the newly uploaded record as a file output of the job.
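
A minimal sketch of a script that follows the steps above. It makes several assumptions: that the code lives in a main() function as in the template generated by dx-app-wizard, that the output was named output_file in the output specification, and that the quality score is passed to the trimmer's -t (threshold) option. Rename these to match your own dxapp.json.

```shell
#!/bin/bash
# Sketch of mytrimmer/src/mytrimmer.sh. The platform sets $input_file,
# $input_file_name, $input_file_prefix and $quality_score before calling main.
main() {
    # Download the input file to the instance under its original name.
    dx download "$input_file" -o "$input_file_name"

    # Build the output filename from the input name without its extension.
    local outfile="${input_file_prefix}.filtered.fastq"

    # Trim low-quality bases; -Q 33 marks the scores as phred 33.
    fastq_quality_trimmer -Q 33 -t "$quality_score" \
        -i "$input_file_name" -o "$outfile"

    # Upload the result; --brief prints only the ID of the new file.
    local output_id
    output_id=$(dx upload "$outfile" --brief)

    # Register the uploaded file as a job output (output name is assumed).
    dx-jobutil-add-output output_file "$output_id" --class=file
}
```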

You don't need to give the full path to fastq_quality_trimmer because the contents of resources/usr/bin are unpacked to /usr/bin on the instance, which is in the standard $PATH.
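
To see how the filename variables relate, the sketch below sets an example $input_file_name by hand and approximates $input_file_prefix with plain parameter expansion. On the platform both variables are supplied for you, so this is only an illustration of the naming logic.

```shell
# Example value the platform would supply for an input named input_file.
input_file_name="small-celegans-sample.fastq"

# Approximate the prefix by removing the final extension.
input_file_prefix="${input_file_name%.*}"

# Derive the output filename from the prefix.
outfile="${input_file_prefix}.filtered.fastq"
echo "$outfile"   # prints small-celegans-sample.filtered.fastq
```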

Creating a Project for the Data and Applet

Create a project to hold the sample data and the applet. Then add the sample FASTQ file to the project, either with the URL importer as shown in Figure 6 or by downloading the file to your computer and uploading it through the web interface or with dx upload:

Use dx build to build the applet:

Run the applet with the -h|--help flag from the CLI to see the usage:

Run the applet using the file ID of the FASTQ file you uploaded:
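
A sketch of such an invocation, assuming the input names input_file and quality_score used in this chapter. The file ID file-xxxx is a placeholder, and the value 35 is an arbitrary example; omit -iquality_score to fall back on the default.

```shell
# Placeholder ID; replace it with the ID of the file you uploaded.
file_id="file-xxxx"

if command -v dx >/dev/null 2>&1; then
    # -i passes a named input; -y skips the confirmation prompt and
    # --watch streams the job log to the terminal.
    dx run mytrimmer -iinput_file="$file_id" -iquality_score=35 -y --watch
else
    echo "dx is not on your PATH; install the dx toolkit and log in first"
fi
```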

The job's output should end with something like the following:

You can select the output file and view the results.

Download the output file and confirm that the trimming actually removed some of the input sequences by using wc to count the lines in the original file and in the result:
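
Because wc -l counts lines rather than reads, a small helper can convert the line count into a read count. This sketch assumes standard four-line FASTQ records; the function name count_reads is hypothetical.

```shell
# Print the number of reads in a FASTQ file, assuming four lines per record.
count_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}
```

For example, running count_reads small-celegans-sample.fastq and then count_reads small-celegans-sample.filtered.fastq lets you compare the two totals directly.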

Run the applet with a higher quality score and verify that the result includes even fewer sequences.

Review

In this chapter, you learned how to do the following:

  • Indicate an optional argument with a default value

  • Add a binary executable to an applet's resources directory and use that binary in your applet code

  • Use variations on the input file variables to get the full filename or the filename prefix without the extension

Resources

Full Documentation

To create a support ticket if there are technical issues:

  1. Go to the Help header (same section where Projects and Tools are) inside the platform

  2. Select "Contact Support"

  3. Fill in the Subject and Message to submit a support ticket.
