Example 2: fastq_quality_trimmer

In this chapter, you'll learn to create an applet that uses the fastq_quality_trimmer executable from the FASTX-Toolkit, a collection of command-line tools for processing short-read FASTA and FASTQ files. You'll use the applet to trim the reads in a FASTQ file, creating a trimmed reads file that you can then use for further analysis.

You will learn the following:

How to accept an optional integer argument from the user
How to add resource files, such as a binary executable, to an applet so that your applet code can use them

Starting the Applet

Run dx-app-wizard mytrimmer to create the mytrimmer applet. Because you supplied the app name on the command line, press Enter to accept it when prompted. You can optionally add a title, summary, and version.

Start the input specification with the input FASTQ:

Input Specification

You will now be prompted for each input parameter to your app.  Each parameter
should have a unique name that uses only the underscore "_" and alphanumeric
characters, and does not start with a number.

1st input name (<ENTER> to finish): input_file
Label (optional human-readable name) []: Input file
Your input parameter must be of one of the following classes:
applet         array:file     array:record   file           int
array:applet   array:float    array:string   float          record
array:boolean  array:int      boolean        hash           string

Choose a class (<TAB> twice for choices): file
This is an optional parameter [y/n]: n

Next, define an optional integer input for the quality score, with a default value of 30:

2nd input name (<ENTER> to finish): quality_score
Label (optional human-readable name) []: Quality score
Choose a class (<TAB> twice for choices): int
This is an optional parameter [y/n]: y
A default value should be provided [y/n]: y
  Default value: 30

Press Enter to skip a third input and move to the output specification, which should define a single output file:

Output Specification

You will now be prompted for each output parameter of your app.  Each
parameter should have a unique name that uses only the underscore "_" and
alphanumeric characters, and does not start with a number.

1st output name (<ENTER> to finish): output_file
Label (optional human-readable name) []: Output file
Choose a class (<TAB> twice for choices): file

Press Enter to exit the output section.

Set a timeout policy if you would like.

Answer the remaining questions to create a bash applet. The applet does not need access to the internet or parent project, and you can choose the default instance type.

Open mytrimmer/dxapp.json in a text editor to view the inputSpec:

  "inputSpec": [
    {
      "name": "input_file",
      "label": "Input file",
      "class": "file",
      "optional": false,
      "patterns": [
        "*"
      ],
      "help": ""
    },
    {
      "name": "quality_score",
      "label": "Quality score",
      "class": "int",
      "optional": true,
      "default": 30,
      "help": ""
    }
  ],
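Because quality_score is optional with "default": 30, the job receives quality_score=30 whenever the user omits it, as the job log later in this chapter shows. If you also want to exercise the script locally, outside the platform, a small fallback can help. This is a minimal sketch; the helper name resolve_quality_score is hypothetical, and since the platform already enforces the int class, the integer check mainly matters for local runs.

```shell
#!/bin/bash
# Hypothetical helper: return the quality threshold to pass to -t.
# On the platform, $quality_score is already set (to 30 by default, from
# dxapp.json); the ${1:-30} fallback only matters when running locally.
resolve_quality_score() {
    local score="${1:-30}"
    # Accept only non-negative integers.
    if [[ ! "$score" =~ ^[0-9]+$ ]]; then
        echo "quality_score must be a non-negative integer, got '$score'" >&2
        return 1
    fi
    echo "$score"
}

# Example: resolve_quality_score ""   prints 30
#          resolve_quality_score 35   prints 35
```

Under set -u, call it as resolve_quality_score "${quality_score:-}" so that an unset variable does not abort the script before the fallback applies.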

To make input file selection more convenient for the user, edit the patterns for the file extensions of the input_file to be those commonly used for FASTQ files:

    {
      "name": "input_file",
      "label": "Input file",
      "class": "file",
      "optional": false,
      "patterns": [
        "*.fastq",
        "*.fq"
      ],
      "help": ""
    }

These patterns are used in the web interface to filter files for the user, but it's not a requirement that the input files match these patterns. The file filter can be turned off by the user, so these patterns are merely suggestions.
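To see which filenames the patterns "*.fastq" and "*.fq" select, here is a sketch using shell glob matching. The assumption is that the web interface's filter behaves like shell globbing; the platform's exact matching rules may differ, and the function name matches_fastq_pattern is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: does a filename match one of the inputSpec patterns?
matches_fastq_pattern() {
    case "$1" in
        *.fastq|*.fq) return 0 ;;  # matches "*.fastq" or "*.fq"
        *) return 1 ;;             # filtered out by default in the file picker
    esac
}

for name in small-celegans-sample.fastq reads.fq reads.fastq.gz; do
    if matches_fastq_pattern "$name"; then
        echo "$name: shown"
    else
        echo "$name: hidden (still selectable with the filter turned off)"
    fi
done
```

Note that a compressed file such as reads.fastq.gz does not match either pattern, because the glob must match the end of the name.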

Adding a Binary Resource

Next, add the fastq_quality_trimmer executable from the FASTX-Toolkit to the applet. Download and unpack the FASTX-Toolkit:

wget https://github.com/agordon/fastx_toolkit/releases/download/0.0.14/fastx_toolkit-0.0.14.tar.bz2

tar xvf fastx_toolkit-0.0.14.tar.bz2

x ./bin/fasta_clipping_histogram.pl
x ./bin/fasta_formatter
x ./bin/fasta_nucleotide_changer
x ./bin/fastq_masker
x ./bin/fastq_quality_boxplot_graph.sh
x ./bin/fastq_quality_converter
x ./bin/fastq_quality_filter
x ./bin/fastq_quality_trimmer
x ./bin/fastq_to_fasta
x ./bin/fastx_artifacts_filter
x ./bin/fastx_barcode_splitter.pl
x ./bin/fastx_clipper
x ./bin/fastx_collapser
x ./bin/fastx_nucleotide_distribution_graph.sh
x ./bin/fastx_nucleotide_distribution_line_graph.sh
x ./bin/fastx_quality_stats
x ./bin/fastx_renamer
x ./bin/fastx_reverse_complement
x ./bin/fastx_trimmer
x ./bin/fastx_uncollapser

Then build the executable with the makefile, and copy fastq_quality_trimmer into mytrimmer/resources/usr/local/bin so that it is bundled with the applet.

The files are also available to download as FASTX.zip (5 MB), which you can unpack instead.
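To bundle fastq_quality_trimmer with the applet, it needs to live under the applet's resources/ directory. The platform unpacks resources/ into the root of the worker's filesystem, so resources/usr/local/bin/fastq_quality_trimmer becomes /usr/local/bin/fastq_quality_trimmer at run time. A minimal sketch of that staging step follows; the function name stage_binary is hypothetical, and the example paths assume you unpacked the toolkit into the current directory and that your applet directory is mytrimmer.

```shell
#!/bin/bash
# Hypothetical helper: copy a tool into <applet_dir>/resources/usr/local/bin
# and make sure the copy is executable. Prints the destination path.
stage_binary() {
    local src="$1"         # e.g. ./bin/fastq_quality_trimmer
    local applet_dir="$2"  # e.g. mytrimmer
    local dest="$applet_dir/resources/usr/local/bin"
    mkdir -p "$dest"
    cp "$src" "$dest/"
    chmod +x "$dest/$(basename "$src")"
    echo "$dest/$(basename "$src")"
}

# Example (run from the directory where you unpacked the toolkit):
#   stage_binary ./bin/fastq_quality_trimmer mytrimmer
```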

Writing the Applet

Update mytrimmer/src/mytrimmer.sh with the following code:

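#!/bin/bash

set -exuo pipefail

main() {
    echo "Value of input_file: '$input_file'"
    echo "Value of quality_score: '$quality_score'"

    # Download the input FASTQ under its original filename
    dx download "$input_file" -o "$input_file_name"

    # Name the output after the input filename minus its extension
    outfile="${input_file_prefix}.filtered.fastq"

    # Trim low-quality bases; -Q 33 indicates Phred+33 quality encoding
    fastq_quality_trimmer -Q 33 -t "$quality_score" -i "$input_file_name" -o "$outfile"

    # Upload the result; --brief prints only the new file's ID
    outfile_id=$(dx upload "$outfile" --brief)

    # Register the uploaded file as the job's output_file output
    dx-jobutil-add-output output_file "$outfile_id" --class=file
}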

The variables $input_file and $input_file_name are derived from the inputSpec name input_file. The first holds a JSON link string such as {"$dnanexus_link": "file-GJ2k2V80vx88z3zyJbVXZj3G"}, while the second holds the filename, small-celegans-sample.fastq.
The variable $input_file_prefix is the input filename without its file extension (here, small-celegans-sample), which is used to create the output filename small-celegans-sample.filtered.fastq. See the documentation.
Run fastq_quality_trimmer with the given $quality_score as the quality threshold (-t) and write to the output filename. The -Q 33 option, which is undocumented, indicates that the quality scores use Phred+33 encoding.
Upload the output file. Because of the --brief flag, dx upload prints only the new file's ID (for example, file-GJ2zkYj06GbzP8XBB4bVGxQ6) rather than a full description of the file.
Register that file ID as the job's output_file output.
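On the platform, $input_file_name and $input_file_prefix are set for you. To reason about the output name when testing locally, the sketch below mimics the relationship with bash suffix removal. It is an approximation under stated assumptions: the function derive_prefix is hypothetical, it handles only the .fastq and .fq suffixes from the inputSpec patterns plus a generic last-extension fallback, and the platform's own rules may treat other extensions (for example, .gz) differently.

```shell
#!/bin/bash
# Hypothetical local approximation of $input_file_prefix.
derive_prefix() {
    local name="$1"
    case "$name" in
        *.fastq) echo "${name%.fastq}" ;;
        *.fq)    echo "${name%.fq}" ;;
        *)       echo "${name%.*}" ;;  # fallback: drop the last extension
    esac
}

# Illustrative values only; on the platform these variables are provided.
input_file_name="small-celegans-sample.fastq"
input_file_prefix="$(derive_prefix "$input_file_name")"
outfile="${input_file_prefix}.filtered.fastq"
echo "$outfile"   # small-celegans-sample.filtered.fastq
```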

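You don't need to give the full path to fastq_quality_trimmer. The platform unpacks the applet's resources/ directory into the root of the worker's filesystem, so a binary placed in resources/usr/local/bin ends up in /usr/local/bin, which is on the standard $PATH.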
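If the binary was not staged correctly, the job fails at the fastq_quality_trimmer line. Under set -e the script already stops there, but an explicit check near the top of main() can make the error message clearer. A minimal sketch; the function name require_tool is hypothetical and not part of the platform.

```shell
#!/bin/bash
# Hypothetical guard: fail with a clear message if a tool is not on $PATH.
require_tool() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "error: '$1' not found on PATH ($PATH)" >&2
        return 1
    fi
}

# Example, as the first line inside main():
#   require_tool fastq_quality_trimmer
```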

Creating a Project for the Data and Applet

Add the sample FASTQ file to the project either by using the URL importer as shown in Figure 6, or by downloading the file to your computer and uploading it through the web interface or with dx upload:

wget https://dl.dnanex.us/F/D/Bp43z7pb2JX8jpB035j4424Vp4Y6qpQ6610ZXg5F/small-celegans-sample.fastq

dx upload small-celegans-sample.fastq

[===========================================================>]
Uploaded 16,801,690 of 16,801,690 bytes (100%) small-celegans-sample.fastq
ID                    file-GJ2k2V80vx88z3zyJbVXZj3G
Class                 file
Project               project-GJ2k24j0vx804FPyBbxqpQBk
Folder                /
Name                  small-celegans-sample.fastq
State                 closing
Visibility            visible
Types                 -
Properties            -
Tags                  -
Outgoing links        -
Created               Tue Oct 11 08:52:37 2022
Created by            kyclark
Last modified         Tue Oct 11 08:52:53 2022
Media type
archivalState         "live"
cloudAccount          "cloudaccount-dnanexus"

Use dx build to build the applet:

$ dx build mytrimmer -f
{"id": "applet-GJ2k5780vx804FPyBbxqpQQ0"}

Run the applet with the -h|--help flag from the CLI to see the usage:

$ dx run applet-GJ2k5780vx804FPyBbxqpQQ0 -h
usage: dx run applet-GJ2k5780vx804FPyBbxqpQQ0 [-iINPUT_NAME=VALUE ...]

Applet: FastQTrimmer

mytrimmer

Inputs:
  Input file: -iinput_file=(file)

  Quality score: [-iquality_score=(int, default=30)]

Outputs:
  Output file: output_file (file)

Run the applet using the file ID of the FASTQ file you uploaded:

$ dx run applet-GJ2k5780vx804FPyBbxqpQQ0 \
> -iinput_file=file-GJ2k2V80vx88z3zyJbVXZj3G -y --watch

Using input JSON:
{
    "input_file": {
        "$dnanexus_link": "file-GJ2k2V80vx88z3zyJbVXZj3G"
    }
}

Calling applet-GJ2k5780vx804FPyBbxqpQQ0 with output destination
  project-GJ2k24j0vx804FPyBbxqpQBk:/

Job ID: job-GJ2k5F00vx84k2X3BqqZ5Zpp

Job Log
-------
Watching job job-GJ2k5F00vx84k2X3BqqZ5Zpp. Press Ctrl+C to stop watching.

The job's output should end with something like the following:

2022-10-11 16:31:18 FastQTrimmer STDERR + echo 'Value of input_file: '\''{"$dnanexus_link": "file-GJ2k2V80vx88z3zyJbVXZj3G"}'\'''
2022-10-11 16:31:18 FastQTrimmer STDERR + echo 'Value of quality_score: '\''30'\'''
2022-10-11 16:31:18 FastQTrimmer STDOUT Value of input_file: '{"$dnanexus_link": "file-GJ2k2V80vx88z3zyJbVXZj3G"}'
2022-10-11 16:31:18 FastQTrimmer STDOUT Value of quality_score: '30'
2022-10-11 16:31:18 FastQTrimmer STDERR + dx download '{"$dnanexus_link": "file-GJ2k2V80vx88z3zyJbVXZj3G"}' -o small-celegans-sample.fastq
2022-10-11 16:31:19 FastQTrimmer STDERR + outfile=small-celegans-sample.filtered.fastq
2022-10-11 16:31:19 FastQTrimmer STDERR + fastq_quality_trimmer -Q 33 -t 30 -i small-celegans-sample.fastq -o small-celegans-sample.filtered.fastq
2022-10-11 16:31:27 FastQTrimmer STDERR ++ dx upload small-celegans-sample.filtered.fastq --brief
2022-10-11 16:31:28 FastQTrimmer STDERR + outfile_id=file-GJ2zkYj06GbzP8XBB4bVGxQ6
2022-10-11 16:31:28 FastQTrimmer STDERR + dx-jobutil-add-output output_file file-GJ2zkYj06GbzP8XBB4bVGxQ6 --class=file

In the web interface, you can select the output file to view the results.

You can download the output file and check that trimming actually removed some of the input sequences by using wc -l to count the lines in the original file and in the result:

$ dx download file-GJ2k73j08bbkVxK9Gxx8Z891
[===========================================================>]
Completed 15,557,666 of 15,557,666 bytes (100%) .../fastq_trimmer/small-celegans-sample.filtered.fastq
$ wc -l small-celegans-sample.f*
  100000 small-celegans-sample.fastq
   99848 small-celegans-sample.filtered.fastq
  199848 total
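Since wc -l counts lines rather than reads, it helps to convert. Each FASTQ record spans four lines (header, sequence, "+" separator, qualities), so the counts above correspond to 100,000 / 4 = 25,000 input reads and 99,848 / 4 = 24,962 output reads, meaning 38 reads were removed. The sketch below automates that arithmetic; the function name count_reads is hypothetical, and it assumes standard four-line records with no sequences wrapped across multiple lines.

```shell
#!/bin/bash
# Hypothetical helper: number of reads in a four-line-per-record FASTQ file.
count_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Example with the two files above:
#   before=$(count_reads small-celegans-sample.fastq)
#   after=$(count_reads small-celegans-sample.filtered.fastq)
#   echo "removed $(( before - after )) reads"
```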

Run the applet again with a higher quality score (for example, by adding -iquality_score=35 to the dx run command) and verify that the result includes even fewer sequences.

Review

In this chapter, you learned how to do the following:

Indicate an optional argument with a default value
Add a binary executable to an applet's resources directory and use that binary in your applet
Use variations on the input file variables to get the full filename or the filename prefix without the extension

Resources

Full Documentation

To create a support ticket if there are technical issues:

Inside the platform, go to the Help menu (in the same header as Projects and Tools)
Select "Contact Support"
Fill in the Subject and Message to submit a support ticket.
