Example 3: fastq_trimmer
In this example, you will translate the bash app from the previous chapter into Workflow Definition Language (WDL).
You will learn how to:
Use Java JAR files to validate and compile WDL
Use WDL to define an applet's inputs, outputs, and runtime specs
Compile a WDL task into an applet
Getting Started
You will not use a wizard to start this applet, so manually create a directory for your work. Create a file called fastq_trimmer.wdl with the following contents:
version 1.0

task fastq_trimmer {
    input {
        File input_file
        Int quality_score = 30
    }

    String basename = basename(input_file)

    command <<<
        fastq_quality_trimmer -Q 33 -t ~{quality_score} \
            -i ~{input_file} -o ~{basename}.filtered.fastq
    >>>

    output {
        File output_file = "~{basename}.filtered.fastq"
    }

    runtime {
        docker: "biocontainers/fastxtools:v0.0.14_cv2"
    }
}
The version 1.0 line indicates that the WDL follows the 1.0 specification.
The task block defines the body of the applet.
The input block defines the same inputs as the bash applet: a File called input_file and an Int (integer) value called quality_score with a default value of 30.
The String basename declaration defines a variable called basename, which uses the basename function to get the filename of the input file.
The command block will be executed at runtime. It uses the tilde/twiddle syntax (~{}) to dereference variables. The output is written to a filename built from the basename of the input.
The output block defines a single File called output_file.
The runtime block specifies a Biocontainers Docker image that contains the FASTX toolkit binaries.
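As a rough illustration of how the command block resolves its placeholders, the following shell sketch mimics the substitution by hand. The directory /tmp/inputs is a hypothetical stand-in for wherever the executor places the input file at runtime, and the filename is inferred from the output shown later in this chapter. The sketch only prints the command; it does not run fastq_quality_trimmer.

```shell
# Hypothetical localized path; the real location is chosen by the executor at runtime.
input_file="/tmp/inputs/small-celegans-sample.fastq"
quality_score=30

# Shell counterpart of WDL's basename(input_file): drop the directory part.
base="$(basename "$input_file")"
output_file="${base}.filtered.fastq"

# Print the command the task would run, without executing it.
echo "fastq_quality_trimmer -Q 33 -t ${quality_score} -i ${input_file} -o ${output_file}"
```

Because the output name is derived from the input name, the output block can refer to the same expression without hard-coding a filename.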
Checking and Compiling the WDL
To start, validate your WDL with WOMtool:
$ java -jar ~/womtool.jar validate fastq_trimmer.wdl
Success!
Before compiling the WDL into an applet, use dx pwd to ensure you are in your desired project. If not, run dx select to select a different project, then use the following command to compile the applet:
$ java -jar ~/dxCompiler.jar compile fastq_trimmer.wdl
[warning] Project is unspecified...using currently selected project project-GJ2k24j0vx804FPyBbxqpQBk
applet-GJ2pgv80vx84zJ4XJF6GPXz7
As in the previous chapter, use dx run with the -h|--help option to see that the usage looks identical to the bash version:
usage: dx run applet-GJ2pgv80vx84zJ4XJF6GPXz7 [-iINPUT_NAME=VALUE ...]
Applet: fastq_trimmer
Inputs:
input_file: -iinput_file=(file)
quality_score: [-iquality_score=(int, default=30)]
Reserved for dxCompiler
overrides___: [-ioverrides___=(hash)]
overrides______dxfiles: [-ioverrides______dxfiles=(file) [-ioverrides______dxfiles=... [...]]]
Outputs:
output_file: output_file (file)
From the perspective of the user, there is no difference between native/bash applets and those written in WDL. You should use whichever syntax you find most convenient to the task at hand. For instance, this applet leverages an existing Docker container created by the Biocontainers Community rather than adding the binary as a resource.
You can run the applet using the command-line arguments as shown, or you can create a JSON file with the arguments as follows:
$ cat inputs.json
{
  "input_file": {
    "$dnanexus_link": "file-GJ2k2V80vx88z3zyJbVXZj3G"
  },
  "quality_score": 35
}
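One possible way to create this file from the command line is a here-document, sketched below. The delimiter is quoted ('EOF') so that the shell does not try to expand $dnanexus_link as a variable; with an unquoted delimiter the key would typically come out as an empty string. The file ID is the one used in this chapter, so substitute your own.

```shell
# Write inputs.json; the quoted 'EOF' keeps "$dnanexus_link" literal.
cat > inputs.json <<'EOF'
{
  "input_file": {
    "$dnanexus_link": "file-GJ2k2V80vx88z3zyJbVXZj3G"
  },
  "quality_score": 35
}
EOF

# Confirm the key survived unexpanded.
grep -c 'dnanexus_link' inputs.json
```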
You can run the applet and watch the job with the following command:
$ dx run applet-GJ2pgv80vx84zJ4XJF6GPXz7 -f inputs.json -y --watch
Using input JSON:
{
  "input_file": {
    "$dnanexus_link": "file-GJ2k2V80vx88z3zyJbVXZj3G"
  },
  "quality_score": 35
}
Calling applet-GJ2pgv80vx84zJ4XJF6GPXz7 with output destination
project-GJ2k24j0vx804FPyBbxqpQBk:/
Job ID: job-GJ2ppvQ0vx88k8bv9pvGyjGX
Job Log
-------
Watching job job-GJ2ppvQ0vx88k8bv9pvGyjGX. Press Ctrl+C to stop watching.
The output will look quite different from the bash app, but the basics are still the same. In this version, notice that you do not need to download the inputs or upload the outputs. Once the input files are in place, the command block is run and the input files and variables are dereferenced properly. When the job has completed, run dx describe to see the inputs and outputs:
$ dx describe job-GJ2ppvQ0vx88k8bv9pvGyjGX
Result 1:
ID job-GJ2ppvQ0vx88k8bv9pvGyjGX
Class job
Job name fastq_trimmer
Executable name fastq_trimmer
Project context project-GJ2k24j0vx804FPyBbxqpQBk
Region aws:us-east-1
Billed to org-sos
Workspace container-GJ2ppx80773k09b8F6qKGJBb
Applet applet-GJ2pgv80vx84zJ4XJF6GPXz7
Instance Type mem1_ssd1_v2_x2
Priority high
State done
Root execution job-GJ2ppvQ0vx88k8bv9pvGyjGX
Origin job job-GJ2ppvQ0vx88k8bv9pvGyjGX
Parent job -
Function main
Input input_file = file-GJ2k2V80vx88z3zyJbVXZj3G
quality_score = 35
Output output_file = file-GJ2pv300773ypy03Jg2vYZ9f
...
Download the output file to ensure it looks like a correct result:
$ dx download file-GJ2pv300773ypy03Jg2vYZ9f
[===========================================================>]
Completed 14,357,774 of 14,357,774 bytes (100%) ~/fastq_trimmer_wdl/small-celegans-sample.fastq.filtered.fastq
$ wc -l small-celegans-sample.fastq.filtered.fastq
98624 small-celegans-sample.fastq.filtered.fastq
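A quick sanity check on that line count is sketched below, assuming the conventional FASTQ layout of four lines per record (header, sequence, separator, quality). Under that assumption the line count should divide evenly by four, and the quotient is the number of reads that survived trimming.

```shell
# Line count reported by wc -l above.
lines=98624

# Four lines per FASTQ record is an assumption about the file layout.
reads=$(( lines / 4 ))
remainder=$(( lines % 4 ))

echo "reads=${reads} remainder=${remainder}"
```

A nonzero remainder would suggest a truncated or malformed file.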
Documentation with Makefiles
You may find it useful to create a Makefile with all the steps documented in a runnable fashion:
WDL = fastq_trimmer.wdl
PROJECT_ID = project-GJ2k24j0vx804FPyBbxqpQBk
DXCOMPILER = java -jar ~/dxCompiler.jar
CROMWELL = java -jar ~/cromwell.jar
WOMTOOL = java -jar ~/womtool.jar
WORKFLOW_ID = applet-GJ2pgv80vx84zJ4XJF6GPXz7

validate:
	$(WOMTOOL) validate $(WDL)

check:
	miniwdl check $(WDL)

compile:
	$(DXCOMPILER) compile $(WDL) \
		-archive \
		-folder /workflows \
		-project $(PROJECT_ID)

run:
	dx run $(WORKFLOW_ID) \
		-f inputs.json \
		--destination $(PROJECT_ID):/output \
		-y --watch
Now you can run make compile rather than typing out the long Java command.
Review
The WDL version of the fastq_trimmer applet is arguably simpler than the bash version. It uses just one file, fastq_trimmer.wdl, and about 20 lines of text, whereas the bash version requires at least dxapp.json, a bash script, and the resources tarball.
In this chapter, you learned how to:
Use a Biocontainers Docker image for the necessary binary executables from the FASTX toolkit
Define the same inputs, outputs, and commands as the bash applet from Chapter 3
Use a Makefile to define project shortcuts to validate, compile, and run an applet
Resources
To create a support ticket if there are technical issues:
Go to the Help header (same section where Projects and Tools are) inside the platform
Select "Contact Support"
Fill in the Subject and Message to submit a support ticket.