Example 3: fastq_trimmer

In this example, you will translate the bash app from the previous chapter into Workflow Definition Language (WDL).

You will learn how to:

  • Use Java Jar files to validate and compile WDL

  • Use WDL to define an applet's inputs, outputs, and runtime specs

  • Compile a WDL task into an applet

Getting Started

You will not use a wizard to start this applet, so manually create a directory for your work. Create a file called fastq_trimmer.wdl with the following contents:

version 1.0 

task fastq_trimmer { 
    input { 
        File input_file
        Int quality_score = 30
    }

    String basename = basename(input_file) 

    command <<<
        fastq_quality_trimmer -Q 33 -t ~{quality_score} \
            -i ~{input_file} -o ~{basename}.filtered.fastq
    >>>

    output { 
        File output_file = "~{basename}.filtered.fastq"
    }

    runtime { 
        docker: "biocontainers/fastxtools:v0.0.14_cv2"
    }
}
  • The task block defines the body of the applet.

  • The input block defines the same inputs as the bash version: a File called input_file and an Int (integer) called quality_score with a default value of 30.

  • The command block is executed at runtime. It uses the tilde syntax (~{}) to dereference variables. The output is written to a file named after the basename of the input.

  • The output block defines a single File called output_file.

  • The runtime block specifies a Biocontainers Docker image that contains the FASTX toolkit binaries.
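To make the placeholder substitution concrete, the following Python sketch mimics how the command block might look once the inputs are in place. It is an illustration only and not how dxCompiler or any WDL engine is implemented. The function name render_command and the example path are hypothetical; the real localized path is chosen by the runtime.

```python
import os


def render_command(input_file: str, quality_score: int = 30) -> str:
    """Mimic the ~{...} substitution in the task's command block (illustration only)."""
    # WDL's basename() returns the final path component, much like os.path.basename.
    basename = os.path.basename(input_file)
    return (
        f"fastq_quality_trimmer -Q 33 -t {quality_score} "
        f"-i {input_file} -o {basename}.filtered.fastq"
    )


if __name__ == "__main__":
    # Hypothetical localized path, used here only to show the shape of the result.
    print(render_command("/home/dnanexus/inputs/small-celegans-sample.fastq", 35))
```

Note that the output name keeps the original extension, so an input named small-celegans-sample.fastq produces small-celegans-sample.fastq.filtered.fastq.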

Checking and Compiling the WDL

To start, validate your WDL with WOMtool:

$ java -jar ~/womtool.jar validate fastq_trimmer.wdl
Success!

Before compiling the WDL into an applet, use dx pwd to ensure you are in your desired project. If not, run dx select to select a different project, then use the following command to compile the applet:

$ java -jar ~/dxCompiler.jar compile fastq_trimmer.wdl
[warning] Project is unspecified...using currently selected project project-GJ2k24j0vx804FPyBbxqpQBk
applet-GJ2pgv80vx84zJ4XJF6GPXz7

As in the previous chapter, use dx run with the -h|--help option to see the applet's usage. It looks much like the bash version, apart from the inputs reserved for dxCompiler:

usage: dx run applet-GJ2pgv80vx84zJ4XJF6GPXz7 [-iINPUT_NAME=VALUE ...]

Applet: fastq_trimmer

Inputs:
  input_file: -iinput_file=(file)

  quality_score: [-iquality_score=(int, default=30)]

 Reserved for dxCompiler
  overrides___: [-ioverrides___=(hash)]

  overrides______dxfiles: [-ioverrides______dxfiles=(file) [-ioverrides______dxfiles=... [...]]]

Outputs:
  output_file: output_file (file)

You can run the applet using the command-line arguments as shown, or you can create a JSON file with the arguments as follows:

$ cat inputs.json
{
    "input_file": {
        "$dnanexus_link": "file-GJ2k2V80vx88z3zyJbVXZj3G"
    },
    "quality_score": 35
}
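If you run the applet against many files, writing inputs.json by hand gets tedious. The sketch below generates the same structure from Python. The helper names build_inputs and write_inputs are hypothetical, and the file ID is the one used in this example; substitute your own.

```python
import json


def build_inputs(file_id: str, quality_score: int = 30) -> dict:
    """Return the input hash in the same shape as the inputs.json shown above."""
    if not isinstance(quality_score, int):
        raise TypeError("quality_score must be an integer")
    return {
        # Platform files are referenced by ID through a $dnanexus_link object.
        "input_file": {"$dnanexus_link": file_id},
        "quality_score": quality_score,
    }


def write_inputs(path: str, file_id: str, quality_score: int = 30) -> None:
    """Write the input hash to a JSON file that can be passed to dx run -f."""
    with open(path, "w") as handle:
        json.dump(build_inputs(file_id, quality_score), handle, indent=4)
        handle.write("\n")


if __name__ == "__main__":
    write_inputs("inputs.json", "file-GJ2k2V80vx88z3zyJbVXZj3G", 35)
```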

You can run the applet and watch the job with the following command:

$ dx run applet-GJ2pgv80vx84zJ4XJF6GPXz7 -f inputs.json -y --watch

Using input JSON:
{
    "input_file": {
        "$dnanexus_link": "file-GJ2k2V80vx88z3zyJbVXZj3G"
    },
    "quality_score": 35
}

Calling applet-GJ2pgv80vx84zJ4XJF6GPXz7 with output destination
project-GJ2k24j0vx804FPyBbxqpQBk:/

Job ID: job-GJ2ppvQ0vx88k8bv9pvGyjGX

Job Log
-------
Watching job job-GJ2ppvQ0vx88k8bv9pvGyjGX. Press Ctrl+C to stop watching.

The output will look quite different from the bash app, but the basics are still the same. In this version, notice that you do not need to download the inputs or upload the outputs. Once the input files are in place, the command block is run and the input files and variables are dereferenced properly. When the job has completed, run dx describe to see the inputs and outputs:

$ dx describe job-GJ2ppvQ0vx88k8bv9pvGyjGX
Result 1:
ID                    job-GJ2ppvQ0vx88k8bv9pvGyjGX
Class                 job
Job name              fastq_trimmer
Executable name       fastq_trimmer
Project context       project-GJ2k24j0vx804FPyBbxqpQBk
Region                aws:us-east-1
Billed to             org-sos
Workspace             container-GJ2ppx80773k09b8F6qKGJBb
Applet                applet-GJ2pgv80vx84zJ4XJF6GPXz7
Instance Type         mem1_ssd1_v2_x2
Priority              high
State                 done
Root execution        job-GJ2ppvQ0vx88k8bv9pvGyjGX
Origin job            job-GJ2ppvQ0vx88k8bv9pvGyjGX
Parent job            -
Function              main
Input                 input_file = file-GJ2k2V80vx88z3zyJbVXZj3G
                      quality_score = 35
Output                output_file = file-GJ2pv300773ypy03Jg2vYZ9f
...
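When scripting, it can be easier to read the job description as JSON (dx describe accepts a --json flag) than to parse the table above. The following sketch pulls the output file IDs out of such a description. It assumes the JSON contains a "state" field and an "output" hash whose file values are $dnanexus_link objects; check the structure against your own output before relying on it. The function name output_file_ids is hypothetical.

```python
import json


def output_file_ids(describe_json: str) -> dict:
    """Map each output name to its file ID, given a job description as JSON text."""
    description = json.loads(describe_json)
    if description.get("state") != "done":
        raise RuntimeError(f"job is not done (state: {description.get('state')})")
    ids = {}
    for name, value in (description.get("output") or {}).items():
        if isinstance(value, dict) and "$dnanexus_link" in value:
            link = value["$dnanexus_link"]
            # A link may be a bare ID string or a hash that carries an "id" key.
            ids[name] = link["id"] if isinstance(link, dict) else link
    return ids


if __name__ == "__main__":
    # Trimmed-down example that mirrors the job shown above.
    example = json.dumps({
        "state": "done",
        "output": {"output_file": {"$dnanexus_link": "file-GJ2pv300773ypy03Jg2vYZ9f"}},
    })
    print(output_file_ids(example))
```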

Download the output file to ensure it looks like a correct result:

$ dx download file-GJ2pv300773ypy03Jg2vYZ9f
[===========================================================>]
Completed 14,357,774 of 14,357,774 bytes (100%) ~/fastq_trimmer_wdl/small-celegans-sample.fastq.filtered.fastq
$ wc -l small-celegans-sample.fastq.filtered.fastq
   98624 small-celegans-sample.fastq.filtered.fastq
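A line count alone does not show that the file is well formed. Assuming the trimmer writes conventional four-line FASTQ records, the count should be divisible by four; the 98624 lines above would correspond to 24656 records. The sketch below performs that check along with a basic look at the header and separator lines. The function name count_fastq_records is hypothetical.

```python
from typing import Iterable


def count_fastq_records(lines: Iterable[str]) -> int:
    """Count four-line FASTQ records, raising ValueError on a malformed layout."""
    total = 0
    for total, line in enumerate(lines, start=1):
        position = (total - 1) % 4
        if position == 0 and not line.startswith("@"):
            raise ValueError(f"line {total}: expected a header starting with '@'")
        if position == 2 and not line.startswith("+"):
            raise ValueError(f"line {total}: expected a separator starting with '+'")
    if total % 4 != 0:
        raise ValueError(f"{total} lines is not a multiple of 4")
    return total // 4


if __name__ == "__main__":
    with open("small-celegans-sample.fastq.filtered.fastq") as handle:
        print(count_fastq_records(handle))
```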

Documentation with Makefiles

You may find it useful to create a Makefile with all the steps documented in a runnable fashion:

WDL = fastq_trimmer.wdl
PROJECT_ID = project-GJ2k24j0vx804FPyBbxqpQBk
DXCOMPILER = java -jar ~/dxCompiler.jar
CROMWELL = java -jar ~/cromwell.jar
WOMTOOL = java -jar ~/womtool.jar
WORKFLOW_ID = applet-GJ2pgv80vx84zJ4XJF6GPXz7

# Recipe lines must begin with a tab character, not spaces.
.PHONY: validate check compile run

validate:
	$(WOMTOOL) validate $(WDL)

check:
	miniwdl check $(WDL)

compile:
	$(DXCOMPILER) compile $(WDL) \
		-archive \
		-folder /workflows \
		-project $(PROJECT_ID)

run:
	dx run $(WORKFLOW_ID) \
		-f inputs.json \
		--destination $(PROJECT_ID):/output \
		-y --watch

Now you can run make compile rather than typing out the long Java command.

Review

The WDL version of the fastq_trimmer applet is arguably simpler than the bash version. It uses just one file, fastq_trimmer.wdl, and about 20 lines of text, whereas the bash version requires at least dxapp.json, a bash script, and the resources tarball.

In this chapter, you learned how to:

  • Use a Biocontainers Docker image for the necessary binary executables from FASTX toolkit

  • Define the same inputs, outputs, and commands as the bash applet from the previous chapter

  • Use a Makefile to define project shortcuts to validate, compile, and run an applet

Resources

To create a support ticket if there are technical issues:

  1. Go to the Help header (same section where Projects and Tools are) inside the platform

  2. Select "Contact Support"

  3. Fill in the Subject and Message to submit a support ticket.


In fastq_trimmer.wdl, the version 1.0 line indicates that the WDL follows the 1.0 specification.

The String basename line defines a variable called basename, which uses the basename function to get the filename of the input file.

From the perspective of the user, there is no difference between native/bash applets and those written in WDL. You should use whichever syntax you find most convenient for the task at hand. For instance, this applet leverages an existing Docker image created by the Biocontainers Community rather than adding the binary as a resource.