Example 4: cnvkit

There is an existing public Docker image available for CNVkit ("etal/cnvkit:latest"), so another option is to build a WDL version that will download and use this image at runtime rather than installing the Python and R modules ourselves.

In this example, you will:

  • Use WDL and Docker to build the CNVkit applet

Getting Started

To start, create a new directory called cnvkit_wdl parallel to the bash directory. Inside this new directory, create the file workflow.wdl with the following contents:

version 1.0

task cnvkit_wdl_kyc {
    input {
        Array[File] bam_tumor
        File reference
    }

    command <<<
        cnvkit.py batch \
            ~{sep=" " bam_tumor} \
            -r ~{reference} \
            -p $(expr $(nproc) - 1) \
            -d output/ \
            --scatter
    >>>

    runtime {
        docker: "etal/cnvkit:latest"
        cpu: 16
    }

    output {
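        # Caution: [!.call] is a bracket expression (first character is not '.', 'c', 'a', or 'l'),
        # so this glob can also match the *.call.cns files collected below.
        Array[File]+ cns = glob("output/[!.call]*.cns")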
        Array[File]+ cns_filtered = glob("output/*.call.cns")
        Array[File]+ plot = glob("output/*-scatter.png")
    }
}
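
The -p argument in the command section gives CNVkit one fewer process than the number of cores reported by nproc. A minimal sketch of that calculation using shell arithmetic follows. The floor of 1 for single-core machines and the variable names cores and procs are assumptions added for illustration, not part of the original task.

```shell
#!/usr/bin/env bash
# Sketch: leave one core free, but never ask for fewer than one process.
# The names `cores` and `procs` are illustrative only.
cores=$(nproc)
if [ "$cores" -gt 1 ]; then
    procs=$((cores - 1))
else
    procs=1
fi
echo "Using $procs of $cores cores"
```

The guard matters only on a single-core machine, where subtracting one would otherwise request zero processes.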

Next, ensure you have a working Java runtime, and then download the latest dxCompiler JAR file. You can use the following command to place the 2.10.3 release into your home directory:

$ cd && wget https://github.com/dnanexus/dxCompiler/releases/download/2.10.3/dxCompiler-2.10.3.jar

Use dxCompiler to turn workflow.wdl into an applet equivalent to the bash version. The following command places the applet into a workflows folder in the given project to keep everything neatly contained. The project ID project-GFf2Bq8054J0v8kY8zJ1FGQF belongs to the caris_cnvkit project, so change this value if you wish to compile into a different project. Note the -archive option, which archives any existing version of the applet so that the new version takes precedence, and the -reorg option, which reorganizes the output files. As shown in the following command, successful compilation prints the new applet's ID:

$ java -jar ~/dxCompiler-2.10.3.jar compile workflow.wdl \
        -archive \
        -reorg \
        -folder /workflows \
        -project project-GFf2Bq8054J0v8kY8zJ1FGQF
applet-GFyVxpQ0VGFgGQBy4vJ0kxK2
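
When scripting the compile step, it can help to capture the ID for later dx run calls. The sketch below assumes, as in the transcript above, that the ID appears on a line by itself in the compiler's output. The function name extract_applet_id and the sample text are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: pull the applet ID out of captured compiler output.
# Assumption: the ID is printed on its own line, as in the transcript above.
extract_applet_id() {
    # Keep only lines that consist solely of an applet ID; take the last one.
    grep -E '^applet-[A-Za-z0-9]+$' | tail -n 1
}

# Sample text standing in for real compiler output (illustrative only).
sample_output='some other line of output
applet-GFyVxpQ0VGFgGQBy4vJ0kxK2'

applet_id=$(printf '%s\n' "$sample_output" | extract_applet_id)
echo "$applet_id"
```

In a real session, the output of the java -jar compile command would be piped into extract_applet_id in place of the sample text.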

Run the new workflow with the -h|--help flag to verify the inputs:

$ dx run applet-GFyVxpQ0VGFgGQBy4vJ0kxK2 -h
usage: dx run applet-GFyVxpQ0VGFgGQBy4vJ0kxK2 [-iINPUT_NAME=VALUE ...]

Applet: cnvkit_wdl_kyc

Inputs:
  bam_tumor: [-ibam_tumor=(file) [-ibam_tumor=... [...]]]

  reference: -ireference=(file)

 Reserved for dxCompiler
  overrides___: [-ioverrides___=(hash)]

  overrides______dxfiles: [-ioverrides______dxfiles=(file) [-ioverrides______dx>

Outputs:
  cns: cns (array:file)

  cns_filtered: cns_filtered (array:file)

  plot: plot (array:file)

As with the bash version, you can launch the workflow from the CLI as follows:

$ dx run -y --watch applet-GFyVxpQ0VGFgGQBy4vJ0kxK2 \
            -ibam_tumor=file-GFxXjV006kZVQPb20G85VXBp \
            -ireference=file-GFxXvpj06kZfP0QVKq2p2FGF \
            --destination project-GFyPxb00VGFz5JZQ4f5x424q:/users/kyclark

The output of this command includes the input as JSON. Saved to a file such as inputs.json, the same JSON can be used to launch the job:

$ cat inputs.json
{
    "bam_tumor": [
        {
            "$dnanexus_link": "file-GFxXjV006kZVQPb20G85VXBp"
        }
    ],
    "reference": {
        "$dnanexus_link": "file-GFxXvpj06kZfP0QVKq2p2FGF"
    }
}
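
Because the file is plain JSON, it can also be generated rather than copied from job output. The following sketch prints the same structure with printf for one reference and any number of BAM files. The function name make_inputs_json is hypothetical, and the file IDs are the ones used above.

```shell
#!/usr/bin/env bash
# Sketch: print an inputs JSON document for one reference and one or more BAMs.
# Usage: make_inputs_json REFERENCE_ID BAM_ID [BAM_ID ...]
make_inputs_json() {
    local reference=$1
    shift
    local bam sep=""
    printf '{\n    "bam_tumor": ['
    for bam in "$@"; do
        # Single quotes keep $dnanexus_link literal; %s receives the file ID.
        printf '%s\n        {"$dnanexus_link": "%s"}' "$sep" "$bam"
        sep=","
    done
    printf '\n    ],\n    "reference": {"$dnanexus_link": "%s"}\n}\n' "$reference"
}

inputs_json=$(make_inputs_json file-GFxXvpj06kZfP0QVKq2p2FGF file-GFxXjV006kZVQPb20G85VXBp)
echo "$inputs_json"
```

Redirecting the function's output to inputs.json produces a file that can be passed to dx run with -f.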

Use the following command to launch the workflow from the CLI with the JSON file:

$ dx run -y --watch applet-GFyVxpQ0VGFgGQBy4vJ0kxK2 -f inputs.json \
            --destination project-GFyPxb00VGFz5JZQ4f5x424q:/users/kyclark

As before, you can use the web interface to monitor the progress of the workflow and inspect the outputs.

Saving a Docker Image

Run the following command to start a new cloud workstation:

$ dx run -imax_session_length="1d" app-cloud_workstation --ssh -y

From the cloud workstation, pull the CNVkit Docker image:

$ docker pull etal/cnvkit:latest

Save and compress the image to a file:

$ docker save etal/cnvkit:latest | gzip - > cnvkit.tar.gz
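
Before uploading a large archive, a quick integrity check can catch a truncated save. The sketch below uses gzip -t and tar -tzf, both standard tools. The function name check_archive is hypothetical, and the demonstration builds a tiny stand-in archive in a temporary directory rather than using the real image.

```shell
#!/usr/bin/env bash
# Sketch: confirm a .tar.gz file is intact before uploading it.
check_archive() {
    local archive=$1
    # gzip -t tests the compressed stream; tar -tzf walks the archive listing.
    gzip -t "$archive" && tar -tzf "$archive" > /dev/null
}

# Demonstration on a small stand-in archive (not the real Docker image).
workdir=$(mktemp -d)
echo "placeholder" > "$workdir/manifest.json"
tar -czf "$workdir/demo.tar.gz" -C "$workdir" manifest.json

if check_archive "$workdir/demo.tar.gz"; then
    archive_status="ok"
else
    archive_status="corrupt"
fi
echo "$archive_status"
rm -rf "$workdir"
```

On the cloud workstation, the equivalent call would be check_archive cnvkit.tar.gz.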

Add the tarball to the project:

$ dx upload cnvkit.tar.gz --path project-GFyPxb00VGFz5JZQ4f5x424q:/
[===========================================================>]
Uploaded 503,092,072 of 503,092,072 bytes (100%) cnvkit.tar.gz
ID                    file-GFyq05j0VGFqJqq54q98pbBK
Class                 file
Project               project-GFyPxb00VGFz5JZQ4f5x424q
Folder                /
Name                  cnvkit.tar.gz
State                 closing
Visibility            visible
Types                 -
Properties            -
Tags                  -
Outgoing links        -
Created               Thu Aug 18 03:20:55 2022
Created by            kyclark
 via the job          job-GFypx3Q0VGFgb71g4gYY3GF3
Last modified         Thu Aug 18 03:20:57 2022
Media type
archivalState         "live"
cloudAccount          "cloudaccount-dnanexus"
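
The file ID on the ID line of this output is what the WDL runtime section needs. As a sketch, the ID can be pulled from captured output with awk and turned into a dx:// URL. The function name docker_url_from_upload is hypothetical, and the sample text is an excerpt of the output above.

```shell
#!/usr/bin/env bash
# Sketch: turn captured `dx upload` output into a dx:// URL for the WDL runtime.
docker_url_from_upload() {
    # Find the line whose first field is "ID" and print its second field.
    awk '$1 == "ID" { print "dx://" $2; exit }'
}

# Excerpt of the upload output shown above.
upload_output='ID                    file-GFyq05j0VGFqJqq54q98pbBK
Class                 file
Name                  cnvkit.tar.gz'

docker_url=$(printf '%s\n' "$upload_output" | docker_url_from_upload)
echo "$docker_url"
```

If your version of dx supports the --brief flag on dx upload, it should print only the ID, which would make the awk step unnecessary.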

Update the WDL to use the tarball by pointing the docker runtime attribute at the uploaded file's ID with a dx:// URL:

version 1.0

task cnvkit_wdl_tarball {
    input {
        Array[File] bam_tumor
        File reference
    }

    command <<<
        cnvkit.py batch \
            ~{sep=" " bam_tumor} \
            -r ~{reference} \
            -p $(expr $(nproc) - 1) \
            -d output/ \
            --scatter
    >>>

    runtime {
        docker: "dx://file-GFyq05j0VGFqJqq54q98pbBK"
        cpu: 16
    }

    output {
        Array[File]+ cns = glob("output/[!.call]*.cns")
        Array[File]+ cns_filtered = glob("output/*.call.cns")
        Array[File]+ plot = glob("output/*-scatter.png")
    }
}
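
Apart from the task name, the change from the earlier task is the docker attribute in the runtime section. One way to make that edit mechanically is with sed, sketched below. The function name swap_docker_image is hypothetical, and the demonstration runs on a stand-in runtime block in a temporary file rather than on workflow.wdl itself.

```shell
#!/usr/bin/env bash
# Sketch: point the docker runtime attribute at a platform file instead of a registry image.
swap_docker_image() {
    local wdl=$1 file_id=$2
    # Replace whatever is quoted after "docker:" with dx://<file_id>, keeping indentation.
    sed -E "s|^([[:space:]]*docker:[[:space:]]*)\"[^\"]*\"|\1\"dx://${file_id}\"|" "$wdl"
}

# Demonstration on a stand-in runtime block written to a temporary file.
tmp_wdl=$(mktemp)
printf '    runtime {\n        docker: "etal/cnvkit:latest"\n        cpu: 16\n    }\n' > "$tmp_wdl"

new_runtime=$(swap_docker_image "$tmp_wdl" file-GFyq05j0VGFqJqq54q98pbBK)
echo "$new_runtime"
rm -f "$tmp_wdl"
```

The function writes the edited text to standard output and leaves the input file untouched, so the result can be reviewed before it replaces the original.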

Compile the updated WDL with dxCompiler and run the resulting applet as before.

Review

In this chapter, you learned another strategy for packaging an applet's dependencies using Docker and then running the applet's code inside the Docker image using WDL.

Resources

If you run into technical issues, create a support ticket:

  1. Go to the Help header (same section where Projects and Tools are) inside the platform

  2. Select "Contact Support"

  3. Fill in the Subject and Message to submit a support ticket.

