Example 4: cnvkit

There is an existing public Docker image available for CNVkit ("etal/cnvkit:latest"), so another option is to build a WDL version that will download and use this image at runtime rather than installing the Python and R modules ourselves.

In this example, you will:

  • Use WDL and Docker to build the CNVkit applet

Getting Started

To start, create a new directory called cnvkit_wdl parallel to the bash directory. Inside this new directory, create the file workflow.wdl with the following contents:

version 1.0

task cnvkit_wdl_kyc {
    input {
        Array[File] bam_tumor
        File reference
    }

    command <<<
        cnvkit.py batch \
            ~{sep=" " bam_tumor} \
            -r ~{reference} \
            -p $(expr $(nproc) - 1) \
            -d output/ \
            --scatter
    >>>

    runtime {
        docker: "etal/cnvkit:latest"
        cpu: 16
    }

    output {
        Array[File]+ cns = glob("output/[!.call]*.cns")
        Array[File]+ cns_filtered = glob("output/*.call.cns")
        Array[File]+ plot = glob("output/*-scatter.png")
    }
}
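The output globs in this task follow shell-style pattern matching, so it can help to try them against dummy file names before compiling. The sketch below is only an illustration: the sample names (Sample1, cohort) and the temporary directory are made up, and it assumes the platform expands the patterns the way Bash does. Note that a bracket expression such as [!.call] matches a single character that is not ".", "c", "a", or "l", so it filters on the first character of the file name rather than on the .call suffix.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical file names standing in for CNVkit outputs.
workdir=$(mktemp -d)
mkdir "$workdir/output"
touch "$workdir/output/Sample1.cns" \
      "$workdir/output/Sample1.call.cns" \
      "$workdir/output/cohort.cns" \
      "$workdir/output/Sample1-scatter.png"
cd "$workdir"

# Print the base name of every file matched by the glob pattern in $1.
list_matches() {
    local f
    for f in $1; do   # unquoted on purpose so the shell expands the pattern
        if [ -e "$f" ]; then
            basename "$f"
        fi
    done
}

echo "cns pattern matches:"
list_matches 'output/[!.call]*.cns'

echo "cns_filtered pattern matches:"
list_matches 'output/*.call.cns'
```

With these dummy names, the first pattern lists both Sample1.cns and Sample1.call.cns but skips cohort.cns, so it is worth confirming that each pattern selects the files you expect for your own sample names.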

Next, make sure you have a working Java runtime (dxCompiler is distributed as a JAR file, so no compiler is needed) and download the latest dxCompiler JAR file. The following command places the 2.10.3 release in your home directory:

$ cd && wget https://github.com/dnanexus/dxCompiler/releases/download/2.10.3/dxCompiler-2.10.3.jar

Use dxCompiler to turn workflow.wdl into an applet equivalent to the bash version. The following command places the applet and any related objects into a /workflows folder in the given project to keep everything neatly contained. The project ID project-GFf2Bq8054J0v8kY8zJ1FGQF refers to the caris_cnvkit project, so change it if you wish to use a different project. Note the -archive option, which archives any existing version of the applet so that the new version takes precedence, and the -reorg option, which reorganizes the output files. On success, the compiler prints the ID of the new applet:

$ java -jar ~/dxCompiler-2.10.3.jar compile workflow.wdl \
        -archive \
        -reorg \
        -folder /workflows \
        -project project-GFf2Bq8054J0v8kY8zJ1FGQF
applet-GFyVxpQ0VGFgGQBy4vJ0kxK2
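Because the compiler prints the new ID on success, a wrapper script can capture it for later dx run calls. The following is a minimal sketch, assuming the ID appears in the compiler's standard output as in the transcript above. The helper name last_platform_id is hypothetical, and the real compile command appears only in a comment because it needs Java and a logged-in dx session.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: read compiler output on stdin and print the last
# token that looks like an applet or workflow ID.
last_platform_id() {
    grep -Eo '(applet|workflow)-[A-Za-z0-9]+' | tail -n 1
}

# Typical use (requires Java, dxCompiler, and `dx login`):
#   applet_id=$(java -jar ~/dxCompiler-2.10.3.jar compile workflow.wdl \
#       -archive -reorg -folder /workflows \
#       -project project-GFf2Bq8054J0v8kY8zJ1FGQF | last_platform_id)
#   dx run "$applet_id" -h

# Demonstration using the ID from the transcript above.
printf '%s\n' 'applet-GFyVxpQ0VGFgGQBy4vJ0kxK2' | last_platform_id
```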

Run the new applet with the -h|--help flag to verify the inputs:

$ dx run applet-GFyVxpQ0VGFgGQBy4vJ0kxK2 -h
usage: dx run applet-GFyVxpQ0VGFgGQBy4vJ0kxK2 [-iINPUT_NAME=VALUE ...]

Applet: cnvkit_wdl_kyc

Inputs:
  bam_tumor: [-ibam_tumor=(file) [-ibam_tumor=... [...]]]

  reference: -ireference=(file)

 Reserved for dxCompiler
  overrides___: [-ioverrides___=(hash)]

  overrides______dxfiles: [-ioverrides______dxfiles=(file) [-ioverrides______dx>

Outputs:
  cns: cns (array:file)

  cns_filtered: cns_filtered (array:file)

  plot: plot (array:file)

As with the bash version, you can launch the applet from the CLI as follows:

$ dx run -y --watch applet-GFyVxpQ0VGFgGQBy4vJ0kxK2 \
            -ibam_tumor=file-GFxXjV006kZVQPb20G85VXBp \
            -ireference=file-GFxXvpj06kZfP0QVKq2p2FGF \
            --destination project-GFyPxb00VGFz5JZQ4f5x424q:/users/kyclark

The job output includes the input JSON, which you can save to a file (here, inputs.json) and use to launch the job instead:

$ cat inputs.json
{
    "bam_tumor": [
        {
            "$dnanexus_link": "file-GFxXjV006kZVQPb20G85VXBp"
        }
    ],
    "reference": {
        "$dnanexus_link": "file-GFxXvpj06kZfP0QVKq2p2FGF"
    }
}
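With several BAM files, writing this JSON by hand becomes tedious. The following sketch generates it from file IDs using plain shell. The function name make_inputs_json is hypothetical, and the IDs are the ones used above.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: the first argument is the reference file ID and the
# remaining arguments are tumor BAM file IDs. Prints the input JSON on stdout.
make_inputs_json() {
    local reference="$1"
    shift
    local first=1
    local bam
    printf '{\n    "bam_tumor": [\n'
    for bam in "$@"; do
        # Separate array elements with a comma, but not before the first one.
        if [ "$first" -eq 0 ]; then
            printf ',\n'
        fi
        printf '        {"$dnanexus_link": "%s"}' "$bam"
        first=0
    done
    printf '\n    ],\n'
    printf '    "reference": {"$dnanexus_link": "%s"}\n}\n' "$reference"
}

# Redirect to inputs.json to save the result for `dx run -f`.
make_inputs_json file-GFxXvpj06kZfP0QVKq2p2FGF file-GFxXjV006kZVQPb20G85VXBp
```

Passing more BAM file IDs after the reference ID adds one array element per file.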

Use the following command to launch the applet from the CLI with the JSON file:

$ dx run -y --watch applet-GFyVxpQ0VGFgGQBy4vJ0kxK2 -f inputs.json \
            --destination project-GFyPxb00VGFz5JZQ4f5x424q:/users/kyclark

As before, you can use the web interface to monitor the job's progress and inspect the outputs.

Saving a Docker Image

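Instead of pulling the image from Docker Hub at runtime, you can save it as a file in your project and point the WDL at that file. Start by running the following command to launch a new cloud workstation: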

$ dx run -imax_session_length="1d" app-cloud_workstation --ssh -y

From the cloud workstation, pull the CNVkit Docker image:

$ docker pull etal/cnvkit:latest

Save and compress the image to a file:

$ docker save etal/cnvkit:latest | gzip - > cnvkit.tar.gz
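Before uploading, it can be worth checking that the compressed file is intact, since a truncated archive would only fail later when the applet tries to load it. The sketch below uses gzip -t and tar -t for that check. The helper name check_archive is hypothetical, and the demonstration runs on a small stand-in archive rather than the real image.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: succeed only if the file is a readable gzip stream
# that contains a listable tar archive.
check_archive() {
    gzip -t "$1" 2>/dev/null && tar -tzf "$1" >/dev/null 2>&1
}

# Demonstration with a small stand-in archive; on the workstation you would
# run `check_archive cnvkit.tar.gz` instead.
demo_dir=$(mktemp -d)
echo "placeholder" > "$demo_dir/manifest.json"
tar -czf "$demo_dir/demo.tar.gz" -C "$demo_dir" manifest.json

if check_archive "$demo_dir/demo.tar.gz"; then
    echo "archive OK"
else
    echo "archive is damaged" >&2
fi
```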

Add the tarball to the project:

$ dx upload cnvkit.tar.gz --path project-GFyPxb00VGFz5JZQ4f5x424q:/
[===========================================================>]
Uploaded 503,092,072 of 503,092,072 bytes (100%) cnvkit.tar.gz
ID                    file-GFyq05j0VGFqJqq54q98pbBK
Class                 file
Project               project-GFyPxb00VGFz5JZQ4f5x424q
Folder                /
Name                  cnvkit.tar.gz
State                 closing
Visibility            visible
Types                 -
Properties            -
Tags                  -
Outgoing links        -
Created               Thu Aug 18 03:20:55 2022
Created by            kyclark
 via the job          job-GFypx3Q0VGFgb71g4gYY3GF3
Last modified         Thu Aug 18 03:20:57 2022
Media type
archivalState         "live"
cloudAccount          "cloudaccount-dnanexus"

Update the docker value in the WDL runtime block to reference the uploaded tarball by its file ID, using the dx:// prefix:

version 1.0

task cnvkit_wdl_tarball {
    input {
        Array[File] bam_tumor
        File reference
    }

    command <<<
        cnvkit.py batch \
            ~{sep=" " bam_tumor} \
            -r ~{reference} \
            -p $(expr $(nproc) - 1) \
            -d output/ \
            --scatter
    >>>

    runtime {
        docker: "dx://file-GFyq05j0VGFqJqq54q98pbBK"
        cpu: 16
    }

    output {
        Array[File]+ cns = glob("output/[!.call]*.cns")
        Array[File]+ cns_filtered = glob("output/*.call.cns")
        Array[File]+ plot = glob("output/*-scatter.png")
    }
}
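The two task definitions differ only in the task name and the docker value, so the second can be derived from the first rather than edited by hand. This is a minimal sketch using sed. The helper name use_tarball is hypothetical, and the demonstration feeds it just the two lines that change.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: rewrite the WDL on stdin so the docker runtime value
# points at a saved image file and the task carries the new name.
use_tarball() {
    local file_id="$1"
    sed -e "s|docker: \"etal/cnvkit:latest\"|docker: \"dx://${file_id}\"|" \
        -e 's|task cnvkit_wdl_kyc|task cnvkit_wdl_tarball|'
}

# Demonstration on the two relevant lines of workflow.wdl.
printf '%s\n' 'task cnvkit_wdl_kyc {' '        docker: "etal/cnvkit:latest"' \
    | use_tarball file-GFyq05j0VGFqJqq54q98pbBK
```

To convert the whole file, redirect workflow.wdl into the function and write the result to a new file of your choosing.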

Compile the updated WDL with dxCompiler as before, then run the resulting applet with the same inputs.

Review

In this chapter, you learned another strategy for packaging an applet's dependencies using Docker and then running the applet's code inside the Docker image using WDL.

Resources

Full Documentation

To create a support ticket if there are technical issues:

  1. Go to the Help header (same section where Projects and Tools are) inside the platform

  2. Select "Contact Support"

  3. Fill in the Subject and Message to submit a support ticket.
