Example 4: cnvkit

To begin, you'll create a bash app to run CNVkit, which finds "genome-wide copy number from high-throughput sequencing." Create a local directory to hold your work, and consider putting the contents into a source code repository like Git.

In this example, you will:

  • Use various package managers to install dependencies

  • Build an asset

  • Learn to use dx-download-all-inputs and dx-upload-all-outputs

Create a Project

From the web interface, select "Projects → All Projects" to see your project list. Click the "New Project" button to create a new project called "CNVkit." Alternatively, use dx new project to do this from the command line. However you create the project, make sure it is selected: run dx pwd to check your current working directory, and use dx select to select the project if needed.

Build a bash app with dx-app-wizard

Inside your working directory, run the command dx-app-wizard cnvkit_bash to launch the app wizard. Optionally provide a title, summary, and version at the prompts.

The Input Specification

The app will accept two inputs:

  1. One or more BAM files of the tumor samples: Give this input the name bam_tumor with the label "BAM Tumor Files." For the class, choose array:file, and indicate that this is not an optional parameter.

  2. A reference file: Give this input the name reference with the label "Reference." For the class, choose file, and indicate that this is not an optional parameter.

When prompted for the third input, press Enter to end the inputs.

The Output Specification

Define three outputs, each of the type array:file with the following names and whatever labels you feel are appropriate:

  1. cns

  2. cns_filtered

  3. plot

Press Enter when prompted for the fourth output to indicate you are finished.

Other Options

  • Press Enter to accept the default value for the timeout policy.

  • Type bash for the programming language.

  • Type y to indicate that the app will need internet access.

  • Type n to indicate that the app will not need access to the parent project.

  • Press Enter to accept the default value for the instance type or select one from the list shown.

You should see a message saying the app's template was created in a directory whose name matches the app's name. For instance, I have the following files:

$ find cnvkit_bash -type f
cnvkit_bash/dxapp.json
cnvkit_bash/Readme.md
cnvkit_bash/Readme.developer.md
cnvkit_bash/src/cnvkit_bash.sh

  • dxapp.json: A JSON file containing metadata that will be used to create the app on the DNAnexus platform.

  • Readme.md: A stub for user documentation.

  • Readme.developer.md: A stub for developer documentation.

  • src/cnvkit_bash.sh: A template bash script for the app's functionality.

Examine dxapp.json

The dxapp.json file that was created by the wizard should look like the following:

{
  "name": "cnvkit_bash",
  "title": "cnvkit_bash",
  "summary": "cnvkit_bash",
  "dxapi": "1.0.0",
  "version": "0.0.1",
  "inputSpec": [
    {
      "name": "bam_tumor",
      "label": "BAM Tumor Files",
      "class": "array:file",
      "optional": false,
      "patterns": [
        "*"
      ],
      "help": ""
    },
    {
      "name": "reference",
      "label": "Reference",
      "class": "file",
      "optional": false,
      "patterns": [
        "*"
      ],
      "help": ""
    }
  ],
  "outputSpec": [
    {
      "name": "cns",
      "label": "CNS",
      "class": "array:file",
      "patterns": [
        "*"
      ],
      "help": ""
    },
    {
      "name": "cns_filtered",
      "label": "CNS Filtered",
      "class": "array:file",
      "patterns": [
        "*"
      ],
      "help": ""
    },
    {
      "name": "plot",
      "label": "Plot",
      "class": "array:file",
      "patterns": [
        "*"
      ],
      "help": ""
    }
  ],
  "runSpec": {
    "timeoutPolicy": {
      "*": {
        "hours": 48
      }
    },
    "interpreter": "bash",
    "file": "src/cnvkit_bash.sh",
    "distribution": "Ubuntu",
    "release": "20.04",
    "version": "0"
  },
  "access": {
    "network": [
      "*"
    ]
  },
  "regionalOptions": {
    "aws:us-east-1": {
      "systemRequirements": {
        "*": {
          "instanceType": "mem1_ssd1_v2_x4"
        }
      }
    }
  }
}

Add Python and R Module Dependencies

CNVkit has dependencies on both Python and R modules that must be installed before running. In the dxapp.json, you can specify dependencies that can be installed with the following package managers:

  • apt (Ubuntu)

  • pip (Python)

  • cpan (Perl)

  • cran (R)

  • gem (Ruby)

To add these runtime dependencies, use a text editor to update the runSpec and add the following execDepends section that will install the Python cnvkit and R BiocManager modules before the app is executed:

"runSpec": {
    "interpreter": "bash",
    "file": "src/cnvkit_bash.sh",
    "distribution": "Ubuntu",
    "release": "20.04",
    "version": "0",
    "execDepends": [
      {
        "name": "cnvkit",
        "package_manager": "pip"
      },
      {
        "name": "BiocManager",
        "package_manager": "cran"
      }
    ],
    "timeoutPolicy": {
      "*": {
        "hours": 48
      }
    }
}

Specify File Patterns for Inputs

In the inputSpec, change the patterns to match the expected file extensions:

  • bam_tumor: *.bam

  • reference: *.cnn
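The patterns are shell-style wildcards. As a rough local illustration only (it does not reproduce how the platform itself applies the patterns), bash's own [[ ... == pattern ]] matching shows which file names each pattern accepts. The matches helper and the file names below are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a file name matches a glob pattern.
# $pattern is deliberately unquoted so bash treats it as a pattern, not a literal string.
matches() {
    local name=$1 pattern=$2
    [[ $name == $pattern ]]
}

# Check some hypothetical file names against the two input patterns
for f in tumor1.bam tumor1.bam.bai reference.cnn reference.cnr; do
    for p in '*.bam' '*.cnn'; do
        if matches "$f" "$p"; then
            echo "$f matches $p"
        fi
    done
done
```

Note that an index file such as tumor1.bam.bai does not match *.bam, because the pattern must match the whole name.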

Your dxapp.json should now look like the following:

{
  "name": "cnvkit_bash",
  "title": "cnvkit_bash",
  "summary": "cnvkit_bash",
  "dxapi": "1.0.0",
  "version": "0.0.1",
  "inputSpec": [
    {
      "name": "bam_tumor",
      "label": "BAM Tumor Files",
      "class": "array:file",
      "optional": false,
      "patterns": [
        "*.bam"
      ],
      "help": ""
    },
    {
      "name": "reference",
      "label": "Reference",
      "class": "file",
      "optional": false,
      "patterns": [
        "*.cnn"
      ],
      "help": ""
    }
  ],
  "outputSpec": [
    {
      "name": "cns",
      "label": "CNS",
      "class": "array:file",
      "patterns": [
        "*"
      ],
      "help": ""
    },
    {
      "name": "cns_filtered",
      "label": "CNS Filtered",
      "class": "array:file",
      "patterns": [
        "*"
      ],
      "help": ""
    },
    {
      "name": "plot",
      "label": "Plot",
      "class": "array:file",
      "patterns": [
        "*"
      ],
      "help": ""
    }
  ],
  "runSpec": {
    "timeoutPolicy": {
      "*": {
        "hours": 48
      }
    },
    "execDepends": [
      {
        "name": "cnvkit",
        "package_manager": "pip"
      },
      {
        "name": "BiocManager",
        "package_manager": "cran"
      }
    ],
    "interpreter": "bash",
    "file": "src/cnvkit_bash.sh",
    "distribution": "Ubuntu",
    "release": "20.04",
    "version": "0"
  },
  "access": {
    "network": [
      "*"
    ]
  },
  "regionalOptions": {
    "aws:us-east-1": {
      "systemRequirements": {
        "*": {
          "instanceType": "mem1_ssd1_v2_x4"
        }
      }
    }
  }
}
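Because hand-editing JSON makes it easy to drop a quote or leave a trailing comma, it can help to check the file's syntax before building. This is a minimal sketch, assuming python3 is available locally: its standard json.tool module exits with a non-zero status on malformed JSON. The check_json function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the file is well-formed JSON, fail otherwise.
# python3 -m json.tool parses the file and exits non-zero on a syntax error;
# its pretty-printed output is discarded here.
check_json() {
    if python3 -m json.tool "$1" > /dev/null 2>&1; then
        echo "OK: $1"
    else
        echo "Invalid JSON: $1" >&2
        return 1
    fi
}
```

For example, run check_json cnvkit_bash/dxapp.json from the parent directory before dx build. A line such as class": "array:file", with its opening quote missing, makes the check fail.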

Edit the bash Code

The default bash code generated by the wizard starts with a generous header of comments that you may or may not wish to keep. The default code prints the values of the input variables, then downloads the input files individually. The app code belongs in the middle, after downloading the inputs and before uploading the outputs:

main() {

    echo "Value of bam_tumor: '${bam_tumor[@]}'"
    echo "Value of reference: '$reference'"

    # The following line(s) use the dx command-line tool to download your file
    # inputs to the local file system using variable names for the filenames. To
    # recover the original filenames, you can use the output of "dx describe
    # "$variable" --name".

    dx download "$reference" -o reference
    for i in ${!bam_tumor[@]}
    do
        dx download "${bam_tumor[$i]}" -o bam_tumor-$i
    done

    # >>>>> Here is where the app code belongs <<<<<

    # The following line(s) use the dx command-line tool to upload your file
    # outputs after you have created them on the local file system.  It assumes
    # that you have used the output field name for the filename for each output,
    # but you can change that behavior to suit your needs.  Run "dx upload -h"
    # to see more options to set metadata.

    cns=$(dx upload cns --brief)
    cns_filtered=$(dx upload cns_filtered --brief)
    plot=$(dx upload plot --brief)

    # The following line(s) use the utility dx-jobutil-add-output to format and
    # add output variables to your job's output as appropriate for the output
    # class.  Run "dx-jobutil-add-output -h" for more information on what it
    # does.

    dx-jobutil-add-output cns "$cns" --class=file
    dx-jobutil-add-output cns_filtered "$cns_filtered" --class=file
    dx-jobutil-add-output plot "$plot" --class=file
}

Replace the contents of src/cnvkit_bash.sh with the following code:

#!/bin/bash

# Set pragmas to print commands and fail on errors
set -exuo pipefail

# Install required R module
Rscript -e "BiocManager::install('DNAcopy')"

# Verify the value of inputs
echo "Value of bam_tumor: '${bam_tumor[@]}'"
echo "Value of reference: '$reference'"

# Place all inputs into the "in" directory
dx-download-all-inputs --parallel

# Use the "_path" versions of the inputs for local file paths
cnvkit.py batch \
    "${bam_tumor_path[@]}" \
    -r "${reference_path}" \
    -p $(( $(nproc) - 1 )) \
    -d cnvkit-out/ \
    --scatter

# Make out directories for each output spec
mkdir -p ~/out/cns/ ~/out/cns_filtered/ ~/out/plot/

# Move CNVkit outputs to the "out" directory for upload.
# Move *.call.cns first, because the broader *.cns glob also matches those files.
mv cnvkit-out/*.call.cns    ~/out/cns_filtered/
mv cnvkit-out/*.cns         ~/out/cns/
mv cnvkit-out/*-scatter.png ~/out/plot/

# Upload and annotate all output files
dx-upload-all-outputs --parallel

Rather than downloading the inputs individually as in the original template, this version downloads all of the inputs in parallel with the following command:

dx-download-all-inputs --parallel

This creates an in directory with subdirectories named according to the input names. Because the bam_tumor input is an array of files, its directory contains numbered subdirectories, starting at 0, one for each input file:

in/bam_tumor/0/...
in/bam_tumor/1/...
in/reference/...

Similarly, the preceding code uses dx-upload-all-outputs, which expects an out directory with subdirectories named according to each of the output specifications.
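To see these directory conventions without launching a job, the following sketch builds a mock in/ tree shaped like the one described above, collects the BAM paths into an array similar in shape to bam_tumor_path, and then moves fake CNVkit outputs into out/ in the same order as the script. It runs in a temporary directory, all file names are hypothetical, and it does not call dx-download-all-inputs or dx-upload-all-outputs. Moving *.call.cns first matters because *.cns would otherwise sweep the filtered calls into out/cns/.

```shell
#!/bin/bash
set -euo pipefail

# Work in a throwaway directory (removed when the script exits) instead of $HOME
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Mock of the in/ layout: array inputs get numbered subdirectories
mkdir -p in/bam_tumor/0 in/bam_tumor/1 in/reference
touch in/bam_tumor/0/tumor1.bam in/bam_tumor/1/tumor2.bam in/reference/ref.cnn

# Collect the BAM paths into an array, similar in shape to bam_tumor_path
bam_paths=(in/bam_tumor/*/*.bam)
reference_file=(in/reference/*.cnn)
echo "BAMs: ${bam_paths[*]}"
echo "Reference: ${reference_file[0]}"

# Mock CNVkit output directory with hypothetical file names
mkdir -p cnvkit-out
touch cnvkit-out/tumor1.cns cnvkit-out/tumor1.call.cns cnvkit-out/tumor1-scatter.png

# out/ needs one subdirectory per output name in outputSpec
mkdir -p out/cns out/cns_filtered out/plot

# Move the narrower *.call.cns glob first; *.cns would also match those files
mv cnvkit-out/*.call.cns    out/cns_filtered/
mv cnvkit-out/*.cns         out/cns/
mv cnvkit-out/*-scatter.png out/plot/

# List the resulting out/ tree
find out -type f | sort
```

Swapping the first two mv commands would leave out/cns_filtered/ empty, and the upload for the cns_filtered output would have nothing to send.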

Build the Applet

Use dx pwd to ensure you are in the correct project and dx select to change projects, if necessary. If you are inside the bash source directory where the dxapp.json file exists, you can run dx build -f. If you are in the parent directory, run dx build -f cnvkit_bash. Here is sample output from successfully building the applet:

$ dx build -f
{"id": "applet-GFyV3kj0VGFkV8k04f3K11QY"}

The -f|--overwrite flag indicates you wish to overwrite any previous version of the applet. You may also want to use the -a|--archive flag to move any previous versions to an archived location. You won't need either of these flags the first time you compile, but subsequent builds will require that you indicate how to handle previous versions of the applet. Run dx build --help to learn more about build options.

Run the bash applet

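Download the BAM.zip archive (15MB) provided with this example and add its contents to the inputs directory of your project.

In the web interface, open the applet and select the input files, indicate an output directory, click the Run button, and then click "View Log" to watch the job's progress.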

You can also run the applet on the command line with the -h|--help flag to verify the inputs and outputs:

$ dx run applet-GFyV3kj0VGFkV8k04f3K11QY -h
usage: dx run applet-GFyV2G8054JBQXY64g4F7ZKk [-iINPUT_NAME=VALUE ...]

Applet: cnvkit_bash

cnvkit_bash

Inputs:
  BAM Tumor Files: -ibam_tumor=(file) [-ibam_tumor=... [...]]

  Reference: -ireference=(file)

Outputs:
  CNS: cns (array:file)

  CNS Filtered: cns_filtered (array:file)

  Plot: plot (array:file)

Select the input files in the web interface to find their file IDs, which you can use to run the applet from the command line as follows:

$ dx run -y --watch applet-GFyV3kj0VGFkV8k04f3K11QY \
    -ibam_tumor=file-GFxXjV006kZVQPb20G85VXBp \
    -ireference=file-GFxXvpj06kZfP0QVKq2p2FGF \
    --destination /outputs

You should see output from the preceding command that includes a JSON document with the inputs:

Using input JSON:
{
    "bam_tumor": [
        {
            "$dnanexus_link": "file-GFxXjV006kZVQPb20G85VXBp"
        }
    ],
    "reference": {
        "$dnanexus_link": "file-GFxXvpj06kZfP0QVKq2p2FGF"
    }
}

Note that you can place this JSON into a file and launch the applet with the inputs specified with the -f|--input-json-file option, as follows. Use dx run -h to learn about other command-line options:

$ dx run -y --watch applet-GFyV3kj0VGFkV8k04f3K11QY \
        -f cnvkit_bash/inputs.json \
        --destination /outputs
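If you run the applet repeatedly with different files, writing the input JSON by hand gets tedious. Here is a minimal sketch of a hypothetical make_inputs_json shell function that takes the reference file ID followed by one or more BAM file IDs and prints JSON in the same shape as the "Using input JSON" output above. It only formats strings; it does not check that the IDs exist.

```shell
#!/bin/bash
# Hypothetical helper: print dx run input JSON for the cnvkit_bash applet.
# Usage: make_inputs_json REFERENCE_ID BAM_ID [BAM_ID ...]
make_inputs_json() {
    if [ "$#" -lt 2 ]; then
        echo "usage: make_inputs_json REFERENCE_ID BAM_ID [BAM_ID ...]" >&2
        return 1
    fi
    local reference=$1
    shift

    # Build one {"$dnanexus_link": ...} object per BAM file ID
    local links=() id
    for id in "$@"; do
        links+=("{\"\$dnanexus_link\": \"$id\"}")
    done

    # Join the BAM links with commas inside the bam_tumor array
    local joined
    joined=$(IFS=,; echo "${links[*]}")

    # Single quotes keep $dnanexus_link literal in the format string
    printf '{"bam_tumor": [%s], "reference": {"$dnanexus_link": "%s"}}\n' \
        "$joined" "$reference"
}
```

For example, make_inputs_json file-GFxXvpj06kZfP0QVKq2p2FGF file-GFxXjV006kZVQPb20G85VXBp > cnvkit_bash/inputs.json writes a file you can pass to dx run with -f.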

Note the job ID from dx run, and use dx watch to follow the job to completion and dx describe to view the job's metadata. Alternatively, you can launch the job from the web platform, using the file selector to specify each of the inputs, then use the "Monitor" view to check the job's status and view the output files when the job completes.

Build an Asset

You'll notice the applet takes quite a while to run (around 14 minutes for me) because of the module installations. You can build an asset for these installations and use this in dxapp.json. Create a directory called cnvkit_asset with the following file dxasset.json:

{
    "name": "cnvkit_asset",
    "title": "cnvkit_asset",
    "description": "cnvkit_asset",
    "version": "0.0.1",
    "distribution": "Ubuntu",
    "release": "20.04",
    "execDepends": [
        {
          "name": "cnvkit",
          "package_manager": "pip"
        },
        {
          "name": "BiocManager",
          "package_manager": "cran"
        }
    ]
}

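Also create a Makefile in the same directory with the following contents. Note that make requires the recipe line under all: to be indented with a tab character, not spaces: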

SHELL=/bin/bash -exuo pipefail
all:
	sudo Rscript -e "BiocManager::install('DNAcopy')"

Run dx build_asset cnvkit_asset from the parent directory (or dx build_asset from inside cnvkit_asset) to create the asset. This launches a job that reports the asset ID at the end:

Asset bundle 'record-GFyVY000X1ZK3yGg4qv32GXv' is built and can now be used
in your app/applet's dxapp.json

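Update the runSpec in dxapp.json to the following, replacing the execDepends section with an assetDepends entry that uses the record ID reported by dx build_asset: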

  "runSpec": {
    "timeoutPolicy": {
      "*": {
        "hours": 48
      }
    },
    "assetDepends": [{"id": "record-GFyVY000X1ZK3yGg4qv32GXv"}],
    "interpreter": "bash",
    "file": "src/cnvkit_bash.sh",
    "distribution": "Ubuntu",
    "release": "20.04",
    "version": "0"
  },

Run dx build -f again and note the new applet's ID. Then create a JSON input file as follows:

$ cat inputs.json
{
    "bam_tumor": [
        {
            "$dnanexus_link": "file-GFxXjV006kZVQPb20G85VXBp"
        }
    ],
    "reference": {
        "$dnanexus_link": "file-GFxXvpj06kZfP0QVKq2p2FGF"
    }
}

Launch the new applet from the CLI with the following command:

$ dx run applet-GFyVppQ0VGFxvvx44j43YyPz -f inputs.json -y

Use dx watch with the new job ID to see how the run now uses the asset to run faster. I see about a 10-minute difference with the asset.

Review

You learned more ways to include app dependencies: with package managers through execDepends, with a Makefile, and by building an asset. Dependencies listed in execDepends are installed at runtime, every time a job starts, whereas an asset packages the dependencies ahead of time, so the applet runs much faster.

Resources

To create a support ticket if there are technical issues:

  1. Go to the Help menu in the platform's header (the same bar that contains Projects and Tools).

  2. Select "Contact Support"

  3. Fill in the Subject and Message to submit a support ticket.

See the app metadata documentation for a more complete understanding of all the possible dxapp.json fields and their implications.

The Python module cnvkit can be installed with pip, but CNVkit also requires an R module called DNAcopy. DNAcopy is distributed through Bioconductor and installed with BiocManager, which must itself first be installed from CRAN. This means you'll have to install the DNAcopy module manually when the app starts.
