Example 4: cnvkit
To begin, you'll create a bash app to run CNVkit, which will find "genome-wide copy number from high-throughput sequencing." Create a local directory to hold your work, and consider putting the contents into a source code repository like Git.
In this example, you will:
Use various package managers to install dependencies
Build an asset
Learn to use dx-download-all-inputs and dx-upload-all-outputs
Create a Project
From the web interface, select "Projects → All Projects" to see your project list. Click the "New Project" button to create a new project called "CNVkit." Alternatively, use dx new project to do this from the command line. However you choose to create the project, make sure it is selected: run dx pwd to check your current working directory, and use dx select to select the project if needed.
Build a bash app with dx-app-wizard
Inside your working directory, run the command dx-app-wizard cnvkit_bash to launch the app wizard tool. Optionally, provide a title, summary, and version at the prompts.
The Input Specification
The app will accept two inputs:
One or more BAM files of the tumor samples: Give this input the name bam_tumor with the label "BAM Tumor Files." For the class, choose array:file, and indicate that this is not an optional parameter.
A reference file: Give this input the name reference with the label "Reference." For the class, choose file, and indicate that this is not an optional parameter.
When prompted for the third input, press Enter to end the inputs.
The Output Specification
Define three outputs, each of the type array:file with the following names and whatever labels you feel are appropriate:
cns
cns_filtered
plot
Press Enter when prompted for the fourth output to indicate you are finished.
Other Options
Press Enter to accept the default value for the timeout policy.
Type bash for the programming language.
Type y to indicate that the app will need internet access.
Type n to indicate that the app will not need access to the parent project.
Press Enter to accept the default value for the instance type or select one from the list shown.
You should see a message saying the app's template was created in a directory whose name matches the app's name. For instance, I have the following:
$ find cnvkit_bash -type f
cnvkit_bash/dxapp.json
cnvkit_bash/Readme.md
cnvkit_bash/Readme.developer.md
cnvkit_bash/src/cnvkit_bash.sh
dxapp.json: a JSON file containing metadata that will be used to create the app on the DNAnexus platform.
Readme.md: a stub for user documentation.
Readme.developer.md: a stub for developer documentation.
src/cnvkit_bash.sh: a template bash script for the app's functionality.
Examine dxapp.json
The dxapp.json file that was created by the wizard should look like the following:
{
"name": "cnvkit_bash",
"title": "cnvkit_bash",
"summary": "cnvkit_bash",
"dxapi": "1.0.0",
"version": "0.0.1",
"inputSpec": [
{
"name": "bam_tumor",
"label": "BAM Tumor Files",
"class": "array:file",
"optional": false,
"patterns": [
"*"
],
"help": ""
},
{
"name": "reference",
"label": "Reference",
"class": "file",
"optional": false,
"patterns": [
"*"
],
"help": ""
}
],
"outputSpec": [
{
"name": "cns",
"label": "CNS",
"class": "array:file",
"patterns": [
"*"
],
"help": ""
},
{
"name": "cns_filtered",
"label": "CNS Filtered",
"class": "array:file",
"patterns": [
"*"
],
"help": ""
},
{
"name": "plot",
"label": "Plot",
"class": "array:file",
"patterns": [
"*"
],
"help": ""
}
],
"runSpec": {
"timeoutPolicy": {
"*": {
"hours": 48
}
},
"interpreter": "bash",
"file": "src/cnvkit_bash.sh",
"distribution": "Ubuntu",
"release": "20.04",
"version": "0"
},
"access": {
"network": [
"*"
]
},
"regionalOptions": {
"aws:us-east-1": {
"systemRequirements": {
"*": {
"instanceType": "mem1_ssd1_v2_x4"
}
}
}
}
}
See the app metadata documentation for a more complete understanding of all the possible fields and their implications.
Add Python and R Module Dependencies
CNVkit depends on both Python and R modules that must be installed before it runs. In dxapp.json, you can specify dependencies that can be installed with the following package managers:
apt (Ubuntu)
pip (Python)
cpan (Perl)
cran (R)
gem (Ruby)
The Python module cnvkit can be installed via pip, but the software also requires an R module called DNAcopy that must be installed from Bioconductor using BiocManager, which in turn must first be installed from CRAN using cran. This means you'll have to install the DNAcopy module manually when the app starts.
To add these runtime dependencies, use a text editor to update the runSpec and add the following execDepends section, which will install the Python cnvkit and R BiocManager modules before the app is executed:
"runSpec": {
"interpreter": "bash",
"file": "src/cnvkit_bash.sh",
"distribution": "Ubuntu",
"release": "20.04",
"version": "0",
"execDepends": [
{
"name": "cnvkit",
"package_manager": "pip"
},
{
"name": "BiocManager",
"package_manager": "cran"
}
],
"timeoutPolicy": {
"*": {
"hours": 48
}
}
}
Specify File Patterns for Inputs
In the inputSpec, change the patterns to match the expected file extensions:
bam_tumor: *.bam
reference: *.cnn
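These patterns are shell-style globs. As a rough local illustration (plain bash glob matching, not the platform's own filtering code), the sketch below checks which filenames each pattern would accept. The function name matches_pattern and the sample filenames are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a filename matches a shell-style glob.
# The pattern is deliberately unquoted on the right-hand side of
# [[ ... == ... ]] so bash treats it as a glob, not a literal string.
matches_pattern() {
    local filename=$1 pattern=$2
    [[ $filename == $pattern ]]
}

# Hypothetical filenames checked against the two input patterns.
for f in tumor1.bam tumor1.bam.bai reference.cnn notes.txt; do
    if matches_pattern "$f" '*.bam'; then
        echo "$f -> bam_tumor"
    elif matches_pattern "$f" '*.cnn'; then
        echo "$f -> reference"
    else
        echo "$f -> (no match)"
    fi
done
```

Note that *.bam requires the name to end in .bam, so an index file such as tumor1.bam.bai is not accepted by the bam_tumor pattern.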
Your dxapp.json should now look like the following:
{
"name": "cnvkit_bash",
"title": "cnvkit_bash",
"summary": "cnvkit_bash",
"dxapi": "1.0.0",
"version": "0.0.1",
"inputSpec": [
{
"name": "bam_tumor",
"label": "BAM Tumor Files",
"class": "array:file",
"optional": false,
"patterns": [
"*.bam"
],
"help": ""
},
{
"name": "reference",
"label": "Reference",
"class": "file",
"optional": false,
"patterns": [
"*.cnn"
],
"help": ""
}
],
"outputSpec": [
{
"name": "cns",
"label": "CNS",
"class": "array:file",
"patterns": [
"*"
],
"help": ""
},
{
"name": "cns_filtered",
"label": "CNS Filtered",
"class": "array:file",
"patterns": [
"*"
],
"help": ""
},
{
"name": "plot",
"label": "Plot",
"class": "array:file",
"patterns": [
"*"
],
"help": ""
}
],
"runSpec": {
"timeoutPolicy": {
"*": {
"hours": 48
}
},
"execDepends": [
{
"name": "cnvkit",
"package_manager": "pip"
},
{
"name": "BiocManager",
"package_manager": "cran"
}
],
"interpreter": "bash",
"file": "src/cnvkit_bash.sh",
"distribution": "Ubuntu",
"release": "20.04",
"version": "0"
},
"access": {
"network": [
"*"
]
},
"regionalOptions": {
"aws:us-east-1": {
"systemRequirements": {
"*": {
"instanceType": "mem1_ssd1_v2_x4"
}
}
}
}
}
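Hand-editing dxapp.json makes it easy to drop a quote or leave a trailing comma, and dx build will fail if it cannot parse the file. One way to catch this early, assuming python3 is available on your workstation, is to run the file through Python's standard json.tool module before building. The check_json function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: report whether each given file parses as JSON.
# python3's json.tool module exits non-zero and prints the parse error
# to stderr when its input is not valid JSON.
check_json() {
    local status=0 f
    for f in "$@"; do
        if python3 -m json.tool "$f" > /dev/null; then
            echo "OK: $f"
        else
            echo "INVALID: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage, from the parent directory of the app:
#   check_json cnvkit_bash/dxapp.json && dx build -f cnvkit_bash
```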
Edit the bash Code
The default bash code generated by the wizard starts with a generous header of comments that you may or may not wish to keep. The default code prints the values of the input variables, then downloads the input files individually. The app code belongs in the middle, after downloading the inputs and before uploading the outputs:
main() {
echo "Value of bam_tumor: '${bam_tumor[@]}'"
echo "Value of reference: '$reference'"
# The following line(s) use the dx command-line tool to download your file
# inputs to the local file system using variable names for the filenames. To
# recover the original filenames, you can use the output of "dx describe
# "$variable" --name".
dx download "$reference" -o reference
for i in ${!bam_tumor[@]}
do
dx download "${bam_tumor[$i]}" -o bam_tumor-$i
done
>>>>> Here is where the app code belongs <<<<<
# The following line(s) use the dx command-line tool to upload your file
# outputs after you have created them on the local file system. It assumes
# that you have used the output field name for the filename for each output,
# but you can change that behavior to suit your needs. Run "dx upload -h"
# to see more options to set metadata.
cns=$(dx upload cns --brief)
cns_filtered=$(dx upload cns_filtered --brief)
plot=$(dx upload plot --brief)
# The following line(s) use the utility dx-jobutil-add-output to format and
# add output variables to your job's output as appropriate for the output
# class. Run "dx-jobutil-add-output -h" for more information on what it
# does.
dx-jobutil-add-output cns "$cns" --class=file
dx-jobutil-add-output cns_filtered "$cns_filtered" --class=file
dx-jobutil-add-output plot "$plot" --class=file
}
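The template's download loop iterates over ${!bam_tumor[@]}, which expands to the array's indices (0, 1, ...) rather than its values, so each BAM file gets a distinct local name such as bam_tumor-0. The sketch below reproduces that naming locally. The array values are made-up placeholders rather than real DNAnexus file references, and the function echoes the commands instead of running dx download so it can run anywhere; plan_downloads is a hypothetical name.

```shell
#!/bin/bash
# Placeholder values standing in for the file references the platform
# would place in the bam_tumor array; these are not real file IDs.
bam_tumor=("file-PLACEHOLDER0" "file-PLACEHOLDER1" "file-PLACEHOLDER2")

# Print the download command the template would run for each element.
# "${!bam_tumor[@]}" expands to the indices 0 1 2, which the template
# uses to build a unique local filename for every array element.
plan_downloads() {
    local i
    for i in "${!bam_tumor[@]}"; do
        echo "dx download ${bam_tumor[$i]} -o bam_tumor-$i"
    done
}

plan_downloads
```

The template leaves ${!bam_tumor[@]} unquoted; quoting it, as here, makes no difference for a list of integer indices but is the more defensive habit.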
Replace the contents of src/cnvkit_bash.sh with the following code:
#!/bin/bash
# Set pragmas to print commands and fail on errors
set -exuo pipefail
main() {
    # Install required R module
    Rscript -e "BiocManager::install('DNAcopy')"
    # Verify the value of inputs
    echo "Value of bam_tumor: '${bam_tumor[@]}'"
    echo "Value of reference: '$reference'"
    # Place all inputs into the "in" directory
    dx-download-all-inputs --parallel
    # Use "_path" versions of inputs for file paths;
    # run one fewer process than the number of available CPUs
    cnvkit.py batch \
        "${bam_tumor_path[@]}" \
        -r "${reference_path}" \
        -p $(( $(nproc) - 1 )) \
        -d cnvkit-out/ \
        --scatter
    # Make out directories for each output spec
    mkdir -p ~/out/cns/ ~/out/cns_filtered/ ~/out/plot/
    # Move CNVkit outputs to the "out" directory for upload.
    # Move *.call.cns first, because the *.cns glob would also match it.
    mv cnvkit-out/*.call.cns ~/out/cns_filtered/
    mv cnvkit-out/*.cns ~/out/cns/
    mv cnvkit-out/*-scatter.png ~/out/plot/
    # Upload and annotate all output files
    dx-upload-all-outputs --parallel
}
Rather than downloading the inputs individually as in the original template, this version downloads all the inputs in parallel with the following command:
dx-download-all-inputs --parallel
This creates an in directory with subdirectories named after the inputs. Because the bam_tumor input is an array of files, its directory contains numbered subdirectories, starting at 0, one for each input file:
in/bam_tumor/0/...
in/bam_tumor/1/...
in/reference/...
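To see what the _path variables in the script correspond to, the sketch below builds a throwaway copy of this layout with empty placeholder files and collects the local paths with globs. It only mimics the layout described here; it is not how dx-download-all-inputs itself sets bam_tumor_path and reference_path, and the filenames are hypothetical.

```shell
#!/bin/bash
# Build a throwaway directory that mimics the "in" layout.
# The filenames are hypothetical placeholders, not real CNVkit inputs.
demo=$(mktemp -d)
mkdir -p "$demo/in/bam_tumor/0" "$demo/in/bam_tumor/1" "$demo/in/reference"
touch "$demo/in/bam_tumor/0/sampleA.bam" \
      "$demo/in/bam_tumor/1/sampleB.bam" \
      "$demo/in/reference/reference.cnn"

# One local path per array element. Glob results are sorted by name,
# so index 10 would sort before 2; that is fine for this small demo.
bam_tumor_path=("$demo"/in/bam_tumor/*/*.bam)

# A single-file input such as reference resolves to just one path.
reference_files=("$demo"/in/reference/*)
reference_path=${reference_files[0]}

printf 'BAM: %s\n' "${bam_tumor_path[@]}"
echo "Reference: $reference_path"
# Remove "$demo" when you are done experimenting.
```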
Similarly, the replacement script uses dx-upload-all-outputs, which expects an out directory with subdirectories named according to each of the output specifications.
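The order of the mv commands in the script matters: the glob *.cns also matches files ending in .call.cns, so the called segments must be moved into cns_filtered before the broader *.cns move runs. The sketch below replays those three moves against empty placeholder files in a temporary directory and lists the resulting out layout. The sample name is hypothetical, and a real CNVkit run may produce other files as well.

```shell
#!/bin/bash
# Placeholder files named like the CNVkit outputs the script moves;
# "sample" is a hypothetical sample name and the files are empty.
work=$(mktemp -d)
mkdir -p "$work/cnvkit-out"
touch "$work/cnvkit-out/sample.cns" \
      "$work/cnvkit-out/sample.call.cns" \
      "$work/cnvkit-out/sample-scatter.png"

# One subdirectory per output name in outputSpec.
mkdir -p "$work/out/cns" "$work/out/cns_filtered" "$work/out/plot"

# Same order as the app script: move *.call.cns first, because the
# later *.cns glob would otherwise sweep it into out/cns as well.
mv "$work"/cnvkit-out/*.call.cns "$work/out/cns_filtered/"
mv "$work"/cnvkit-out/*.cns "$work/out/cns/"
mv "$work"/cnvkit-out/*-scatter.png "$work/out/plot/"

# Show which file ended up under which output name.
(cd "$work/out" && find . -type f | sort)
```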
Build the Applet
Use dx pwd to ensure you are in the correct project and dx select to change projects, if necessary. If you are inside the app's source directory, where the dxapp.json file exists, run dx build -f. If you are in the parent directory, run dx build -f cnvkit_bash. Here is sample output from successfully compiling the applet:
$ dx build -f
{"id": "applet-GFyV3kj0VGFkV8k04f3K11QY"}
The -f|--overwrite flag indicates you wish to overwrite any previous version of the applet. Alternatively, you can use the -a|--archive flag to move any previous versions to an archived location. You won't need either of these flags the first time you compile, but subsequent builds will require that you indicate how to handle previous versions of the applet. Run dx build --help to learn more about build options.
Run the bash applet
Download this BAM file and add it to the inputs directory.
Indicate an output directory, click the Run button, and then click "View Log" to watch the job's progress.
You can also run the applet on the command line with the -h|--help flag to verify the inputs and outputs:
$ dx run applet-GFyV3kj0VGFkV8k04f3K11QY -h
usage: dx run applet-GFyV2G8054JBQXY64g4F7ZKk [-iINPUT_NAME=VALUE ...]
Applet: cnvkit_bash
cnvkit_bash
Inputs:
BAM Tumor Files: -ibam_tumor=(file) [-ibam_tumor=... [...]]
Reference: -ireference=(file)
Outputs:
CNS: cns (array:file)
CNS Filtered: cns_filtered (array:file)
Plot: plot (array:file)
Select the input files on the web interface to note the file IDs that can be used to execute the app from the command line as follows:
$ dx run -y --watch applet-GFyV3kj0VGFkV8k04f3K11QY \
-ibam_tumor=file-GFxXjV006kZVQPb20G85VXBp \
-ireference=file-GFxXvpj06kZfP0QVKq2p2FGF \
--destination /outputs
You should see output from the preceding command that includes a JSON document with the inputs:
Using input JSON:
{
"bam_tumor": [
{
"$dnanexus_link": "file-GFxXjV006kZVQPb20G85VXBp"
}
],
"reference": {
"$dnanexus_link": "file-GFxXvpj06kZfP0QVKq2p2FGF"
}
}
Note that you can place this JSON into a file and launch the applet with the inputs specified with the -f|--input-json-file option, as follows. Use dx run -h to learn about other command-line options:
$ dx run -y --watch applet-GFyV3kj0VGFkV8k04f3K11QY \
-f cnvkit_bash/inputs.json \
--destination /outputs
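If you run the applet repeatedly on different files, writing inputs.json by hand gets tedious. A small shell function can emit the same structure from a list of file IDs. make_inputs_json is a hypothetical name, and the function inserts the IDs verbatim without validating them.

```shell
#!/bin/bash
# Hypothetical helper: print an input JSON document for the cnvkit_bash
# applet. The first argument is the reference file ID; the remaining
# arguments are BAM file IDs for the bam_tumor array.
make_inputs_json() {
    local reference=$1
    shift
    local sep="" id
    echo '{'
    echo '  "bam_tumor": ['
    for id in "$@"; do
        # Single quotes keep "$dnanexus_link" literal in the format string.
        printf '%s    {"$dnanexus_link": "%s"}' "$sep" "$id"
        sep=$',\n'
    done
    printf '\n  ],\n'
    printf '  "reference": {"$dnanexus_link": "%s"}\n' "$reference"
    echo '}'
}

# Usage with the file IDs from the example above:
#   make_inputs_json file-GFxXvpj06kZfP0QVKq2p2FGF \
#       file-GFxXjV006kZVQPb20G85VXBp > cnvkit_bash/inputs.json
```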
Note the job ID from dx run, and use dx watch to watch the job to completion and dx describe to view the job's metadata. Alternatively, you can use the web platform to launch the job, using the file selector to specify each of the inputs, and then use the "Monitor" view to check the job's status and view the output files when the job completes.
Build an Asset
You'll notice the applet takes quite a while to run (around 14 minutes for me) because of the module installations. You can build an asset containing these installations and reference it in dxapp.json. Create a directory called cnvkit_asset containing the following dxasset.json file:
{
"name": "cnvkit_asset",
"title": "cnvkit_asset",
"description": "cnvkit_asset",
"version": "0.0.1",
"distribution": "Ubuntu",
"release": "20.04",
"execDepends": [
{
"name": "cnvkit",
"package_manager": "pip"
},
{
"name": "BiocManager",
"package_manager": "cran"
}
]
}
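Also create a Makefile in the same directory with the following contents. Make requires the recipe line under all: to be indented with a tab character, not spaces: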
SHELL=/bin/bash -exuo pipefail
all:
sudo Rscript -e "BiocManager::install('DNAcopy')"
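Because the tab in a Makefile recipe is easily lost when copying from a web page, it can be safer to generate both asset files from the shell. The sketch below writes the same dxasset.json and Makefile contents shown above into a directory, using printf's \t escape to produce the tab explicitly. The function name write_cnvkit_asset is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: write the cnvkit_asset sources into the directory
# given as the first argument (default: cnvkit_asset).
write_cnvkit_asset() {
    local dir=${1:-cnvkit_asset}
    mkdir -p "$dir"

    # dxasset.json with the same content as the example above.
    cat > "$dir/dxasset.json" <<'EOF'
{
  "name": "cnvkit_asset",
  "title": "cnvkit_asset",
  "description": "cnvkit_asset",
  "version": "0.0.1",
  "distribution": "Ubuntu",
  "release": "20.04",
  "execDepends": [
    {"name": "cnvkit", "package_manager": "pip"},
    {"name": "BiocManager", "package_manager": "cran"}
  ]
}
EOF

    # Makefile: the recipe line is written with an explicit leading tab.
    printf '%s\n' 'SHELL=/bin/bash -exuo pipefail' 'all:' > "$dir/Makefile"
    printf '\t%s\n' "sudo Rscript -e \"BiocManager::install('DNAcopy')\"" >> "$dir/Makefile"
}

# Usage:
#   write_cnvkit_asset cnvkit_asset
```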
Run dx build_asset to create the asset. This launches a job that reports the asset ID at the end:
Asset bundle 'record-GFyVY000X1ZK3yGg4qv32GXv' is built and can now be used
in your app/applet's dxapp.json
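To avoid copying the asset ID by hand, you can pull it out of that final message. The sketch below assumes the message keeps the format shown above and that the ID is record- followed by 24 letters and digits, as in this example; adjust the pattern if your output differs. extract_asset_id is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: read build output on stdin and print the first
# record ID found. Assumes IDs look like "record-" plus 24 letters or
# digits, matching the sample message above.
extract_asset_id() {
    grep -oE 'record-[0-9A-Za-z]{24}' | head -n 1
}

# Usage: run dx build_asset as above while saving its output, then
# extract the ID to paste into assetDepends:
#   dx build_asset 2>&1 | tee build_asset.log
#   extract_asset_id < build_asset.log
```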
Update the runSpec in dxapp.json to the following:
"runSpec": {
"timeoutPolicy": {
"*": {
"hours": 48
}
},
"assetDepends": [{"id": "record-GFyVY000X1ZK3yGg4qv32GXv"}],
"interpreter": "bash",
"file": "src/cnvkit_bash.sh",
"distribution": "Ubuntu",
"release": "20.04",
"version": "0"
},
Use dx build -f and note the new applet's ID. Create a JSON input file as follows:
$ cat inputs.json
{
"bam_tumor": [
{
"$dnanexus_link": "file-GFxXjV006kZVQPb20G85VXBp"
}
],
"reference": {
"$dnanexus_link": "file-GFxXvpj06kZfP0QVKq2p2FGF"
}
}
Launch the new applet from the CLI with the following command:
$ dx run applet-GFyVppQ0VGFxvvx44j43YyPz -f inputs.json -y
Use dx watch with the new job ID to see how the run now uses the asset to run faster. I see about a 10-minute difference with the asset.
Review
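You learned more ways to include app dependencies: installing them with package managers through execDepends (plus a manual Rscript step in the app script), and building an asset whose Makefile performs the extra installation. The first strategy installs everything at runtime each time the job starts, while the asset builds all the dependencies before the applet is run, making the runtime much faster.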
Resources
To create a support ticket if there are technical issues:
Go to the Help header (same section where Projects and Tools are) inside the platform
Select "Contact Support"
Fill in the Subject and Message to submit a support ticket.