Example 4: cnvkit
To begin, you'll create a bash app to run CNVkit, which finds "genome-wide copy number from high-throughput sequencing." Create a local directory to hold your work, and consider putting its contents into a source code repository such as Git.
In this example, you will:
Use various package managers to install dependencies
Build an asset
Learn to use dx-download-all-inputs and dx-upload-all-outputs
From the web interface, select "Projects → All Projects" to see your project list. Click the "New Project" button to create a new project called "CNVkit." Alternatively, use dx new project to do this from the command line. However you create the project, make sure it is selected: run dx pwd to check your current working directory, and use dx select to select the project if needed.
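The command-line route can be sketched as a small shell function. The name setup_cnvkit_project is hypothetical; it only strings together the dx new project, dx select, and dx pwd commands mentioned above.

```shell
# Hypothetical helper: create the project from the CLI, make it the current
# project explicitly, and print the working directory to confirm the switch.
setup_cnvkit_project() {
    project_name="${1:-CNVkit}"
    dx new project "$project_name" || return 1
    dx select "$project_name" || return 1
    dx pwd
}
```

Running setup_cnvkit_project with no argument uses the project name "CNVkit" from this example.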
Inside your working directory, run the command dx-app-wizard cnvkit_bash to launch the app wizard. Optionally provide a title, summary, and version at the prompts.
The app will accept two inputs:
One or more BAM files of the tumor samples: Give this input the name bam_tumor with the label "BAM Tumor Files." For the class, choose array:file, and indicate that this is not an optional parameter.
A reference file: Give this input the name reference with the label "Reference." For the class, choose file, and indicate that this is not an optional parameter.
When prompted for the third input, press Enter to end the inputs.
Define three outputs, each of the type array:file with the following names and whatever labels you feel are appropriate:
cns
cns_filtered
plot
Press Enter when prompted for the fourth output to indicate you are finished.
Press Enter to accept the default value for the timeout policy.
Type bash for the programming language.
Type y to indicate that the app will need internet access.
Type n to indicate that the app will not need access to the parent project.
Press Enter to accept the default value for the instance type or select one from the list shown.
You should see a message saying that the app's template was created in a directory whose name matches the app's name. For instance, I have the following files:
dxapp.json: A JSON file containing metadata that will be used to create the app on the DNAnexus platform.
Readme.md: A stub for user documentation.
Readme.developer.md: A stub for developer documentation.
src/cnvkit_bash.sh: A template bash script for the app's functionality.
The dxapp.json file that was created by the wizard should look like the following:
CNVkit has dependencies on both Python and R modules that must be installed before running. In the dxapp.json, you can specify dependencies that can be installed with the following package managers:
apt (Ubuntu)
pip (Python)
cpan (Perl)
cran (R)
gem (Ruby)
To add these runtime dependencies, use a text editor to update the runSpec and add the following execDepends section, which installs the Python cnvkit and R BiocManager modules before the app is executed:
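A sketch of those entries, written as a shell function that prints the JSON fragment so it can be pasted into the runSpec of dxapp.json. The helper name is hypothetical; the two entries come from the text above and use the name and package_manager keys of execDepends.

```shell
# Hypothetical helper: print the execDepends fragment for runSpec.
# The platform installs these packages each time a job starts.
print_exec_depends() {
    cat <<'EOF'
"execDepends": [
  {"name": "cnvkit", "package_manager": "pip"},
  {"name": "BiocManager", "package_manager": "cran"}
]
EOF
}
```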
In the inputSpec, change the patterns to match the expected file extensions:
bam_tumor: *.bam
reference: *.cnn
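Assuming the inputs are named bam_tumor and reference as created in the wizard step, the updated inputSpec entries might look like the fragment printed by this hypothetical helper; the labels, classes, and optional flags repeat the wizard answers.

```shell
# Hypothetical helper: print the two inputSpec entries with the file-name
# patterns applied (*.bam for the tumor BAMs, *.cnn for the reference).
print_input_spec() {
    cat <<'EOF'
"inputSpec": [
  {
    "name": "bam_tumor",
    "label": "BAM Tumor Files",
    "class": "array:file",
    "optional": false,
    "patterns": ["*.bam"]
  },
  {
    "name": "reference",
    "label": "Reference",
    "class": "file",
    "optional": false,
    "patterns": ["*.cnn"]
  }
]
EOF
}
```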
Your dxapp.json should now look like the following:
The default bash code generated by the wizard starts with a generous header of comments that you may or may not wish to keep. The default code prints the values of the input variables, then downloads the input files individually. The app code belongs in the middle, after downloading the inputs and before uploading the outputs:
Replace the contents of src/cnvkit_bash.sh with the following code:
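As a hedged sketch, a main function consistent with the rest of this example might look like the following. It assumes the input and output names from the wizard, assumes CNVkit's batch subcommand run against a .cnn reference, installs DNAcopy at startup as explained in the note on CNVkit's R dependency, and treats CNVkit's *.call.cns files as the cns_filtered output. That last mapping is an assumption, not the tutorial's definition.

```shell
# src/cnvkit_bash.sh (sketch). Assumes inputs bam_tumor (array:file) and
# reference (file), and outputs cns, cns_filtered, and plot.
# On the platform, the job sources this file and then calls main().

main() {
    set -e

    # Work from $HOME, where dx-download-all-inputs creates in/ and
    # dx-upload-all-outputs looks for out/.
    cd "$HOME"

    # Fetch all inputs in parallel into in/<input_name>/...; array inputs get
    # numbered subdirectories (in/bam_tumor/0/, in/bam_tumor/1/, ...).
    dx-download-all-inputs --parallel

    # DNAcopy cannot be installed through execDepends, so install it here
    # with BiocManager (which execDepends installed from CRAN).
    Rscript -e 'BiocManager::install("DNAcopy", update = FALSE, ask = FALSE)'

    # Run the CNVkit pipeline on every tumor BAM against the .cnn reference.
    mkdir -p work
    cnvkit.py batch in/bam_tumor/*/*.bam \
        -r in/reference/*.cnn \
        -p "$(nproc)" \
        --scatter --diagram \
        -d work

    # Sort results into out/<output_name>/. Treating *.call.cns as the
    # "filtered" segments is an assumption of this sketch.
    mkdir -p out/cns out/cns_filtered out/plot
    for f in work/*.cns; do
        case "$f" in
            *.call.cns) mv "$f" out/cns_filtered/ ;;
            *) mv "$f" out/cns/ ;;
        esac
    done
    mv work/*.png work/*.pdf out/plot/

    # Upload everything under out/ and attach it to the matching outputs.
    dx-upload-all-outputs --parallel
}
```

Any other *.cns files CNVkit writes also end up in out/cns, while intermediate files such as *.cnr stay in work/ and are not uploaded.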
Rather than downloading the inputs individually as in the original template, this version downloads all inputs in parallel with dx-download-all-inputs --parallel.
This creates an in directory with subdirectories named after the inputs. Note that the bam_tumor input is an array of files, so its directory will contain numbered subdirectories, starting at 0, one for each input file:
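A glob such as in/bam_tumor/*/*.bam lists those numbered directories in lexical order, so with eleven or more BAM files, index 10 sorts before index 2. If input order matters to you, a hypothetical helper like this one walks the indices numerically instead.

```shell
# Hypothetical helper: print the downloaded tumor BAMs in input order by
# visiting in/bam_tumor/0, in/bam_tumor/1, ... until an index is missing.
list_tumor_bams() {
    in_dir="${1:-$HOME/in}"
    i=0
    while [ -d "$in_dir/bam_tumor/$i" ]; do
        for bam in "$in_dir/bam_tumor/$i"/*.bam; do
            # Skip the literal pattern if a directory holds no .bam file.
            [ -e "$bam" ] && printf '%s\n' "$bam"
        done
        i=$((i + 1))
    done
}
```

The output is one path per line; passing it unquoted to another command relies on word splitting, so it assumes file names without spaces.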
Similarly, the preceding code uses dx-upload-all-outputs, which expects an out directory with subdirectories named according to each of the output specifications.
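Because dx-upload-all-outputs maps each out/<name>/ directory to the output of the same name, a quick check before uploading can make a missing output easier to diagnose. This hypothetical check_outputs helper assumes the three output names defined in the wizard.

```shell
# Hypothetical pre-upload check: report any output (cns, cns_filtered, plot)
# whose out/<name>/ directory is missing or empty; returns 1 if any is.
check_outputs() {
    out_dir="${1:-$HOME/out}"
    missing=0
    for name in cns cns_filtered plot; do
        if [ -z "$(ls -A "$out_dir/$name" 2>/dev/null)" ]; then
            echo "no files for output '$name' in $out_dir/$name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```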
Use dx pwd to ensure you are in the correct project and dx select to change projects, if necessary. If you are inside the app's source directory, where the dxapp.json file exists, you can run dx build -f. If you are in the parent directory, run dx build -f cnvkit_bash. Here is sample output from successfully compiling the app:
The -f|--overwrite flag indicates you wish to overwrite any previous version of the applet. You may also want to use the -a|--archive flag to move any previous versions to an archived location. You won't need either of these flags the first time you compile, but subsequent builds will require that you indicate how to handle previous versions of the applet. Run dx build --help to learn more about build options.
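The two ways of handling a previous build can be wrapped in a hypothetical helper like this one, run from the parent directory of cnvkit_bash; it uses only the -f and -a flags described above.

```shell
# Hypothetical wrapper around dx build. By default it overwrites the existing
# applet (-f); pass "archive" to archive previous versions instead (-a).
build_cnvkit_applet() {
    case "${1:-overwrite}" in
        archive) dx build -a cnvkit_bash ;;
        *) dx build -f cnvkit_bash ;;
    esac
}
```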
Download this BAM file and add it to the inputs directory.
Indicate an output directory, click the Run button, and then click "View Log" to watch the job's progress.
You can also run the applet on the command line with the -h|--help flag to verify the inputs and outputs:
Select the input files on the web interface to note the file IDs that can be used to execute the app from the command line as follows:
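A sketch of that command line, wrapped in a hypothetical run_cnvkit function: each input is passed as -i name=value, and the array input bam_tumor is given by repeating -i bam_tumor=<file-id> once per BAM file. Substitute the file IDs you noted.

```shell
# Hypothetical launcher: the first argument is the reference file ID, the
# rest are tumor BAM file IDs. dx run then asks for confirmation.
run_cnvkit() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_cnvkit REFERENCE_FILE_ID BAM_FILE_ID..." >&2
        return 1
    fi
    reference_id=$1
    shift
    # Rewrite the positional parameters: append "-i bam_tumor=<id>" for each
    # BAM ID and drop the original ID from the front.
    for bam_id in "$@"; do
        set -- "$@" -i "bam_tumor=$bam_id"
        shift
    done
    dx run cnvkit_bash -i "reference=$reference_id" "$@"
}
```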
You should see output from the preceding command that includes a JSON document with the inputs:
Note that you can place this JSON into a file and launch the applet with the inputs specified by the -f|--input-json-file option, as follows. Use dx run -h to learn about other command-line options:
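A hypothetical helper for writing that JSON file is sketched below. It assumes the DNAnexus file-link form {"$dnanexus_link": "<file-id>"} for each file, with bam_tumor as an array of links.

```shell
# Hypothetical helper: write_input_json FILE REFERENCE_ID BAM_ID...
# Writes an input JSON document for the applet to FILE.
write_input_json() {
    json_file=$1
    reference_id=$2
    shift 2
    # Build the comma-separated list of links for the bam_tumor array.
    bams=""
    for bam_id in "$@"; do
        [ -n "$bams" ] && bams="$bams, "
        bams="$bams{\"\$dnanexus_link\": \"$bam_id\"}"
    done
    cat > "$json_file" <<EOF
{
  "reference": {"\$dnanexus_link": "$reference_id"},
  "bam_tumor": [$bams]
}
EOF
}
```

With the file written, the launch is dx run cnvkit_bash -f input.json.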
Note the job ID from dx run, and use dx watch to follow the job to completion and dx describe to view the job's metadata. Alternatively, you can launch the job from the web platform, using the file selector to specify each of the inputs, then use the "Monitor" view to check the job's status and view the output files when the job completes.
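The run, watch, and describe sequence can be scripted. This hypothetical helper relies on dx run's --brief flag (print only the new job's ID) and -y flag (skip the confirmation prompt), then passes the captured job ID to dx watch and dx describe.

```shell
# Hypothetical helper: launch from an input JSON file, follow the job's log
# until it finishes, then print the job's metadata.
run_and_follow() {
    job_id=$(dx run cnvkit_bash -f "${1:-input.json}" -y --brief) || return 1
    echo "launched $job_id"
    dx watch "$job_id"
    dx describe "$job_id"
}
```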
You'll notice the applet takes quite a while to run (around 14 minutes for me) because of the module installations. You can build an asset for these installations and use this in dxapp.json. Create a directory called cnvkit_asset with the following file dxasset.json:
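A sketch of such a dxasset.json, written from the shell. The helper name and the asset's name, title, description, version, and Ubuntu release are placeholder assumptions; the execDepends repeat the applet's. Match release to the one your dxapp.json uses.

```shell
# Hypothetical helper: create <dir>/dxasset.json (default dir: cnvkit_asset).
# The "release" value is an assumption; align it with dxapp.json.
write_dxasset_json() {
    asset_dir="${1:-cnvkit_asset}"
    mkdir -p "$asset_dir"
    cat > "$asset_dir/dxasset.json" <<'EOF'
{
  "name": "cnvkit_asset",
  "title": "CNVkit dependencies",
  "description": "Runtime dependencies for the cnvkit_bash applet",
  "version": "0.0.1",
  "distribution": "Ubuntu",
  "release": "20.04",
  "execDepends": [
    {"name": "cnvkit", "package_manager": "pip"},
    {"name": "BiocManager", "package_manager": "cran"}
  ]
}
EOF
}
```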
Also create a Makefile with the following contents:
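One plausible Makefile, assuming the asset build runs the Makefile's default target after the execDepends are installed, installs DNAcopy through BiocManager. The hypothetical helper below writes it, using printf to supply the tab that make requires at the start of recipe lines.

```shell
# Hypothetical helper: write <dir>/Makefile whose default target installs the
# DNAcopy R package via BiocManager (assumed present from execDepends).
write_asset_makefile() {
    asset_dir="${1:-cnvkit_asset}"
    mkdir -p "$asset_dir"
    tab=$(printf '\t')
    cat > "$asset_dir/Makefile" <<EOF
all:
${tab}Rscript -e 'BiocManager::install("DNAcopy", update = FALSE, ask = FALSE)'
EOF
}
```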
Run dx build_asset to create the asset. This will launch a job that reports the asset ID at the end:
Update the runSpec in dxapp.json to the following:
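Assuming dx build_asset cnvkit_asset reported a record ID, the runSpec would reference it through an assetDepends entry, presumably in place of the execDepends. This hypothetical helper prints that entry for a given ID.

```shell
# Hypothetical helper: print the runSpec assetDepends entry for an asset ID.
# Expecting IDs of the form record-... is an assumption of this sketch.
print_asset_depends() {
    case "$1" in
        record-*) ;;
        *) echo "expected an asset ID like record-..." >&2; return 1 ;;
    esac
    printf '"assetDepends": [{"id": "%s"}]\n' "$1"
}
```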
Use dx build -f and note the new app's ID. Create a JSON input as follows:
Launch the new app from the CLI with the following command:
Use dx watch with the new job ID to see how the run now uses the asset to run faster. I see about a 10-minute difference with the asset.
You learned more ways to include app dependencies, using package managers and a Makefile as well as by building an asset. Installing with package managers happens at runtime, while building an asset installs all the dependencies before the applet runs, which makes the runtime much faster.
To create a support ticket if there are technical issues:
Go to the Help header (same section where Projects and Tools are) inside the platform
Select "Contact Support"
Fill in the Subject and Message to submit a support ticket.
See the dxapp.json documentation for a more complete understanding of all the possible fields and their implications.
The Python module cnvkit can be installed via pip, but the software also requires an R module called DNAcopy that must be installed using BiocManager, which must first be installed using cran. This means you'll have to install the DNAcopy module manually when the app starts.