Example 5: samtools with a Docker Image
This tutorial builds the same samtools applet as an earlier example, but uses a public Docker image instead of an asset.
Start the Cloud Workstation application by entering the following command in the terminal:
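A minimal sketch of that command, assuming dx-toolkit is installed on your local machine and you have already logged in and selected a project. The command is wrapped in a hypothetical helper function, start_workstation, so that nothing is launched until you call it:

```shell
# Hypothetical helper; the function name is not part of dx-toolkit.
start_workstation() {
    # app-cloud_workstation is the Cloud Workstation app; --ssh connects to
    # the job over SSH once it is running.
    dx run app-cloud_workstation --ssh
}
```

Call start_workstation (or type the dx run line directly) from your local terminal. dx run asks for confirmation before launching the job.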
Once the Cloud Workstation Application has started, pull the image from the repository, save the Docker image within the Workstation, and then use dx upload to put the saved image onto the project space.
First, pull the Docker Image using the following command:
The image path includes the tag from the Docker repository.
Use up-to-date Docker images from reliable sources.
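As a sketch of the pull step: the repository path and the TAG placeholder below are assumptions for illustration, not values taken from this tutorial, so substitute the path and tag of the image you actually want. The helper name pull_image is also hypothetical.

```shell
# Assumption: the image is hosted in the public biocontainers repository on quay.io.
# TAG is a placeholder; replace it with a real tag listed in the repository.
IMAGE="quay.io/biocontainers/samtools:TAG"

pull_image() {
    # Download the image, identified by its full path and tag.
    docker pull "$IMAGE"
}
```

Run pull_image on the Cloud Workstation once IMAGE points at a real tag.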
Next, save the Docker Image:
-o: the output file. The file name must end in .tar.gz.
The image is referenced by its full path, including the tag.
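The save step might look like the following sketch. The image path and TAG are placeholders (assumptions), and save_image is a hypothetical helper name; the archive name samtools.tar.gz matches the file used later in this tutorial.

```shell
# Assumption: placeholder image path and tag; use the image you pulled.
IMAGE="quay.io/biocontainers/samtools:TAG"
ARCHIVE="samtools.tar.gz"

save_image() {
    # -o names the output archive; the last argument is the image path with its tag.
    docker save -o "$ARCHIVE" "$IMAGE"
}
```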
Finally, upload the saved image to the project:
Add --path project-ID:/ to the dx upload command so that the file is uploaded to the project rather than to the temporary workspace container of the Cloud Workstation job.
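A sketch of the upload, assuming the DX_PROJECT_CONTEXT_ID environment variable is set inside the Cloud Workstation job to the ID of the project it was launched from. The fallback project-xxxx is a placeholder and upload_image is a hypothetical helper name.

```shell
ARCHIVE="samtools.tar.gz"
# Assumption: DX_PROJECT_CONTEXT_ID is defined on the workstation;
# otherwise replace project-xxxx with your own project ID.
PROJECT_ID="${DX_PROJECT_CONTEXT_ID:-project-xxxx}"

upload_image() {
    # --path sets the destination project and folder for the uploaded file.
    dx upload "$ARCHIVE" --path "${PROJECT_ID}:/"
}
```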
When the upload finishes, you can either try out the Docker image on the Cloud Workstation, or terminate the Cloud Workstation job and proceed to building the applet.
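One way to try out the image on the workstation is a quick smoke test that prints the samtools version from inside the container. This is only a sketch: the image path and TAG are placeholders, and try_image is a hypothetical helper name.

```shell
# Assumption: placeholder image path and tag; use the image you pulled.
IMAGE="quay.io/biocontainers/samtools:TAG"

try_image() {
    # --rm removes the container after it exits.
    docker run --rm "$IMAGE" samtools --version
}
```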
We will use dx-app-wizard to create a skeleton applet structure with these files:
First, give the applet a name. The prompt shows that only letters, numbers, a dot, underscore, and a dash can be used. As stated earlier, this applet name will also be the name of the directory. Use samtools_count_docker_bundle:
Next is the title. Note that the prompt includes empty square brackets ([]), which hold the default value that is used if you press Enter. Because the title is not required, the default is the empty string, but enter an informative title such as "Samtools Count".
Likewise, the summary field is not required:
The version is also optional; press Enter to accept the default:
There is one input for this applet, which is a BAM file.
Use the parameters for the input section:
name: bam
label: BAM file
class: file
optional: false
When prompted for the first input, enter the following:
The name of the input will be used as a variable in the bash code, so use only letters, numbers, and underscores as in bam or bam_file.
The label is optional, as noted by the empty square brackets.
The types include primitives like integers, floating-point numbers, and strings, as well as arrays of primitive types.
This is a required input. If an input is optional, provide a default value.
When prompted for the second input, press Enter:
There is one output for this applet, which is a counts file.
Use the parameters for the output section:
name: counts
label: counts file
class: file
When prompted for the first output name, enter the following:
This name will also become a bash variable, so best practice is to use letters, numbers, and underscores.
The label is optional.
The class must be from the preceding list. To be reminded of the choices, press the Tab key twice.
When prompted for the second output, press Enter:
Here are the final settings to complete the wizard:
Timeout Policy: 48h
Programming language: bash
Access to internet: No (default)
Access to parent project: No (default)
Instance Type: mem1_ssd1_v2_x4 (default)
Applets are required to set a maximum running time to prevent a job from running for an excessively long time. While some applets may legitimately need days to run, most probably need something in the range of 12-48 hours. As noted in the prompt, use m, h, or d to specify minutes, hours, or days, respectively:
For the template language, select from bash or Python for the program that is executed when the applet starts. The applet code can execute any program available in the execution environment, including custom programs written in any language. Choose bash:
Next, determine if the applet has access to the internet and/or the parent project. Unless the applet specifically needs access, such as to download a file at runtime, it's best to answer no:
The user is always free to override the instance type using the --instance-type option to dx run.
The final output from dx-app-wizard is a summary of the files that are created:
Readme.developer.md: This file should contain applet implementation details.
Readme.md: This file should contain user help.
dxapp.json: The answers from dx-app-wizard are used to create the app metadata.
resources/: The resources directory is for any additional files you want available on the runtime instance.
src/: The src (pronounced "source") directory is a conventional place for source code, but it's not a requirement that code lives in this directory.
src/samtools_count.sh: This is the bash script that will be executed when the applet is run.
test/: The test directory is empty and will not be discussed in this section.
The contents of the resources directory are placed into the root directory of the runtime instance. For instance, a file at resources/my_tool is available on the runtime instance as /my_tool. In the bash code, either reference the full path (/my_tool) or extend the $PATH variable to include /. Best practice is to create the directory structure resources/usr/local/bin/ so that the file lands at /usr/local/bin/my_tool, because /usr/local/bin is normally part of $PATH.
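The layout can be sketched locally. The example below builds a throwaway applet directory under a temporary location (an assumption made so the sketch does not touch a real applet), places a stand-in my_tool script under resources/usr/local/bin/, and lists the files by the paths they would have on the runtime instance.

```shell
# Throwaway location for the sketch; in practice this is your applet directory.
APPLET_DIR="$(mktemp -d)/samtools_count_docker_bundle"
mkdir -p "$APPLET_DIR/resources/usr/local/bin"

# A stand-in for a real tool.
printf '#!/bin/bash\necho "hello from my_tool"\n' > "$APPLET_DIR/resources/usr/local/bin/my_tool"
chmod +x "$APPLET_DIR/resources/usr/local/bin/my_tool"

# Paths relative to resources/ are the paths on the runtime instance.
(cd "$APPLET_DIR/resources" && find . -type f | sed 's|^\.||')
# prints /usr/local/bin/my_tool
```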
dxapp.json
This file records the answers given to dx-app-wizard in JSON format. If needed, change the settings for the inputs, outputs, version, and so on directly in this file.
The first section is the metadata, as shown below:
The next section(s) are Inputs and Outputs, shown below:
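As a rough sketch, the input and output entries that correspond to the wizard answers in this tutorial would look like the JSON below. It is written to a scratch file through a shell heredoc (the scratch file is hypothetical) so that it can be inspected without touching the real dxapp.json. The wizard may add further fields to each entry, which are omitted here.

```shell
# Scratch file for the sketch; the real settings live in dxapp.json.
SPEC_SKETCH="$(mktemp)"

cat > "$SPEC_SKETCH" <<'EOF'
{
  "inputSpec": [
    {
      "name": "bam",
      "label": "BAM file",
      "class": "file",
      "optional": false
    }
  ],
  "outputSpec": [
    {
      "name": "counts",
      "label": "counts file",
      "class": "file"
    }
  ]
}
EOF

# Show the names that become bash variables in the applet script.
grep '"name"' "$SPEC_SKETCH"
```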
Finally, the last section is the Additional Settings, shown below:
Adding A Docker Image into the Resources Folder
Add your Docker Image to the resources folder.
Use dx download to fetch samtools.tar.gz from the project.
Use mv to move samtools.tar.gz into the samtools_count_docker_bundle/resources/ folder.
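These two steps can be sketched as follows, assuming the current directory contains the samtools_count_docker_bundle applet directory and the current dx project is the one that holds samtools.tar.gz. The helper name stage_image is hypothetical.

```shell
stage_image() {
    # Fetch the saved image from the project, then move it into resources/
    # so that it is bundled with the applet.
    dx download samtools.tar.gz &&
        mv samtools.tar.gz samtools_count_docker_bundle/resources/
}
```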
Samtools_docker.sh
Update the following .sh code file for this applet:
#!/bin/bash is the "shebang" line, which indicates that this is a bash script.
set -exuo pipefail is the pragma that shows each command as it is executed and halts on undefined variables or failed system calls.
Within the main function, the code:
Echoes the value of the input using the variable $bam, which is named in the input spec
Downloads the input file onto the job instance, saving it under the name of the BAM file (ex: ___.bam)
Runs the first Docker command, docker load, which loads the saved Docker image, samtools.tar.gz, from the resources folder
Assigns a counts_id variable that holds the name of the counts file produced by samtools
Runs the second Docker command, docker run, which runs the Docker image with:
-v /home/dnanexus:/home/dnanexus to mount the volume
The name of the Docker image, including the tag
The samtools command for the applet, with the output file written to /home/dnanexus/${counts_id}
Assigns a variable (upload) when uploading the counts file back to the project
Passes the upload variable and the output name from the output spec in the json file to the dx-jobutil-add-output command
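Put together, the steps above could look like the sketch below. Several details are assumptions rather than the tutorial's exact script: the image path and TAG are placeholders that must match the image stored in samtools.tar.gz, samtools view -c is used as the counting command, and the .counts.txt suffix is an arbitrary choice. The variables $bam_name and $bam_prefix are the helper variables the platform derives from the bam input.

```shell
#!/bin/bash
set -exuo pipefail

# Assumption: placeholder path and tag; must match the image saved in samtools.tar.gz.
IMAGE="quay.io/biocontainers/samtools:TAG"

main() {
    echo "Value of bam: '$bam'"

    # Download the input BAM into the job's working directory (/home/dnanexus).
    dx download "$bam" -o "$bam_name"

    # Load the image bundled in resources/, which is unpacked at / on the instance.
    docker load -i /samtools.tar.gz

    # Name of the counts file (the suffix is an assumption).
    counts_id="${bam_prefix}.counts.txt"

    # Mount the working directory into the container and count the alignments.
    docker run -v /home/dnanexus:/home/dnanexus "$IMAGE" \
        samtools view -c "/home/dnanexus/${bam_name}" > "/home/dnanexus/${counts_id}"

    # Upload the result and register it as the "counts" output from the output spec.
    upload=$(dx upload "/home/dnanexus/${counts_id}" --brief)
    dx-jobutil-add-output counts "$upload" --class=file
}
```

The platform calls main when the job starts, so the script only defines the function and does not invoke it.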
Once you have added the Docker Image to the resources folder and edited the .sh and .json files, use the following command to create your applet in the project of your choice:
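A sketch of the build command, run from the directory that contains the applet folder. The helper name build_applet is hypothetical, and the project ID you pass in is your own.

```shell
build_applet() {
    # $1: ID of the destination project, for example project-xxxx (placeholder).
    dx build samtools_count_docker_bundle --destination "$1:/"
}
```

For example, build_applet project-xxxx builds the applet into the root folder of that project.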
Then, proceed to test your applet!
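A test run might be sketched like this, assuming the applet was built into the currently selected project and that you pass the path or ID of a BAM file you have access to. The helper name test_applet is hypothetical.

```shell
test_applet() {
    # $1: path or ID of the input BAM file.
    # -y skips the confirmation prompt; --watch streams the job log to the terminal.
    dx run samtools_count_docker_bundle -ibam="$1" -y --watch
}
```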
To create a support ticket if there are technical issues:
Go to the Help header (same section where Projects and Tools are) inside the platform
Select "Contact Support"
Fill in the Subject and Message to submit a support ticket.
Lastly, specify a default instance type. The prompt includes an abbreviated list of instance types. The final number in an instance type name indicates the number of cores, e.g., _x4 indicates 4 cores. The greater the number of cores, the more memory and disk space are available. In this case, the default small 4-core instance, mem1_ssd1_v2_x4, is sufficient.