Academy Documentation
Example 5: samtools with a Docker Image


Last updated 8 days ago


This tutorial builds the same samtools applet as Example 3: samtools, but uses a public Docker image instead of an asset.

Step 1: Download the Docker Image

Start the Cloud Workstation app by running the following command in a terminal:

 dx run app-cloud_workstation --instance-type mem1_ssd2_v2_x72 --ssh -y

Once the Cloud Workstation has started, pull the image from the repository, save it to a file on the workstation, and then use dx upload to copy the saved image into the project.

First, pull the Docker Image using the following command:

docker pull biocontainers/samtools:v1.9-4-deb_cv1
  • The image reference includes the tag from the Docker repository.

  • Use up-to-date Docker images from reliable sources.

Next, save the Docker Image:

docker save -o samtools.tar.gz biocontainers/samtools:v1.9-4-deb_cv1
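  • -o: the output file. This tutorial names it samtools.tar.gz. Note that docker save writes an uncompressed tar archive regardless of the file name; docker load reads it either way.

  • The image is referenced by its full path, including the tag.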

Finally, upload the saved image to the project:

dx upload samtools.tar.gz --path project-ID:/
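  • Add --path project-ID:/ to the dx upload command so that the file is uploaded to the project rather than to the Cloud Workstation's temporary workspace container. Replace project-ID with the ID of your project.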

When the upload has finished, either try the image on the Cloud Workstation:

docker run -it biocontainers/samtools:v1.9-4-deb_cv1

or terminate the Cloud Workstation job and proceed to building the applet.
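
The three commands in this step can be collected into one short script. The following is a minimal sketch, not part of the original tutorial: the function name print_image_steps is hypothetical, and it only prints the commands (a dry run) so they can be reviewed before being run on the Cloud Workstation. project-ID stays a placeholder for your own project ID.

```shell
#!/bin/bash
# Dry-run sketch: print the pull/save/upload commands for one image.
# print_image_steps is a hypothetical helper, not a dx or docker command.
print_image_steps() {
    local image="$1"     # e.g. biocontainers/samtools:v1.9-4-deb_cv1
    local tarball="$2"   # e.g. samtools.tar.gz
    local project="$3"   # placeholder such as project-ID
    echo "docker pull ${image}"
    echo "docker save -o ${tarball} ${image}"
    echo "dx upload ${tarball} --path ${project}:/"
}

print_image_steps "biocontainers/samtools:v1.9-4-deb_cv1" samtools.tar.gz project-ID
```

Printing first makes it easy to check the image tag and the project placeholder before anything is pulled or uploaded.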

Step 2: Building the Applet

Use dx-app-wizard to create a skeleton applet structure. Start the wizard by running dx-app-wizard; it first prompts for basic metadata:

dx-app-wizard
DNAnexus App Wizard, API v1.0.0
Basic Metadata
Please enter basic metadata fields that will be used to describe your app. Optional fields are denoted by options with square brackets. At the end of this wizard, the files necessary for building your app will be generated from the answers you provide.

Metadata:

First, give the applet a name. The prompt shows that only letters, numbers, a dot, underscore, and a dash can be used. As stated earlier, this applet name will also be the name of the directory. Use samtools_count_docker_bundle:

The name of your app must be unique on the DNAnexus platform.  After creating your app for the first time, you will be able to publish new versions using the same app name.  App names are restricted to alphanumeric characters (a-z, A-Z, 0-9), and the characters ".", "_", and "-".

App Name: samtools_count_docker_bundle

Next is the title. The empty square brackets ([]) in the prompt show the default value that is used if Enter is pressed. The title is not required, so the default is the empty string, but add an informative title, “Samtools Count”:

The title, if provided, is what is shown as the name of your app on the website.  It can be any valid UTF-8 string.
Title []: Samtools Count

Likewise, the summary field is not required:

The summary of your app is a short phrase or one-line description of what your app does.  It can be any UTF-8 human-readable string.
Summary []: Count SAM/BAM alignments

The version is also optional. Press Enter to accept the default:

You can publish multiple versions of your app, and the version of your app is a string with which to tag a particular version.  We encourage the use of Semantic Versioning for labeling your apps (see http://semver.org/ for more details).

Version [0.0.1]:

Input Specification:

There is one input for this applet, which is a BAM file.

Use the parameters for the input section:

  • name: bam

  • label: BAM file

  • class: file

  • optional: false

When prompted for the first input, enter the following:

Input Specification
You will now be prompted for each input parameter to your app. Each parameter should have a unique name that uses only the underscore "_" and alphanumeric characters, and does not start with a number.

1st input name (<ENTER> to finish): bam 

Label (optional human-readable name) []: BAM file

Your input parameter must be of one of the following classes: 
applet         array:file     array:record   file           int
array:applet   array:float    array:string   float          record
array:boolean  array:int      boolean        hash           string

Choose a class (<TAB> twice for choices): file

This is an optional parameter [y/n]: n
  • The name of the input will be used as a variable in the bash code, so use only letters, numbers, and underscores as in bam or bam_file.

  • The label is optional, as noted by the empty square brackets.

  • The types include primitives like integers, floating-point numbers, and strings, as well as arrays of primitive types.

  • This is a required input. For an optional input, consider providing a default value.

When prompted for the second input, press Enter:

2nd input name (<ENTER> to finish):

Output Specification:

There is one output for this applet, which is a counts file.

Use the parameters for the output section:

  • name: counts

  • label: counts file

  • class: file

When prompted for the first output name, enter the following:

Output Specification
You will now be prompted for each output parameter of your app.  Each parameter should have a unique name that uses only the underscore "_" and alphanumeric characters, and does not start with a number.

1st output name (<ENTER> to finish): counts 

Label (optional human-readable name) []: counts file

Choose a class (<TAB> twice for choices): file
  • This name will also become a bash variable, so best practice is to use letters, numbers, and underscores.

  • The label is optional.

  • The class must be from the preceding list. To be reminded of the choices, press the Tab key twice.

When prompted for the second output, press Enter:

2nd output name (<ENTER> to finish):

Additional Settings

Here are the final settings to complete the wizard:

  • Timeout Policy: 48h

  • Programming language: bash

  • Access to internet: No (default)

  • Access to parent project: No (default)

  • Instance Type: mem1_ssd1_v2_x4 (default)

Applets are required to set a maximum time for running to prevent a job from running an excessively long time. While some applets may legitimately need days to run, most probably need something in the range of 12-48 hours. As noted in the prompt, use m, h, or d to specify minutes, hours, or days, respectively:

Timeout Policy
Set a timeout policy for your app. Any single entry point of the app that runs longer than the specified timeout will fail with a TimeoutExceeded error. Enter an int greater than 0 with a single-letter suffix (m=minutes,h=hours, d=days) (e.g. "48h").
Timeout policy [48h]:
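
The wizard's answer ends up in the timeoutPolicy entry of dxapp.json as a number under an hours, minutes, or days key. As an illustration only, the sketch below converts a wizard-style value such as 48h into that JSON fragment. The function timeout_to_json is hypothetical and is not part of the dx toolkit.

```shell
#!/bin/bash
# Hypothetical helper: turn "48h" into the fragment used under timeoutPolicy.
timeout_to_json() {
    local value="$1"
    local number="${value%[mhd]}"      # strip the one-letter suffix
    local suffix="${value#"$number"}"  # what remains is the suffix
    local unit
    case "$suffix" in
        m) unit="minutes" ;;
        h) unit="hours" ;;
        d) unit="days" ;;
        *) echo "expected a suffix of m, h, or d" >&2; return 1 ;;
    esac
    # Require an integer greater than 0, as the prompt states.
    if ! [[ "$number" =~ ^[1-9][0-9]*$ ]]; then
        echo "expected an integer greater than 0" >&2
        return 1
    fi
    printf '{"%s": %s}\n' "$unit" "$number"
}

timeout_to_json 48h
```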

For the template language, select from bash or Python for the program that is executed when the applet starts. The applet code can execute any program available in the execution environment, including custom programs written in any language. Choose bash:

Template Options
You can write your app in any programming language, but we provide templates for the following supported languages: Python, bash
Programming language: bash

Next, determine if the applet has access to the internet and/or the parent project. Unless the applet specifically needs access, such as to download a file at runtime, it's best to answer no:

Access Permissions
If you request these extra permissions for your app, users will see this fact when launching your app, and certain other restrictions will apply. For more information, see https://documentation.dnanexus.com/developer/apps/app-permissions.

Access to the Internet (other than accessing the DNAnexus API).
Will this app need access to the Internet? [y/N]: n

Direct access to the parent project. This is not needed if your app specifies outputs, which will be copied into the project after it's done running.

Will this app need access to the parent project? [y/N]: n

Default instance type: The instance type you select here will apply to all entry points in your app unless you override it. See https://documentation.dnanexus.com/developer/api/running-analyses/instance-types for more information.

Choose an instance type for your app [mem1_ssd1_v2_x4]:

The user is always free to override the instance type using the --instance-type option to dx run.
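
Instance type names end in _x followed by the number of cores, so the core count can be read off the name. A minimal sketch, using a hypothetical helper instance_cores and plain bash parameter expansion:

```shell
#!/bin/bash
# Hypothetical helper: read the core count from an instance type name.
instance_cores() {
    local instance_type="$1"
    echo "${instance_type##*_x}"   # drop everything up to the last "_x"
}

instance_cores mem1_ssd1_v2_x4     # prints 4
instance_cores mem1_ssd2_v2_x72    # prints 72
```

The second name is the instance type passed with --instance-type to the Cloud Workstation command in Step 1.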

Files From dx-app-wizard

The final output from dx-app-wizard is a summary of the files that are created:

*** Generating DNAnexus App Template... ***
Your app specification has been written to the dxapp.json file. You can specify more app options by editing this file directly (see https://documentation.dnanexus.com/developer for complete documentation).

Created files:
    samtools_count_docker_bundle/Readme.developer.md 
    samtools_count_docker_bundle/Readme.md 
    samtools_count_docker_bundle/dxapp.json  
    samtools_count_docker_bundle/resources/  
    samtools_count_docker_bundle/src/ 
    samtools_count_docker_bundle/src/samtools_count_docker_bundle.sh
    samtools_count_docker_bundle/test/

App directory created!  See https://documentation.dnanexus.com/developer for tutorials on how to modify these files, or run "dx build samtools_count_docker_bundle" or "dx build --create-app samtools_count_docker_bundle" while logged in with dx.
Running the DNAnexus build utility will create an executable on the DNAnexus platform.  Any files found in the resources directory will be uploaded so that they will be present in the root directory when the executable is run.

  1. Readme.developer.md: This file should contain applet implementation details.

  2. Readme.md: This file should contain user help.

  3. dxapp.json: The answers from dx-app-wizard are used to create the app metadata.

  4. resources/: The resources directory is for any additional files you want available on the runtime instance.

  5. src/: The src (pronounced "source") directory is a conventional place for source code, but it's not a requirement that code lives in this directory.

  6. src/samtools_count_docker_bundle.sh: This is the bash script that will be executed when the applet is run. The wizard names it after the applet.

  7. test/: The test directory is empty and will not be discussed in this section.

The contents of the resources directory are placed into the root directory of the runtime instance. For instance, a file at resources/my_tool is available on the runtime instance as /my_tool. In the bash code, either reference the full path (/my_tool) or extend the $PATH variable to include /. Best practice is to create the directory structure resources/usr/local/bin/ so that the file lands at /usr/local/bin/my_tool, as /usr/local/bin is normally part of $PATH.
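
The mapping from the resources directory to the runtime instance is a simple path rewrite. The sketch below illustrates it with a hypothetical helper, runtime_path, which only manipulates the path string and does not touch the file system:

```shell
#!/bin/bash
# Sketch: show where files under resources/ appear on the runtime instance.
# runtime_path is a hypothetical helper; it only rewrites the path string.
runtime_path() {
    local local_path="$1"             # path relative to the applet directory
    echo "/${local_path#resources/}"  # resources/ maps to the root directory /
}

runtime_path resources/my_tool                  # prints /my_tool
runtime_path resources/usr/local/bin/my_tool    # prints /usr/local/bin/my_tool
```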

Dxapp.json

The dxapp.json file records the answers given to dx-app-wizard. If needed, edit this file directly to change the inputs, outputs, version, and other settings.

The first section is the metadata, as shown below:

{
  "name": "samtools_count_docker_bundle",
  "title": "Samtools Count",
  "summary": "Count SAM/BAM alignments",
  "dxapi": "1.0.0",
  "version": "0.0.1",

The next sections are the input and output specifications, shown below:

"inputSpec": [
    {
      "name": "bam",
      "label": "BAM file",
      "class": "file",
      "optional": false,
      "patterns": [
        "*.bam"
      ],
      "help": ""
    }
  ],
  "outputSpec": [
    {
      "name": "counts",
      "label": "counts file",
      "class": "file",
      "patterns": [
        "*"
      ],
      "help": ""
    }
  ],

Finally, the last section is the Additional Settings, shown below:

"runSpec": {
    "timeoutPolicy": {
      "*": {
        "hours": 48
      }
    },
    "interpreter": "bash",
    "file": "src/samtools_docker.sh",
    "distribution": "Ubuntu",
    "release": "24.04",
    "version": "0"
  },
  "regionalOptions": {
    "aws:us-east-1": {
      "systemRequirements": {
        "*": {
          "instanceType": "mem1_ssd1_v2_x4"
        }
      }
    }
  }
}
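
Because the "file" entry under runSpec must point at a script that actually exists in the applet directory, a quick check before building can catch a mismatch. The following is a rough sketch under stated assumptions: the helpers run_spec_file and check_run_spec_file are hypothetical, and they use grep and sed rather than a JSON parser, so they assume the "file" entry sits on its own line as in the listing above.

```shell
#!/bin/bash
# Hypothetical check: does the script named in runSpec.file exist?
# Assumes the "file" entry is on its own line in dxapp.json.
run_spec_file() {
    local app_dir="$1"
    grep -m1 '"file"[[:space:]]*:' "${app_dir}/dxapp.json" |
        sed -E 's/.*"file"[[:space:]]*:[[:space:]]*"([^"]+)".*/\1/'
}

check_run_spec_file() {
    local app_dir="$1"
    local script
    script="$(run_spec_file "$app_dir")"
    if [ -n "$script" ] && [ -f "${app_dir}/${script}" ]; then
        echo "ok: ${script}"
    else
        echo "missing: ${script:-no file entry found}"
        return 1
    fi
}

# Example call (run from the directory that contains the applet folder):
# check_run_spec_file samtools_count_docker_bundle
```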

Adding A Docker Image into the Resources Folder

Add the saved Docker image to the applet's resources folder:

  1. Use dx download to download samtools.tar.gz from the project.

  2. Use mv to move samtools.tar.gz into the samtools_count_docker_bundle/resources/ folder.
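
The two steps above can be sketched as a small helper. The function stage_image is hypothetical; the dx download step is left as a comment because it needs a logged-in dx session.

```shell
#!/bin/bash
# Hypothetical helper: move a saved image tarball into the applet's
# resources/ directory so that it is bundled when the applet is built.
stage_image() {
    local tarball="$1"   # e.g. samtools.tar.gz, already downloaded
    local app_dir="$2"   # e.g. samtools_count_docker_bundle
    mkdir -p "${app_dir}/resources"
    mv "$tarball" "${app_dir}/resources/"
}

# dx download samtools.tar.gz
# stage_image samtools.tar.gz samtools_count_docker_bundle
```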

Samtools_docker.sh

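Replace the contents of the script in src/ with the following code. The script's file name must match the "file" entry under runSpec in dxapp.json (src/samtools_docker.sh in the listing above), so rename the wizard-generated script or update that entry if they differ: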

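#!/bin/bash

# -e: exit on error; -x: print each command; -u: fail on unset variables;
# -o pipefail: a pipeline fails if any command in it fails.
set -exuo pipefail

main() {
    echo "Value of bam: '$bam'"

    # Download the input BAM into the job's working directory.
    dx download "$bam" -o "$bam_name"

    # Load the image that was bundled in resources/ (available at / at runtime).
    docker load < "/samtools.tar.gz"

    # Name of the output file, derived from the input file name.
    counts_id="${bam_prefix}.counts.txt"

    # Run samtools in the container; the redirect is performed by the host
    # shell, so the counts file is written to /home/dnanexus on the job instance.
    docker run -v /home/dnanexus:/home/dnanexus \
        biocontainers/samtools:v1.9-4-deb_cv1 \
        samtools view -c "/home/dnanexus/${bam_name}" > "/home/dnanexus/${counts_id}"

    # Upload the counts file and register it as the "counts" output.
    upload=$(dx upload "$counts_id" --brief)
    dx-jobutil-add-output counts "$upload" --class=file
}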
  • #!/bin/bash is the “shebang” line, which marks the file as a bash script.

  • set -exuo pipefail makes the script exit when a command fails (-e), print each command as it is executed (-x), treat undefined variables as errors (-u), and fail a pipeline if any command in it fails (-o pipefail).

  • Within the main function, the code:

    • Echoes the value of the input, bam, using the variable $bam, which is defined in the input spec

    • Downloads the input file onto the job instance, saving it under the name of the BAM file ($bam_name)

    • Runs the first Docker command, which loads the saved Docker image, samtools.tar.gz (placed at / from the resources folder)

    • Assigns the counts_id variable, which holds the name of the counts file that samtools produces

    • Runs the second Docker command, which consists of:

      • docker run to run the Docker image

      • -v /home/dnanexus:/home/dnanexus to mount the job's home directory into the container

      • The name of the Docker image, including the tag

      • The samtools command that is run in the applet, with its output redirected to /home/dnanexus/${counts_id}

    • Uploads the counts file and stores the returned file ID in the upload variable

    • Passes the upload variable and the output name from the output spec (counts) to dx-jobutil-add-output
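
On the platform, the job environment provides $bam_name and $bam_prefix for the bam input. The sketch below emulates those variables locally with an assumed file name, sample.bam, to show how the counts file name is built. On a real job the values are set by the platform, not by parameter expansion in the script.

```shell
#!/bin/bash
# Emulate the helper variables for a file input named "bam".
# sample.bam is an assumed file name used only for illustration.
bam_name="sample.bam"
bam_prefix="${bam_name%.bam}"          # name with the *.bam pattern suffix removed
counts_id="${bam_prefix}.counts.txt"   # same expression as in the applet script

echo "$counts_id"   # prints sample.counts.txt
```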

Building the Applet

Once you have added the Docker image to the resources folder and edited the .sh and .json files, run the following command to build the applet in the project of your choice:

dx build samtools_count_docker_bundle

Then, proceed to test your applet!
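
One possible way to test is to launch the applet from the command line right after building it. The sketch below only prints the commands. The helper build_and_run_cmds is hypothetical, and the input path /data/sample.bam is an assumed example that must be replaced with a BAM file in your own project.

```shell
#!/bin/bash
# Dry-run sketch: print a build command followed by a test run.
# build_and_run_cmds is a hypothetical helper, not a dx command.
build_and_run_cmds() {
    local app_dir="$1"   # applet directory, e.g. samtools_count_docker_bundle
    local bam="$2"       # project path or file ID of the input BAM (assumed)
    echo "dx build ${app_dir}"
    echo "dx run ${app_dir} -i bam=${bam} -y --watch"
}

build_and_run_cmds samtools_count_docker_bundle /data/sample.bam
```

Here -i bam=... supplies the input named in the input spec, -y skips the confirmation prompt, and --watch streams the job log.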

Resources

To create a support ticket if there are technical issues:

  1. Go to the Help header (same section where Projects and Tools are) inside the platform

  2. Select "Contact Support"

  3. Fill in the Subject and Message to submit a support ticket.

Note on the default instance type: the dx-app-wizard prompt includes an abbreviated list of instance types. The final number indicates the number of cores, e.g., _x4 indicates 4 cores. In general, the greater the number of cores, the more memory and disk space are available. In this case, a small 4-core instance is sufficient.

Full Documentation