Example 5: workflow

In this example, you will learn:

  • How to accept a BAM file as a workflow input

  • How to break the BAM into slices by chromosome

  • How to distribute the slices in parallel to count the number of alignments in each

Getting Started

To begin, create a new directory called view_and_count and a workflow.wdl file.

The complete code and terminal sessions for this example appear, in order, under Code Listings at the end of this page.

The workflow definition you should add to workflow.wdl works as follows:

  • The name of this workflow is bam_chrom_counter.

  • The workflow accepts a single, required File input called bam, because it is expected to be a BAM file.

  • Use a non-input declaration to define a String value holding the name of the Docker image that contains Samtools.

  • The first call is to the slice_bam task, which breaks the BAM into one file per chromosome. The input for this task is the workflow's BAM file.

  • The scatter block runs its body once for each element of an array. Here, each slice file returned from the slice_bam task is used as the input to its own count_bam call. Because these calls are independent, they can run in parallel, which can lead to significant performance gains.

  • The workflow defines two outputs: a BAM index file and an array of integer values representing the number of alignments in each of the BAM slices.

The slice_bam task uses Samtools to index the input BAM file and break it into separate files for each of the 22 human autosomes (the non-sex chromosomes):

  • The inputs to this task are the BAM file and the name of the Docker image.

  • The command block uses triple-angle brackets because the shell code must use the dollar sign ($). Inside <<< >>>, WDL interpolates only ~{} placeholders and leaves ${} expressions for bash.

  • Use samtools index on the input BAM file for fast random access to the alignments. samtools view needs this index to extract a region.

  • The $() syntax in bash runs the seq command to create the sequence of integers 1 through 22, one per autosome.

  • samtools view writes the alignments for a region like "chr1" in BAM format (-b) to the file slices/1.bam. Note the mix of ~ for WDL variables and $ for bash variables. The region names assume that the BAM's reference sequences use the "chr" prefix.

  • The runtime block defines the Docker image that contains an installation of Samtools.

  • The outputs of this task are the BAM index, whose name is the given BAM file name plus the suffix .bai, and the sliced alignment files.

  • The slices will be one or more files, as indicated by Array[File]. The glob function finds them by looking in the slices directory for all files with the extension .bam.

The count_bam task is written to handle just one BAM slice:

  • This bam input will be a slice of alignments for a given region. Naming it bam does not interfere with the bam variable in the workflow or in any other task.

  • Use the samtools view command with -c|--count to count the number of alignments in the given file.

  • The output of this task uses the read_int function to read the command's STDOUT as an integer value.

At this point, I like to use miniwdl to check the syntax. As no errors are reported, I will compile this onto the DNAnexus platform with dxCompiler. Finally, I will run this workflow using a sample BAM file.

Return to the DNAnexus website to monitor the progress of the analysis.

Placing Task Definitions in Files

As the number of tasks increases, workflow definitions can get quite long. You can shorten workflow.wdl by placing each task in a separate file, which also makes it easier to reuse a task in another workflow. To do this, create a subdirectory called tasks that contains the files tasks/slice_bam.wdl and tasks/count_bam.wdl.

Both tasks are identical to the original definitions, but note that each file begins with a version declaration that matches the version of the workflow. Change workflow.wdl as follows:

  • Use import to include WDL code from a file or URI. Note the use of the as clause to give each imported namespace an alias.

  • Call task_slice_bam.slice_bam from the imported file, using as to give the call the same name it had in the original workflow.

  • Do the same with task_count_bam.count_bam.

Use miniwdl to check your syntax, then use dxCompiler to compile the workflow onto the DNAnexus platform.

Review

In this lesson, you learned how to:

  • Accept a file as a workflow input

  • Define a non-input declaration

  • Use scatter to run tasks in parallel

  • Use the output from one task as the input to another task

  • Mix ~ and $ in command blocks to dereference WDL and shell variables

  • Import WDL from external sources such as local files or remote URIs

Resources

Full Documentation

To create a support ticket if there are technical issues:

  1. Go to the Help header (same section where Projects and Tools are) inside the platform

  2. Select "Contact Support"

  3. Fill in the Subject and Message to submit a support ticket.

Code Listings

workflow.wdl:

    version 1.0
    
    workflow bam_chrom_counter { 
        input {
            File bam 
        }
    
        String docker_img = "quay.io/biocontainers/samtools:1.12--hd5e65b6_0" 
    
        call slice_bam {
            input : bam = bam, 
                    docker_img = docker_img
        }
    
        scatter (slice in slice_bam.slices) { 
            call count_bam {
                input: bam = slice,
                       docker_img = docker_img
            }
        }
    
        output { 
            File bai = slice_bam.bai
            Array[Int] count = count_bam.count
        }
    }
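The workflow above is essentially a scatter-gather. The sketch below is a rough mental model only. It is Python, not WDL, and it does not reflect how DNAnexus schedules jobs. It applies a per-slice function to every element of a list, possibly in parallel, and gathers the results in input order. This mirrors how count_bam.count becomes an Array[Int] aligned with slice_bam.slices. The slice paths, the fake_count_bam function and its counts are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def scatter_gather(items: List[str], task: Callable[[str], int]) -> List[int]:
    """Run `task` once per item, possibly in parallel, and gather the results.

    ThreadPoolExecutor.map yields results in input order, mirroring how a WDL
    scatter produces an output array aligned with the array it scattered over.
    """
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(task, items))


# Hypothetical stand-ins for slice_bam.slices and made-up per-slice counts.
FAKE_COUNTS: Dict[str, int] = {"slices/1.bam": 120, "slices/2.bam": 95, "slices/3.bam": 80}


def fake_count_bam(path: str) -> int:
    """Placeholder for count_bam: looks up a made-up alignment count."""
    return FAKE_COUNTS[path]


slices = list(FAKE_COUNTS)                        # plays the role of slice_bam.slices
counts = scatter_gather(slices, fake_count_bam)   # plays the role of count_bam.count
print(counts)
```

Replacing ThreadPoolExecutor.map with a plain list comprehension gives the same result serially. Parallelism changes when each element is computed, not the order of the gathered array.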

The slice_bam task, which also goes in workflow.wdl:

    task slice_bam {
        input { 
            File bam
            String docker_img
        }
    
        command <<< 
        set -ex
        samtools index "~{bam}" 
        mkdir slices
    
        for i in $(seq 22); do 
            samtools view -b -o "slices/$i.bam" "~{bam}" "chr${i}" 
        done
        >>>
    
        runtime { 
            docker: docker_img
        }
    
        output { 
            File bai = "~{bam}.bai"
            Array[File] slices = glob("slices/*.bam") 
        }
    }
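The following sketch shows why the command mixes ~ and $. It is a simplified Python model of heredoc-style interpolation and assumes that only bare variable names appear inside ~{ }. Real WDL placeholders accept full expressions. Only ~{bam} is replaced before the script runs, while $(seq 22), $i and ${i} reach bash untouched. The path /inputs/sample.bam is a made-up example of where an engine might place the input file.

```python
import re
from typing import Dict

# Matches ~{identifier}. Real WDL placeholders can hold full expressions,
# but this sketch only supports plain variable names.
PLACEHOLDER = re.compile(r"~\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}")


def render_command(template: str, wdl_values: Dict[str, str]) -> str:
    """Substitute ~{name} placeholders and leave every $-expression for bash."""
    return PLACEHOLDER.sub(lambda m: wdl_values[m.group(1)], template)


# The slice_bam command, copied from the listing above.
COMMAND = '''set -ex
samtools index "~{bam}"
mkdir slices

for i in $(seq 22); do
    samtools view -b -o "slices/$i.bam" "~{bam}" "chr${i}"
done'''

rendered = render_command(COMMAND, {"bam": "/inputs/sample.bam"})
print(rendered)
```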

The count_bam task, which also goes in workflow.wdl:

    task count_bam {
        input {
            File bam 
            String docker_img
        }
    
        command <<<
            samtools view -c "~{bam}" 
        >>>
    
        runtime {
            docker: docker_img
        }
    
        output {
            Int count = read_int(stdout()) 
        }
    }
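read_int(stdout()) works here because samtools view -c prints a single number. The Python function below is a hedged approximation of that parsing step. It assumes that surrounding whitespace, such as the trailing newline, is ignored. It is an illustration, not the implementation used by miniwdl or dxCompiler, and the count 1432 is invented.

```python
import io


def read_int(stream: io.TextIOBase) -> int:
    """Parse the whole stream as one integer, ignoring surrounding whitespace."""
    text = stream.read().strip()
    return int(text)  # raises ValueError if the text is not a single integer


# Simulated STDOUT from `samtools view -c` (the number is made up).
print(read_int(io.StringIO("1432\n")))
```

Output that is not a single integer, such as an error message or two numbers, makes the parse fail. Likewise, the task's output cannot be evaluated if the command prints anything other than the count.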

Checking the syntax with miniwdl:

    $ miniwdl check workflow.wdl
    workflow.wdl
        workflow bam_chrom_counter
            call slice_bam
            scatter slice
                call count_bam
        task count_bam
        task slice_bam

Compiling the workflow onto the DNAnexus platform with dxCompiler:

    $ java -jar ~/dxCompiler-2.10.2.jar compile workflow.wdl \
            -archive \
            -folder /workflows \
            -project project-GFPQvY007GyyXgXGP7x9zbGb
    workflow-GFqF27j07GyZ33JX4vzqgK32
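If you script the compile and run steps, you need the ID that dxCompiler prints. The Python sketch below extracts such an ID from captured output. It assumes that an ID consists of a prefix such as workflow- followed by 24 letters and digits. Every ID shown on this page fits that pattern, but it is an observation rather than a documented format. find_entity_id is a hypothetical helper name.

```python
import re
from typing import Optional


def find_entity_id(output: str, kind: str = "workflow") -> Optional[str]:
    """Return the first `<kind>-<24 alphanumerics>` token in `output`, or None."""
    match = re.search(rf"\b{re.escape(kind)}-[0-9A-Za-z]{{24}}\b", output)
    return match.group(0) if match else None


compile_output = "workflow-GFqF27j07GyZ33JX4vzqgK32\n"
print(find_entity_id(compile_output))
```

Calling the same helper with kind="analysis" extracts the analysis ID from the dx run output.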

Running the workflow on a sample BAM file:

    $ dx run workflow-GFqF27j07GyZ33JX4vzqgK32 \
    > -istage-common.bam=file-G8V38KQ0zQ713kZGF6xQQvjJ -y
    
    Using input JSON:
    {
        "stage-common.bam": {
            "$dnanexus_link": "file-G8V38KQ0zQ713kZGF6xQQvjJ"
        }
    }
    
    Calling workflow-GFqF27j07GyZ33JX4vzqgK32 with output destination
      project-GFPQvY007GyyXgXGP7x9zbGb:/
    
    Analysis ID: analysis-GFqF7Zj07GyZQ957Jy822gQY
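The transcript shows dx run turning -istage-common.bam=file-G8V38KQ0zQ713kZGF6xQQvjJ into an input JSON object that wraps the file ID in a $dnanexus_link. The minimal Python sketch below builds the same structure. The input name and file ID are copied from the transcript, and dx_file_input is a hypothetical helper. Check dx run --help for how to pass a JSON file instead of -i options.

```python
import json


def dx_file_input(input_name: str, file_id: str) -> dict:
    """Build a dx run input object that links a named input to a platform file."""
    return {input_name: {"$dnanexus_link": file_id}}


inputs = dx_file_input("stage-common.bam", "file-G8V38KQ0zQ713kZGF6xQQvjJ")
print(json.dumps(inputs, indent=4))  # same layout as the "Using input JSON" block
```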

tasks/slice_bam.wdl:

    version 1.0
    
    task slice_bam {
        input {
            File bam
            String docker_img
        }
    
        command <<<
        set -ex
        samtools index "~{bam}"
        mkdir slices
    
        for i in $(seq 22); do
            samtools view -b -o "slices/$i.bam" "~{bam}" "chr${i}"
        done
        >>>
    
        runtime {
            docker: docker_img
        }
    
        output {
            File bai = "~{bam}.bai"
            Array[File] slices = glob("slices/*.bam")
        }
    }
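Because count_bam.count follows the order of slice_bam.slices, it helps to know how the glob results are ordered. Glob ordering depends on the WDL version and the engine, so treat the following as an assumption to verify: if matches are sorted as plain strings, 10.bam sorts before 2.bam. The Python sketch below creates empty placeholder files named like the task's outputs in a temporary directory. It then compares plain string order with numeric chromosome order. chromosome_number and list_slices are hypothetical helpers.

```python
import glob
import os
import re
import tempfile
from typing import List


def chromosome_number(path: str) -> int:
    """Extract N from a slice file name such as '7.bam' or 'slices/7.bam'."""
    match = re.search(r"(\d+)\.bam$", path)
    if match is None:
        raise ValueError(f"unexpected slice name: {path}")
    return int(match.group(1))


def list_slices(directory: str) -> List[str]:
    """Return the .bam file names in `directory`, sorted as plain strings."""
    return sorted(os.path.basename(p) for p in glob.glob(os.path.join(directory, "*.bam")))


with tempfile.TemporaryDirectory() as tmp:
    # Empty placeholder files named like the task's outputs: 1.bam .. 22.bam.
    for i in range(1, 23):
        open(os.path.join(tmp, f"{i}.bam"), "w").close()
    string_order = list_slices(tmp)

numeric_order = sorted(string_order, key=chromosome_number)
print(string_order[:4])   # ['1.bam', '10.bam', '11.bam', '12.bam']
print(numeric_order[:4])  # ['1.bam', '2.bam', '3.bam', '4.bam']
```

If chromosome order matters downstream, pair each count with its slice name instead of relying on array position alone.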

tasks/count_bam.wdl:

    version 1.0
    
    task count_bam {
        input {
            File bam
            String docker_img
        }
    
        command <<<
            samtools view -c "~{bam}"
        >>>
    
        runtime {
            docker: docker_img
        }
    
        output {
            Int count = read_int(stdout())
        }
    }
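Each task file should carry a version declaration that matches the workflow's. A quick check can therefore catch a forgotten version line before you run miniwdl. The Python sketch below reads the first significant line of each source. It skips only blank lines and # comments and is no substitute for miniwdl check. The in-memory sources are hypothetical, including the deliberately broken tasks/count_bam.wdl.

```python
import re
from typing import Dict, Optional

VERSION_LINE = re.compile(r"^version\s+(\S+)\s*$")


def wdl_version(source: str) -> Optional[str]:
    """Return the version from the first significant line, or None if absent."""
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments before the version
        match = VERSION_LINE.match(stripped)
        return match.group(1) if match else None
    return None


def mismatched_versions(sources: Dict[str, str], expected: str) -> Dict[str, Optional[str]]:
    """Map each file whose version differs from `expected` to what it declares."""
    versions = {name: wdl_version(text) for name, text in sources.items()}
    return {name: v for name, v in versions.items() if v != expected}


sources = {
    "workflow.wdl": 'version 1.0\n\nimport "./tasks/slice_bam.wdl" as task_slice_bam\n',
    "tasks/slice_bam.wdl": "version 1.0\n\ntask slice_bam {}\n",
    "tasks/count_bam.wdl": "task count_bam {}\n",  # version line forgotten
}
print(mismatched_versions(sources, "1.0"))  # {'tasks/count_bam.wdl': None}
```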

workflow.wdl, revised to import the task files:

    version 1.0
    
    import "./tasks/slice_bam.wdl" as task_slice_bam 
    import "./tasks/count_bam.wdl" as task_count_bam
    
    workflow bam_chrom_counter {
        input {
            File bam
        }
    
        String docker_img = "quay.io/biocontainers/samtools:1.12--hd5e65b6_0"
    
        call task_slice_bam.slice_bam as slice_bam { 
            input : bam = bam,
                    docker_img = docker_img
        }
    
        scatter (slice in slice_bam.slices) {
            call task_count_bam.count_bam as count_bam { 
                input: bam = slice,
                       docker_img = docker_img
            }
        }
    
        output {
            File bai = slice_bam.bai
            Array[Int] count = count_bam.count
        }
    }
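Each import binds a file to a namespace alias, and calls then refer to tasks as alias.task. As a rough illustration, the Python sketch below extracts the alias-to-path mapping from the import lines. It then resolves the relative paths against the workflow's directory, which helps spot a mistyped path. It understands only the simple import "path" as alias form used here. Resolving relative to the importing file's directory is an assumption to verify with your engine, and /home/user/view_and_count is a made-up location.

```python
import posixpath
import re
from typing import Dict

# Only the simple form used in this example: import "path" as alias
IMPORT_LINE = re.compile(r'^\s*import\s+"([^"]+)"\s+as\s+(\w+)\s*$', re.MULTILINE)


def import_aliases(source: str, base_dir: str) -> Dict[str, str]:
    """Map each import alias to its path, normalized relative to base_dir."""
    return {
        alias: posixpath.normpath(posixpath.join(base_dir, path))
        for path, alias in IMPORT_LINE.findall(source)
    }


WORKFLOW_HEADER = '''version 1.0

import "./tasks/slice_bam.wdl" as task_slice_bam
import "./tasks/count_bam.wdl" as task_count_bam
'''

print(import_aliases(WORKFLOW_HEADER, "/home/user/view_and_count"))
```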