# Example 5: workflow

In this example, you will learn:

* How to accept a BAM file as a workflow input
* How to break the BAM into slices by chromosome
* How to distribute the slices in parallel to count the number of alignments in each

## Getting Started

To begin, create a new directory called *view\_and\_count* and a *workflow\.wdl* file.

Here is the `workflow` definition you should add:

```
version 1.0

workflow bam_chrom_counter { 
    input {
        File bam 
    }

    String docker_img = "quay.io/biocontainers/samtools:1.12--hd5e65b6_0" 

    call slice_bam {
        input : bam = bam, 
                docker_img = docker_img
    }

    scatter (slice in slice_bam.slices) { 
        call count_bam {
            input: bam = slice,
                   docker_img = docker_img
        }
    }

    output { 
        File bai = slice_bam.bai
        Array[Int] count = count_bam.count
    }
}
```

* The name of this workflow is *bam\_chrom\_counter*.
* The workflow accepts a single, required `File` input that will be called `bam` as it is expected to be a BAM file.
* Use a [non-input declaration](https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#non-input-declarations) to define a `String` value naming the Docker image that contains Samtools.
* The first `call` will be to the `slice_bam` task that will break the BAM into one file per chromosome. The input for this task is the workflow's BAM file.
* The [`scatter`](https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#scatter) directive in WDL allows the actions in the block to be executed in parallel, which can lead to significant performance gains. Here, each `slice` file returned from the `slice_bam` task will be used as the input to a separate call of the `count_bam` task.
* The workflow defines two outputs: a BAM index file and an array of integer values representing the number of alignments in each of the BAM slices.
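
The scatter-then-gather pattern can be illustrated outside WDL with a rough shell analogy. This is only a sketch: the text files and `wc -l` are hypothetical stand-ins for the BAM slices and `samtools view -c`, and a WDL engine schedules the scattered calls itself rather than using background jobs.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
# Hypothetical stand-ins for BAM slices: text files with 1, 2, and 3 lines
printf 'a\n'       > "$workdir/slice1.txt"
printf 'a\nb\n'    > "$workdir/slice2.txt"
printf 'a\nb\nc\n' > "$workdir/slice3.txt"
inputs=( "$workdir/slice1.txt" "$workdir/slice2.txt" "$workdir/slice3.txt" )

# "Scatter": start one background job per input, each writing its own result file
for idx in "${!inputs[@]}"; do
    wc -l < "${inputs[$idx]}" > "$workdir/count.$idx" &
done
wait   # block until every background job has finished

# "Gather": collect the results in input order, similar to Array[Int] count
counts=()
for idx in "${!inputs[@]}"; do
    counts+=( "$(tr -d '[:space:]' < "$workdir/count.$idx")" )
done
echo "${counts[*]}"
```

The gathered array keeps the order of the inputs regardless of which job finishes first, which is the property the workflow's `Array[Int] count` output relies on.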

The following `slice_bam` task uses [Samtools](http://www.htslib.org/) to index the input BAM file and write a separate file for each of the 22 human autosomes:

```
task slice_bam {
    input { 
        File bam
        String docker_img
    }

    command <<< 
    set -ex
    samtools index "~{bam}" 
    mkdir slices

    for i in $(seq 22); do 
        samtools view -b -o "slices/$i.bam" "~{bam}" "chr${i}" 
    done
    >>>

    runtime { 
        docker: docker_img
    }

    output { 
        File bai = "~{bam}.bai"
        Array[File] slices = glob("slices/*.bam") 
    }
}
```

* The inputs to this task are the BAM file and the name of the Docker image.
* The command block uses triple-angle brackets because it must use the dollar sign (`$`) in shell code.
* Use [`samtools index`](http://www.htslib.org/doc/samtools-index.html) on the input BAM file for fast random access to the alignments.
* The `$()` command substitution in bash runs the `seq` command to generate the integers 1 through 22, one for each of the human non-sex chromosomes.
* The [`samtools view`](http://www.htslib.org/doc/samtools-view.html) command selects the alignments for a region like "chr1" and, with `-b` and `-o`, writes them in BAM format to a file like *slices/1.bam*. Note the mix of `~{}` for WDL variables and `$` for bash variables.
* The [`runtime`](https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#runtime-section) block allows you to define a Docker image that contains an installation of Samtools.
* The output of this task is the BAM index, which is the given BAM file plus the suffix *.bai*, and the sliced alignment files.
* The `slices` will be one or more files as indicated by `Array[File]`, and they will be found using the [`glob`](https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#globs) function to look in the *slices* directory for all files with the extension *.bam*.
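
To see which file names the loop produces and in what order a shell pattern like `slices/*.bam` expands, here is a dry run that needs neither Samtools nor a BAM file. It is a sketch under stated assumptions: the empty placeholder files and the scratch directory are illustrative, and `LC_ALL=C` is set only to make the ordering reproducible.

```shell
#!/usr/bin/env bash
# Dry run of the slicing loop: no samtools, only empty placeholder files.
set -euo pipefail
export LC_ALL=C   # assumption: fix the collation so the glob order is reproducible

workdir=$(mktemp -d)
mkdir "$workdir/slices"

for i in $(seq 22); do
    # The real task runs: samtools view -b -o "slices/$i.bam" <bam> "chr${i}"
    : > "$workdir/slices/$i.bam"
done

# Expand the same pattern that the task passes to glob()
slices=( "$workdir"/slices/*.bam )
printf '%s\n' "${slices[@]##*/}"
```

The shell expands the pattern lexically, so the list begins *1.bam*, *10.bam*, *11.bam* rather than *1.bam*, *2.bam*, *3.bam*. If your execution engine returns `glob` results in the same order (worth verifying for your engine), the `count` array in the workflow output follows that order and not numeric chromosome order.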

The `count_bam` task is written to handle just one BAM slice:

```
task count_bam {
    input {
        File bam 
        String docker_img
    }

    command <<<
        samtools view -c "~{bam}" 
    >>>

    runtime {
        docker: docker_img
    }

    output {
        Int count = read_int(stdout()) 
    }
}
```

* This BAM input will be a slice of alignments for a given region. Naming this `bam` does not interfere with the `bam` variable in the workflow or any other task.
* Use the [`samtools view`](http://www.htslib.org/doc/samtools-view.html) command with `-c|--count` to count the number of alignments in the given file.
* The output of this task uses the [`read_int`](https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#int-read_intstringfile) function to read the `STDOUT` from the command as an integer value.
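
As a rough illustration of what `read_int(stdout())` expects, the following shell sketch accepts a captured output file only when it holds a single line containing one integer. The `read_int_like` helper is hypothetical and is not part of WDL or Samtools, and the literal `12345` stands in for the output of `samtools view -c`.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of WDL or Samtools: accept a file only if it
# holds exactly one line containing one integer.
read_int_like() {
    local file=$1 value
    [ "$(wc -l < "$file" | tr -d '[:space:]')" -eq 1 ] || return 1
    value=$(cat "$file")
    [[ $value =~ ^-?[0-9]+$ ]] || return 1
    printf '%s\n' "$value"
}

out=$(mktemp)
printf '12345\n' > "$out"   # stand-in for the stdout of: samtools view -c
count=$(read_int_like "$out")
echo "$count"
```

This is why the command block should print nothing but the count to standard output: any extra line would keep the output from being read as a single integer.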

At this point, I like to use `miniwdl` to check the syntax:

```
$ miniwdl check workflow.wdl
workflow.wdl
    workflow bam_chrom_counter
        call slice_bam
        scatter slice
            call count_bam
    task count_bam
    task slice_bam
```

As no errors are reported, I will compile this onto the DNAnexus platform:

```
$ java -jar ~/dxCompiler-2.10.2.jar compile workflow.wdl \
        -archive \
        -folder /workflows \
        -project project-GFPQvY007GyyXgXGP7x9zbGb
workflow-GFqF27j07GyZ33JX4vzqgK32
```

Finally, I will run this workflow using a sample BAM file:

```
$ dx run workflow-GFqF27j07GyZ33JX4vzqgK32 \
> -istage-common.bam=file-G8V38KQ0zQ713kZGF6xQQvjJ -y

Using input JSON:
{
    "stage-common.bam": {
        "$dnanexus_link": "file-G8V38KQ0zQ713kZGF6xQQvjJ"
    }
}

Calling workflow-GFqF27j07GyZ33JX4vzqgK32 with output destination
  project-GFPQvY007GyyXgXGP7x9zbGb:/

Analysis ID: analysis-GFqF7Zj07GyZQ957Jy822gQY
```

Return to the DNAnexus website to monitor the progress of the analysis.
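
The input JSON echoed by `dx run` can also be kept in a file, which is convenient when a workflow has many inputs. The sketch below writes the same JSON to a file named *input.json* (an arbitrary name chosen for this example). The `dx run` line is left commented out because it requires a logged-in `dx` session and the workflow and file IDs from this lesson.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write the workflow inputs to a file; the quoted 'EOF' keeps the shell
# from expanding $dnanexus_link as a variable.
cat > input.json <<'EOF'
{
    "stage-common.bam": {
        "$dnanexus_link": "file-G8V38KQ0zQ713kZGF6xQQvjJ"
    }
}
EOF

# Not executed here; requires a logged-in dx session:
# dx run workflow-GFqF27j07GyZ33JX4vzqgK32 --input-json-file input.json -y
```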

## Placing Task Definitions in Files

As the number of tasks increases, workflow definitions can get quite long. You can shorten the *workflow\.wdl* by placing each task in a separate file, which also makes it easier to reuse a task in a separate workflow. To do this, create a subdirectory called *tasks*, and then create a file called *tasks/slice\_bam.wdl* with the following contents:

```
version 1.0

task slice_bam {
    input {
        File bam
        String docker_img
    }

    command <<<
    set -ex
    samtools index "~{bam}"
    mkdir slices

    for i in $(seq 22); do
        samtools view -b -o "slices/$i.bam" "~{bam}" "chr${i}"
    done
    >>>

    runtime {
        docker: docker_img
    }

    output {
        File bai = "~{bam}.bai"
        Array[File] slices = glob("slices/*.bam")
    }
}
```

Also create the file *tasks/count\_bam.wdl* with the following contents:

```
version 1.0

task count_bam {
    input {
        File bam
        String docker_img
    }

    command <<<
        samtools view -c "~{bam}"
    >>>

    runtime {
        docker: docker_img
    }

    output {
        Int count = read_int(stdout())
    }
}
```

Both of the preceding tasks are identical to the original definitions, but note that the files include a `version` that matches the version of the workflow. Change *workflow\.wdl* as follows:

```
version 1.0

import "./tasks/slice_bam.wdl" as task_slice_bam 
import "./tasks/count_bam.wdl" as task_count_bam

workflow bam_chrom_counter {
    input {
        File bam
    }

    String docker_img = "quay.io/biocontainers/samtools:1.12--hd5e65b6_0"

    call task_slice_bam.slice_bam as slice_bam { 
        input : bam = bam,
                docker_img = docker_img
    }

    scatter (slice in slice_bam.slices) {
        call task_count_bam.count_bam as count_bam { 
            input: bam = slice,
                   docker_img = docker_img
        }
    }

    output {
        File bai = slice_bam.bai
        Array[Int] count = count_bam.count
    }
}
```

* Use [`import`](https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#import-statements) to include WDL code from a file or URI. Note the use of the `as` clause to alias the imports using a different name.
* Call `task_slice_bam.slice_bam` from the imported file using `as` to give it the same name as in the original workflow.
* Do the same with `task_count_bam.count_bam`.

Use `miniwdl` to check your syntax, then use dxCompiler to compile the workflow as before.
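
Before compiling, it can help to confirm that the workflow and every task file declare the same `version`. The following is a minimal sketch: the `wdl_versions` helper is hypothetical, and the scratch directory only mimics the *view\_and\_count* layout with files that contain nothing but the version line.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the distinct `version` lines found in the
# workflow file and in the task files under the given directory.
wdl_versions() {
    local dir=$1
    grep -h '^version ' "$dir"/workflow.wdl "$dir"/tasks/*.wdl | sort -u
}

# Scratch layout that mimics the lesson's directory structure
demo=$(mktemp -d)
mkdir "$demo/tasks"
printf 'version 1.0\n' > "$demo/workflow.wdl"
printf 'version 1.0\n' > "$demo/tasks/slice_bam.wdl"
printf 'version 1.0\n' > "$demo/tasks/count_bam.wdl"

versions=$(wdl_versions "$demo")
echo "$versions"
```

A single line of output means all files agree. To check your own files, call `wdl_versions` on the *view\_and\_count* directory; more than one line of output indicates a mismatch to fix.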

## Review

In this lesson, you learned how to:

* Accept a file as a workflow input
* Define a non-input declaration
* Use `scatter` to run tasks in parallel
* Use the output from one task as the input to another task
* Mix `~` and `$` in command blocks to dereference WDL and shell variables
* Import WDL from external sources such as local files or remote URIs

## Resources

[Full Documentation](https://documentation.dnanexus.com/)

To create a support ticket if there are technical issues:

1. Go to the Help header (same section where Projects and Tools are) inside the platform
2. Select "Contact Support"
3. Fill in the Subject and Message to submit a support ticket.

