Overview of Nextflow

How Nextflow Works Locally

Nextflow pipelines are composed of processes. For example, a task such as fastqc would be one process, read trimming would be another process, and so on. Processes pass files between them using channels (queues), so every process usually has an input channel and an output channel. Nextflow is implicitly parallel: if it can run something in parallel, it will. There is no need to loop over channels.

For example, you could have a script with fastqc and read_trimming processes that both take in a fastq reads channel. As these two processes do not depend on each other, they will be run at the same time.

The Nextflow workflow file is called main.nf.

Let's think about a quick workflow that takes in some single-end fastq files, runs fastqc on them, trims them, runs fastqc again on the trimmed reads, and finally runs multiqc on all of the fastqc outputs.

Example code that would achieve this workflow is shown below (the script for each process is not shown here):

nextflow.enable.dsl=2

// params.fastq_dir is exposed as a pipeline input and is given a default here
params.fastq_dir = "./FASTQ/*.fq.gz"

// make a channel that emits each fastq file matching the pattern
fastq_ch = Channel.fromPath(params.fastq_dir)

workflow {
    // takes in fastq_ch and outputs a channel with fastqc html and zip files
    raw_fastqc_ch = fastqc(fastq_ch)

    // takes in fastq_ch and outputs a channel with trimmed reads
    trimmed_reads_ch = read_trimming(fastq_ch)

    // takes in the trimmed reads channel this time
    trimmed_fastqc_ch = fastqc_trimmed(trimmed_reads_ch)

    // combine the two channels so they can be used together in multiqc
    combined_fastqc_ch = raw_fastqc_ch.mix(trimmed_fastqc_ch)

    // takes in a channel containing fastqc files
    // collect() gathers all files into one item, so multiqc runs once with every file available
    multiqc(combined_fastqc_ch.collect())
}
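
The process scripts are left out of the workflow code. As a rough illustration only, the fastqc and multiqc processes might look something like the sketch below. The output file patterns and the exact tool commands are assumptions about how these tools would be called, not code taken from this pipeline.

```nextflow
// Hypothetical sketch of two of the processes called in the workflow.

process fastqc {
    input:
    path reads                      // one fastq file per task

    output:
    path "*_fastqc.{html,zip}"      // assumed FastQC report and zip archive names

    script:
    """
    fastqc ${reads}
    """
}

process multiqc {
    input:
    path fastqc_files               // every fastqc output, gathered by collect()

    output:
    path "multiqc_report.html"      // assumed default MultiQC report name

    script:
    """
    multiqc .
    """
}
```

Because fastqc receives a channel that emits one file at a time, Nextflow starts one task per file without any explicit loop. The multiqc process receives a single collected item, so it runs only once.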

An example local run (not on or interacting with DNAnexus) would look like the command below. This assumes you have Nextflow installed on your local machine, which is not required for running on DNAnexus.

nextflow run main.nf --fastq_dir "/FASTQ/SRR_*.fastq.gz"

As we gave --fastq_dir a default, if your inputs match that default you could just run:

nextflow run main.nf

How Nextflow Works on DNAnexus

DNAnexus has developed a version of the Nextflow executor that can orchestrate Nextflow runs on the DNAnexus platform.

Once you kick off a Nextflow run, a Nextflow head node is spun up. It stays on for the duration of the run, and it launches and controls the subjobs. Each subjob is one instance of a process.

Head Node

  • orchestrates subjobs

  • contains the Nextflow output directory, which is usually specified by params.outdir in nf-core pipelines

  • copies the output directory to the DNAnexus project once all subjobs have completed (to the location given by --destination)

Subjobs

  • one for every instance of a process

  • each subjob is one virtual machine (instance) e.g., fastqc_process(fileA) is run on one machine and fastqc_process(fileB) is run on a different machine

  • uses a Docker image for the process environment

  • required files are pulled onto the machine, and outputs are sent back to the head node once the subjob has completed

  • task execution status, temp files, stdout and stderr logs etc. are sent to the work directory
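
As a hedged sketch of how a process is tied to a Docker image and an output directory, Nextflow's container and publishDir directives can be set on the process. The image name below is a placeholder rather than an image used by this document, the default value of params.outdir is an assumption, and the fastqc command and output pattern are illustrative.

```nextflow
// Assumed default; nf-core style pipelines usually take this as a parameter
params.outdir = "./results"

process fastqc {
    // Placeholder image name; replace it with an image that provides fastqc
    container 'example.org/tools/fastqc:latest'

    // Copy the declared outputs from the task's work folder to the output directory
    publishDir params.outdir, mode: 'copy'

    input:
    path reads

    output:
    path "*_fastqc.{html,zip}"

    script:
    """
    fastqc ${reads}
    """
}
```

For a local run, Docker also has to be enabled, for example by setting docker.enabled = true in nextflow.config.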

Work Directory

  • Nextflow uses a 'work' directory (workDir) for executing tasks. Each instance of a process gets its own folder in the work directory and this directory stores task execution info, intermediate files etc.

  • If you choose to cache your work directory, you will be able to see it on the platform during and after your Nextflow run.

  • Otherwise, the work directory exists in a temporary workspace and it will be destroyed once a run has completed.

Note about Batch Processing

You may have learned about batching inputs for WDL workflows previously. You do not need to do this for Nextflow applets: all parallelisation is done automatically by Nextflow.

Resources

Full Documentation

To create a support ticket if there are technical issues:

  1. Go to the Help header (same section where Projects and Tools are) inside the platform

  2. Select "Contact Support"

  3. Fill in the Subject and Message to submit a support ticket.

Some of the links on these pages will take the user to pages that are maintained by third parties. The accuracy and IP rights of the information on these third-party pages are the responsibility of those third parties.
