Overview of Nextflow
Nextflow pipelines are composed of processes: for example, a task such as fastqc would be one process, read trimming would be another process, and so on. Processes pass files between them using channels (queues), so every process usually has an input and an output channel. Nextflow is implicitly parallel - if it can run something in parallel, it will! There is no need to loop over channels yourself.
For example, you could have a script with fastqc and read_trimming processes that both take in a fastq reads channel. As these two processes have no links between them, they will be run at the same time.
The Nextflow workflow file is called main.nf.
Let's think about a quick workflow that takes in some single-end fastq files, runs fastqc on them, trims them, runs fastqc again, and finally runs multiqc on the fastqc outputs.
An example of code that would achieve the workflow in the image (not showing what each process script looks like here):
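As a rough illustration, a main.nf for this workflow might look like the sketch below. The process names (FASTQC_RAW, TRIM_READS, FASTQC_TRIMMED, MULTIQC), the choice of cutadapt as the trimming tool, and the --fastq_dir/--outdir parameter defaults are all assumptions for illustration, not taken from a real pipeline.

```nextflow
// Hypothetical sketch only: names, tools and defaults are illustrative.
nextflow.enable.dsl = 2

params.fastq_dir = 'data/*.fastq.gz'  // assumed default input pattern
params.outdir    = 'results'

process FASTQC_RAW {
    input:
    path reads

    output:
    path '*_fastqc.zip', emit: zip

    script:
    """
    fastqc ${reads}
    """
}

process TRIM_READS {
    input:
    path reads

    output:
    path "trimmed_${reads}", emit: trimmed

    script:
    """
    cutadapt -q 20 -o trimmed_${reads} ${reads}
    """
}

process FASTQC_TRIMMED {
    input:
    path reads

    output:
    path '*_fastqc.zip', emit: zip

    script:
    """
    fastqc ${reads}
    """
}

process MULTIQC {
    publishDir params.outdir, mode: 'copy'

    input:
    path fastqc_reports

    output:
    path 'multiqc_report.html'

    script:
    """
    multiqc .
    """
}

workflow {
    reads_ch = Channel.fromPath(params.fastq_dir)

    // FASTQC_RAW and TRIM_READS share an input channel but have no link
    // between them, so Nextflow runs them in parallel automatically.
    FASTQC_RAW(reads_ch)
    TRIM_READS(reads_ch)
    FASTQC_TRIMMED(TRIM_READS.out.trimmed)

    // Gather all fastqc reports into one list for a single multiqc task.
    MULTIQC(FASTQC_RAW.out.zip.mix(FASTQC_TRIMMED.out.zip).collect())
}
```

In a real pipeline you would usually define the fastqc process once in a module file and import it twice with `include { FASTQC as FASTQC_RAW; FASTQC as FASTQC_TRIMMED }` rather than duplicating it as done here for self-containment.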
An example local run (not on or interacting with DNAnexus) would look like the command below. This assumes you have Nextflow installed on your own local machine, which is not required for DNAnexus.
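A hedged example of such a command (the --fastq_dir parameter name and glob pattern are illustrative):

```bash
nextflow run main.nf --fastq_dir 'data/*.fastq.gz'
```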
As we gave --fastq_dir a default, if your inputs match that default you could just run
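That is, something like:

```bash
nextflow run main.nf
```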
DNAnexus has developed a version of the Nextflow executor that can orchestrate Nextflow runs on the DNAnexus platform.
Once you kick off a Nextflow run, a Nextflow 'head node' is spun up. This stays on for the duration of the run, and it spins up and controls the subjobs (each instance of a process).

The head node:

* orchestrates subjobs
* contains the Nextflow output directory, which is usually specified by `params.outdir` in nf-core pipelines
* copies the output directory to the DNAnexus project once all subjobs have completed (`--destination`)

The subjobs:

* one is created for every instance of a process
* each subjob is one virtual machine (instance), e.g., fastqc_process(fileA) is run on one machine and fastqc_process(fileB) is run on a different machine
* each uses a Docker image for the process environment
* required files are pulled onto the machine, and outputs are sent back to the head node once the subjob has completed
* task execution status, temp files, stdout, stderr logs etc. are sent to the work directory
Nextflow uses a 'work' directory (workDir) for executing tasks. Each instance of a process gets its own folder in the work directory and this directory stores task execution info, intermediate files etc.
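As an illustration, one task's folder inside the work directory typically looks like the listing below. The two-level hash name is generated by Nextflow, and the sample filename is a made-up example; the hidden `.command.*` and `.exitcode` files are the standard per-task files Nextflow writes.

```
work/
└── a1/b2c3d4.../            # one hash-named folder per task
    ├── .command.sh          # the rendered script run for this task
    ├── .command.out         # stdout from the task
    ├── .command.err         # stderr from the task
    ├── .command.log         # combined execution log
    ├── .exitcode            # the task's exit status
    └── sampleA.fastq.gz     # staged inputs / intermediate files
```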
You may have learned about batching inputs for WDL workflows previously. You do not need to do this for Nextflow applets - all parallelisation is handled automatically by Nextflow.
To create a support ticket if there are technical issues:
Go to the Help header (same section where Projects and Tools are) inside the platform
Select "Contact Support"
Fill in the Subject and Message to submit a support ticket.
Some of the links on these pages will take the user to pages that are maintained by third parties. The accuracy and IP rights of the information on these third-party pages are the responsibility of those third parties.
Depending on whether you choose to preserve it, you will be able to see this work directory on the platform during/after your Nextflow run.
Otherwise, the work directory exists in a temporary workspace and will be destroyed once the run has completed.