Error Strategies for Nextflow

Nextflow's errorStrategy directive lets you define, at the process level, how the Nextflow executor handles an error condition.

There are 4 possible strategies:

  • terminate (default): terminate all subjobs as soon as any subjob has an error

  • finish: when any subjob has an error, do not start any additional subjobs and wait for existing subjobs to finish before exiting

  • ignore: report a message that the subjob had an error, but let all other subjobs continue

  • retry: when a subjob returns an error, resubmit that subjob

The DNAnexus Nextflow documentation has a very detailed description of what happens for each errorStrategy.

Generally the errorStrategy is defined in either the base.config (which is referenced using includeConfig in the nextflow.config file) or in the nextflow.config file.
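
As a rough sketch of that layout, the fragment below shows a nextflow.config that pulls in a separate base.config and then overrides the strategy for one process. The conf/base.config path and the FASTQC process name are assumptions made for illustration; substitute whatever your pipeline actually defines.

```groovy
// nextflow.config (sketch; the path and process name below are illustrative assumptions)

// Pull in shared process defaults, assumed here to set a default errorStrategy
includeConfig 'conf/base.config'

process {
    // Override the strategy for a single process, selected by name
    withName: 'FASTQC' {
        errorStrategy = 'ignore'
    }
}
```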

In nf-core pipelines, the default errorStrategy is usually defined in base.config and is set to 'finish', except for a specific set of exit codes, which are retried.

The code below is from the sarek base.config:

    // memory errors which should be retried. otherwise error out
    errorStrategy = { task.exitStatus in ((130..145) + 104) ? 'retry' : 'finish' }
    maxRetries    = 1
    maxErrors     = '-1'

The maxRetries directive sets the maximum number of times the exact same subjob can be resubmitted after a failure. The maxErrors directive sets the maximum number of times a process can fail, counted across all subjobs of that process, when using the retry error strategy. See this GitHub issue for a fuller explanation.

In the code above, if the exit status of the subjob (task) is between 130 and 145, inclusive, or is equal to 104, that subjob is retried once (maxRetries = 1). If other subjobs of the same process fail with the same exit codes, they are also retried once: maxErrors = '-1' disables the limit on how many times a process can fail, so even if every subjob of a particular process fails, each one can still be retried up to the number of times set in maxRetries. For any other exit status, the finish errorStrategy is applied: no additional subjobs are started, and running subjobs that have not errored are allowed to complete before the job exits.
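
To make the membership test concrete, the following standalone Groovy sketch evaluates the same list expression outside Nextflow. The strategyFor closure is a hypothetical helper that stands in for the errorStrategy closure; it takes the exit status as a plain argument instead of reading task.exitStatus.

```groovy
// Same list expression as in the config: the range 130..145 plus the single code 104
def retryCodes = (130..145) + 104

// Hypothetical stand-in for the errorStrategy closure (not part of Nextflow)
def strategyFor = { int exitStatus -> exitStatus in retryCodes ? 'retry' : 'finish' }

assert strategyFor(130) == 'retry'    // lower bound of the range is included
assert strategyFor(145) == 'retry'    // upper bound of the range is included
assert strategyFor(104) == 'retry'    // the extra code appended to the range
assert strategyFor(1)   == 'finish'   // any other exit status falls through to 'finish'
assert strategyFor(146) == 'finish'   // just outside the range
```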

For example, imagine you have a fastqc process that takes in one file at a time from a channel with 3 files (file_A, file_B, file_C).

The process is run for each file in parallel, as below:

  • fastqc(file_A)

  • fastqc(file_B)

  • fastqc(file_C)

If the subjobs for file_A and file_C fail first with an exit status in the range 130-145 or with 104, each can be retried once if maxRetries = 1.

Now imagine that you set maxErrors = 2. There are 3 instances of the process, but only 2 errors are allowed across all of them. Thus, only 2 of the subjobs will be retried, e.g. fastqc(file_A) and fastqc(file_C).

If fastqc(file_B) then encounters an error at any point, it won't be retried, and the whole job will fall back to the finish errorStrategy.

Thus, disabling the maxErrors directive by setting it to '-1' allows every failing subjob with one of the specified exit codes to be retried up to the number of times set by maxRetries.
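
The fastqc scenario could be expressed in configuration roughly as follows. This is a sketch only: it assumes the process is literally named fastqc and reuses the exit-code expression from the sarek snippet. Swapping the maxErrors value between 2 and '-1' corresponds to the two cases described above.

```groovy
// Sketch for the fastqc example; the process name 'fastqc' is an assumption
process {
    withName: 'fastqc' {
        // Retry only for the listed exit codes; otherwise shut down in an orderly way
        errorStrategy = { task.exitStatus in ((130..145) + 104) ? 'retry' : 'finish' }
        maxRetries    = 1   // each individual subjob may be resubmitted at most once
        maxErrors     = 2   // cap on failures across all fastqc subjobs; use '-1' to remove the cap
    }
}
```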

Debugging Checklist for Errors

  • Check which version of dxpy was used to build the Nextflow pipeline and make sure it is the latest

  • Look at the head-node log (ideally the job was run with "debug mode" set to false; when it is true, the log is filled with extra detail that isn't always useful and can make errors hard to find)

    • Look for the process (sub-job) that caused the error. The head-node log contains a record of that process's error log, though it may be truncated

  • Look at the failed sub-job log

  • Look at the raw code

  • Look at the cached work directories

    • .command.run is run to set up the runtime environment

      • Including staging files

      • Setting up Docker

    • .command.sh is the translated script block of the process

      • Translated because input channels are rendered as actual locations

    • .command.log, .command.out, etc. are all logs

  • Look at logs with "debug mode" as true
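
One way to find which cached work directory belongs to a failed sub-job is Nextflow's trace report, which can record the exit status and work directory of every task. The fragment below is a minimal sketch using Nextflow's standard trace scope; the choice of fields is only a suggestion, and how the resulting file is surfaced on the platform is not covered here.

```groovy
// nextflow.config fragment (sketch): write a trace report with one row per task
trace {
    enabled = true
    // Columns chosen for debugging failures; 'exit' is the task exit status,
    // 'attempt' is the retry attempt number, 'workdir' is the task work directory
    fields  = 'task_id,name,status,exit,attempt,workdir'
}
```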

Resources

Full Documentation

To create a support ticket if there are technical issues:

  1. Go to the Help header (same section where Projects and Tools are) inside the platform

  2. Select "Contact Support"

  3. Fill in the Subject and Message to submit a support ticket.

Some of the links on these pages will take the user to pages that are maintained by third parties. The accuracy and IP rights of the information on these third-party pages are the responsibility of those third parties.
