Error Strategies for Nextflow
Nextflow's errorStrategy directive allows you to define how the error condition is managed by the Nextflow executor at the process level.
There are four possible strategies:
terminate (default)
terminate all subjobs as soon as any subjob has an error
finish
when any subjob has an error, do not start any additional subjobs and wait for existing jobs to finish before exiting
ignore
ignore the error: report a message that the subjob had an error, but continue running all other subjobs
retry
when a subjob returns an error, resubmit that subjob
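As a minimal sketch of how one of these strategies might be selected in a configuration file: the process name FASTQC below is hypothetical and used only for illustration, and the choice of strategies is arbitrary.

```groovy
// nextflow.config (illustrative sketch, not taken from a specific pipeline)
process {
    // Default for every process: stop launching new subjobs on error,
    // but let subjobs that are already running finish.
    errorStrategy = 'finish'

    // Hypothetical process name: errors from this process are reported
    // but do not stop the rest of the pipeline.
    withName: 'FASTQC' {
        errorStrategy = 'ignore'
    }
}
```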
The DNAnexus Nextflow documentation has a
Generally the errorStrategy is defined in either the base.config (which is referenced using includeConfig in the nextflow.config file) or in the nextflow.config file.
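For illustration, a nextflow.config that pulls in a separate base.config might look like the sketch below. The path conf/base.config is an assumption (it follows a common nf-core layout); use whatever location your pipeline actually stores the file in.

```groovy
// nextflow.config (sketch)
// Load process defaults such as errorStrategy from a separate file.
// The path is an assumption; adjust it to your pipeline's layout.
includeConfig 'conf/base.config'
```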
In nf-core pipelines, the default errorStrategy is usually defined in base.config and is set to 'finish', except for exit codes in a specific numeric range, which are retried.
The code below is from the
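A configuration matching the behaviour described in this section can be sketched as follows. It is reconstructed from the values discussed here (exit codes 130 to 145 and 104, maxRetries = 1, maxErrors = '-1') and is not copied from any particular pipeline, so check your own base.config for the exact contents.

```groovy
// base.config (sketch reconstructed from the description in this section)
process {
    // Retry when the exit status is 130..145 (inclusive) or 104;
    // for any other exit status, fall back to 'finish'.
    errorStrategy = { task.exitStatus in ((130..145) + 104) ? 'retry' : 'finish' }

    // Each individual subjob (task) may be resubmitted at most once.
    maxRetries    = 1

    // '-1' disables the per-process cap on total failures.
    maxErrors     = '-1'
}
```

The errorStrategy value is a closure, so it is evaluated separately for each failed subjob using that subjob's exit status.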
The maxRetries directive sets the maximum number of times the exact same subjob can be resubmitted after a failure. The maxErrors directive sets the maximum number of times a process can fail, counted across all subjobs of that process, when using the retry error strategy.
In the code above, if the exit status of the subjob (task) is within 130 to 145, inclusive, or is equal to 104, then that subjob is retried once (maxRetries = 1). If other subjobs of the same process hit the same issue, they are also retried once each: maxErrors = '-1' disables the limit on how many times a process can fail in total, so even if every subjob of a particular process fails, each one can still be retried the number of times set in maxRetries. For any other exit status, the finish errorStrategy is applied: no additional subjobs are started, but subjobs that are already running without errors are allowed to complete.
For example, imagine you have a fastqc process that takes in one file at a time from a channel with 3 files (file_A, file_B, file_C).
The process is run once for each file, in parallel, as below:
fastqc(file_A)
fastqc(file_B)
fastqc(file_C)
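In a DSL2 script, those three invocations come from a single call on a channel rather than three separate calls. A minimal sketch follows; the .fastq.gz extensions, the output pattern, and the bare fastqc command line are assumptions added for illustration.

```groovy
// main.nf (DSL2 sketch; file names and the command line are illustrative)
process fastqc {
    input:
    path reads

    output:
    path "*_fastqc.zip"

    script:
    """
    fastqc ${reads}
    """
}

workflow {
    // Three input files, so Nextflow creates three independent
    // fastqc subjobs (tasks), one per file.
    reads_ch = Channel.fromPath(['file_A.fastq.gz', 'file_B.fastq.gz', 'file_C.fastq.gz'])
    fastqc(reads_ch)
}
```

Each subjob fails or succeeds on its own, which is why the per-subjob limit (maxRetries) and the per-process limit (maxErrors) behave differently.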
If the subjob with file_A and the subjob with file_C fail first with exit codes in the range 130-145 or with a 104 error, they can each be retried once if maxRetries = 1.
Now imagine that you set maxErrors = 2. In this case, there are 3 instances of the process, but only 2 errors are allowed across all instances of the process. Thus, it will only retry 2 of the subjobs, e.g. fastqc(file_A) and fastqc(file_C).
If fastqc(file_B) then encounters an error at any point, it won't be retried, and the whole job will fall through to the finish errorStrategy.
Thus, disabling the maxErrors directive by setting it to '-1' allows every failing subjob with one of the specified exit codes to be retried up to the number of times set by maxRetries.
Check what version of dxpy was used to build the Nextflow pipeline and make sure it is the newest version
Look at the head-node log (ideally it was run with "debug mode" set to false, because when it is true the log is filled with extra details that are not always useful and can make errors hard to find)
Look for the process (sub-job) which caused the error, there will be a record of the error log from that process, though it may be truncated
Look at the failed sub-job log
Look at the raw code
Look at the cached work directories
  .command.run sets up the runtime environment, including:
    staging files
    setting up Docker
  .command.sh is the translated script block of the process
    (translated because input channels are rendered as actual file locations)
  .command.log, .command.out, etc. are all logs
Look at logs with "debug mode" as true
To create a support ticket if there are technical issues:
Go to the Help header (same section where Projects and Tools are) inside the platform
Select "Contact Support"
Fill in the Subject and Message to submit a support ticket.
Some of the links on these pages will take the user to pages that are maintained by third parties. The accuracy and IP rights of the information on these third-party pages are the responsibility of those third parties.