Importing nf-core
Here we will import the nf-core Sarek pipeline from GitHub to demonstrate the functionality, but you can import any Nextflow pipeline from GitHub, not just nf-core ones!
Go to a DNAnexus project. Click 'Add' and in the drop-down menu select 'Import Pipeline/Workflow'.
Next enter the required information (see below) and click 'Start Import'
The GitHub URL is the URL of the Sarek GitHub repo itself (not the URL shown under 'Clone' in the repo).
Make sure there is no slash after 'sarek' in the URL as it will cause the importer to fail.
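If you are scripting around the importer, a quick shell check for the trailing-slash pitfall can save a failed import job. This is just a sketch, and the URL shown is an example:

```shell
# Reject repo URLs with a trailing slash, which makes the importer fail.
url="https://github.com/nf-core/sarek"
case "$url" in
  */) echo "invalid: remove the trailing slash" ;;
  *)  echo "ok" ;;
esac
```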
Choose your folder in the USERS folder to output the applet to.
To see the possible releases to use, in the github project click 'Tags'. If you leave this part blank it will use the 'main' branch for that repo.
Click the 'Monitor' tab in your project to see the running/finished import job
You should see your applet in the output folder that you specified in your project
You can see the version of dxpy that it was built with by looking at the job log for the import job
To do this click 'View Log' on the right hand side of the screen
The job log shows that the version of dxpy used here is dxpy v0.369.0
Click one of the sarek applets that you created
Choose the platform output location for your results.
Click on 'Output to' then make a folder or choose an existing folder. I choose the outputs folder.
Click 'Next'
Output directory considerations
Specify the Nextflow output directory. This is a directory local to the machine that Nextflow will be running on, not a DNAnexus path. The outdir path must start with ./ or have no leading slash, so that the executor can create this folder where it is running on the head node. For example, ./results and results are both valid, but /results or dx://project-xx:/results will not produce output in your project. Once the DNAnexus Nextflow executor detects that all files have been written to this folder (and thus all subjobs have completed), it will copy the folder to the specified job destination on the platform. If the pipeline fails before completion, this folder will not be written to the project.
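To summarize the rules above (the project ID is a placeholder):

```
./results                  valid   (relative path, created on the head node)
results                    valid   (no leading slash)
/results                   invalid (absolute path on the worker)
dx://project-xx:/results   invalid (DNAnexus path, not a local one)
```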
Here I have chosen to place the Nextflow output files in a directory on the head node of the run named ./test. This creates an outdir called test. Once this job completes, my results will be in dx://project-xxx:/outputs/test, where test is the folder that was copied from the head node of the Nextflow run to the destination I specified on the platform.
Scroll down and, under 'Nextflow Options' > 'Nextflow Run Options', type -profile test,docker
You must use Docker for all Nextflow pipelines run on DNAnexus. Every nf-core pipeline has a Docker profile in its nextflow.config file. You need to specify -profile docker in the Nextflow run options ('Nextflow Run Options' in the UI, -inextflow_run_opts in the CLI) to get the applet to use Docker containers for each process.
Then click 'Start Analysis'. You will be brought to this screen
Click 'Launch Analysis'.
Go to the Monitor tab to see your running job.
Note! The estimated cost per hour is the cost to run the head node only! Each Nextflow process (subjob) runs on its own instance with its own cost.
Select a project to build the applet in and choose the number associated with your project.
Or select your project using its name or project ID
Replace the folder name with your folder name
This will place the sarek applet in a folder called sarek_v3.4.0_cli_import in the /USERS/FOLDERNAME folder in the project.
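A sketch of what the remote-repository import command likely looks like using dx build; the tag and destination shown are examples, and the exact flags should be checked against dx build --help:

```
dx build --nextflow \
  --repository "https://github.com/nf-core/sarek" \
  --tag "3.4.0" \
  --destination "/USERS/FOLDERNAME/sarek_v3.4.0_cli_import"
```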
You can see the job running/completed in the Monitor tab of your project.
If you are using a private GitHub repository, you can supply a Git credentials file to dx build using the --git-credentials option. The Git credentials file has the following format.
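Assuming the standard Git credential-store format is what is meant here, the file would contain a line like the following, with the username and personal access token as placeholders:

```
https://<username>:<personal-access-token>@github.com
```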
Build the Nextflow pipeline from a folder on your local machine
This approach is useful for building pipelines that you have built yourself into Nextflow applets and for pipelines that you do not have in a github repository.
It is also useful if you need to alter something from a public repo locally (e.g. change some code in a file to fix a bug without fixing it in the public repo) and want to build using the locally updated directory instead of the git repo.
Additionally, if you want to use the most up-to-date dxpy version, you will need to use this approach. Sometimes the workers executing the remote repository builds can be a version or two behind the latest release of dxpy. You may want to use the latest version of dxpy for instance if there was a bug in the Nextflow executor bundled with an older dxpy version that you do not want to run into.
For example, running dx version shows that I am using dx v0.370.2 which is what will be used for the applet we build with this approach.
Clone the git repository
Once you have selected the project to build in using dx select, build using the --nextflow flag.
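Sketching the two steps above as commands (the repository and local path are examples):

```
# Clone the pipeline locally
git clone https://github.com/nf-core/sarek.git

# Pick the project to build in, then build the applet from the local folder
dx select
dx build --nextflow ./sarek
```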
You should see an applet ID if it has built successfully.
Note that this approach does not generate a job log, and it will use the version of dxpy on your local machine. So if you are using dxpy v0.370.2, the applet will be packaged with that version of dxpy and its corresponding version of Nextflow (23.10.0 in this case).
To see the help command for the applet:
Use dx run <applet-name/applet-ID> -h, or use its applet ID (useful when there are multiple versions of the applet with the same name, since each version has its own ID). You can also run an applet by its ID from anywhere in the project, but if using its name you must dx cd to its folder first.
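For example (the applet ID and folder are placeholders):

```
# By applet ID, from anywhere in the project:
dx run applet-xxxx -h

# Or by name, after changing into its folder first:
dx cd /USERS/FOLDERNAME/sarek_v3.4.0_cli_import
dx run sarek -h
```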
Excerpt of the help command
Run command
To run this, copy the command to your terminal, replace 'USERS/FOLDERNAME' with your folder name, then press Enter.
You should see a confirmation prompt. Type y to proceed.
You can also add '-y' to the run command to have it run without prompting.
You can track the progress of your job using the 'Monitor' tab of your project in the UI
Note that as --destination is a DNAnexus option and not a Nextflow one, it starts with '--' and does not have an '=' after it.
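Putting the options above together, a sketch of a full run command; the applet ID, project ID, folder names, and the -ioutdir input name are assumptions to adapt to your own setup:

```
dx run applet-xxxx \
  -ioutdir="./test_run_cli" \
  -inextflow_run_opts="-profile test,docker" \
  --destination "project-xxxx:/USERS/FOLDERNAME/outputs" \
  -y
```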
Let's change it to 20 for our nf-core Sarek run. Then the command would be
You can also set the queue size when building your own applets in the nextflow.config. To change the default from 5 to 20 for your applet at build time, add this line to your nextflow.config
or (equivalent)
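Assuming Nextflow's documented executor scope syntax, the line to add would be:

```
// nextflow.config: raise the default queue size from 5 to 20
executor {
    queueSize = 20
}
```

or, equivalently, the dot notation executor.queueSize = 20 at the top level of nextflow.config.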
However, you can change the queue size at runtime, regardless of whether it is set in your nextflow.config, by passing -queue-size X (where X is a number between 1 and 1000) in the Nextflow run options.
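At runtime this could look like the following; the applet ID is a placeholder:

```
dx run applet-xxxx -inextflow_run_opts="-profile test,docker -queue-size 20"
```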
To create a support ticket if there are technical issues:
Go to the Help header (same section where Projects and Tools are) inside the platform
Select "Contact Support"
Fill in the Subject and Message to submit a support ticket.
Some of the links on these pages will take the user to pages that are maintained by third parties. The accuracy and IP rights of the information on those third-party pages are the responsibility of those third parties.
We will run the test profile for Sarek, which should take 40 minutes to 1 hour to run. The test profile inputs are the Nextflow outdir and -profile test,docker.
More details about this are found in our Documentation.
It must be stored in a project on platform. For more information on this file see .
However, we saw the UI and CLI import jobs used dxpy v0.369.0, which is .
Once the run successfully completes, your results will be in the outputs/test_run_cli folder in your project, where test_run_cli is the folder on the head node of the Nextflow run that is copied to the 'outputs' folder in your project on the platform.
By default the DNAnexus executor will only run 5 subjobs in parallel. You can change this by passing the -queue-size flag in nextflow_run_opts with the number you require. There is a limit of 100 subjobs per user per project for most users, but you can give any number up to 1000 before it will give you an error, as noted in the . For example, if you know that you are passing 20 files to a run and that only a few processes can run on all 20 files at a time, you could set the queueSize to 60.