Importing nf-core
Here we will import the nf-core Sarek pipeline from GitHub to demonstrate the functionality, but you can import any Nextflow pipeline from GitHub, not just nf-core ones!
Go to a DNAnexus project. Click 'Add' and in the drop-down menu select 'Import Pipeline/Workflow'.
Next enter the required information (see below) and click 'Start Import'
The GitHub URL is the URL of the Sarek GitHub repo itself (not the URL shown under 'Clone' in the repo).
Make sure there is no slash after 'sarek' in the URL as it will cause the importer to fail.
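If you are scripting around the importer, a quick shell check for the trailing-slash pitfall can save a failed import job. This is just a sketch, and the URL shown is an example:

```shell
# Reject repo URLs with a trailing slash, which makes the importer fail.
url="https://github.com/nf-core/sarek"
case "$url" in
  */) echo "invalid: remove the trailing slash" ;;
  *)  echo "ok" ;;
esac
```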
Choose your folder in the USERS folder to output the applet to.
To see the possible releases to use, in the github project click 'Tags'. If you leave this part blank it will use the 'main' branch for that repo.
Click the 'Monitor' tab in your project to see the running/finished import job
You should see your applet in the output folder that you specified in your project
You can see the version of dxpy that it was built with by looking at the job log for the import job
To do this click 'View Log' on the right hand side of the screen
The job log shows that the version of dxpy used here is dxpy v0.369.0
Click one of the sarek applets that you created
Choose the platform output location for your results.
Click on 'Output to' then make a folder or choose an existing folder. I choose the outputs folder.
Click 'Next'
Output directory considerations
Specify the Nextflow output directory. This is a directory local to the machine that Nextflow will be running on, not a DNAnexus path. The outdir path must start with ./ or have no leading slash, so that the executor can create this folder where it is running on the head node. For example, ./results and results are both valid, but /results or dx://project-xx:/results will not produce output in your project. Once the DNAnexus Nextflow executor detects that all files have been written to this folder (and thus all subjobs have completed), it will copy the folder to the specified job destination on the platform. If the pipeline fails before completion, this folder will not be written to the project.
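To summarize the rules above (the project ID is a placeholder):

```
./results                  valid   (relative path, created on the head node)
results                    valid   (no leading slash)
/results                   invalid (absolute path on the worker)
dx://project-xx:/results   invalid (DNAnexus path, not a local one)
```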
Here I have chosen to place the Nextflow output files in a directory on the head node of the run named ./test. This creates an outdir called test. Once this job completes, my results will be in dx://project-xxx:/outputs/test, where test is the folder that was copied from the head node of the Nextflow run to the destination I specified on the platform.
Scroll down and, under 'Nextflow Options' > 'Nextflow Run Options', type -profile test,docker
You must use Docker for all Nextflow pipelines run on DNAnexus. Every nf-core pipeline has a Docker profile in its nextflow.config file. You need to specify -profile docker in the Nextflow run options ('Nextflow Run Options' in the UI, -inextflow_run_opts in the CLI) to get the applet to use Docker containers for each process.
Then click 'Start Analysis'. You will be brought to this screen
Click 'Launch Analysis'.
Go to the Monitor tab to see your running job.
Note! The estimated cost per hour is the cost to run the head node only! Each Nextflow process (subjob) runs on its own instance with its own cost.
Select a project to build the applet in and choose the number associated with your project.
Or select your project using its name or project ID
Replace the folder name with your folder name
This will place the sarek applet in a folder called sarek_v3.4.0_cli_import in the /USERS/FOLDERNAME folder in the project.
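A sketch of what the remote-repository import command likely looks like using dx build; the tag and destination shown are examples, and the exact flags should be checked against dx build --help:

```
dx build --nextflow \
  --repository "https://github.com/nf-core/sarek" \
  --tag "3.4.0" \
  --destination "/USERS/FOLDERNAME/sarek_v3.4.0_cli_import"
```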
You can see the job running/completed in the Monitor tab of your project.
If you are using a private GitHub repository, you can supply a Git credentials file to dx build using the --git-credentials option. The Git credentials file has the following format.
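Assuming the standard Git credential-store format is what is meant here, the file would contain a line like the following, with the username and personal access token as placeholders:

```
https://<username>:<personal-access-token>@github.com
```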
Build the Nextflow pipeline from a folder on your local machine
This approach is useful for building pipelines that you have built yourself into Nextflow applets and for pipelines that you do not have in a github repository.
It is also useful if you need to alter something from a public repo locally (e.g. change some code in a file to fix a bug without fixing it in the public repo) and want to build using the locally updated directory instead of the git repo.
Additionally, if you want to use the most up-to-date dxpy version, you will need to use this approach. Sometimes the workers executing the remote repository builds can be a version or two behind the latest release of dxpy. You may want to use the latest version of dxpy for instance if there was a bug in the Nextflow executor bundled with an older dxpy version that you do not want to run into.
For example, running dx version shows that I am using dx v0.370.2 which is what will be used for the applet we build with this approach.
Clone the git repository
Once you have selected the project to build in using dx select, build using the --nextflow flag.
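Sketching the two steps above as commands (the repository and local path are examples):

```
# Clone the pipeline locally
git clone https://github.com/nf-core/sarek.git

# Pick the project to build in, then build the applet from the local folder
dx select
dx build --nextflow ./sarek
```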
You should see an applet ID if it has built successfully.
Note that this approach does not generate a job log, and it will use the version of dxpy on your local machine. So if you are using dxpy v0.370.2, the applet will be packaged with that version of dxpy and its corresponding version of Nextflow (23.10.0 in this case).
To see the help command for the applet:
Use dx run <applet-name/applet-ID> -h, or use its applet ID (useful when there are multiple versions of the applet with the same name, since each version has its own ID). You can also run an applet by its ID from anywhere in the project, but if using its name you must dx cd to its folder first.
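For example (the applet ID and folder are placeholders):

```
# By applet ID, from anywhere in the project:
dx run applet-xxxx -h

# Or by name, after changing into its folder first:
dx cd /USERS/FOLDERNAME/sarek_v3.4.0_cli_import
dx run sarek -h
```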
Excerpt of the help command
Run command
To run this, copy the command to your terminal, replace 'USERS/FOLDERNAME' with your folder name, then press Enter.
You should see a confirmation prompt. Type y to proceed.
You can also add '-y' to the run command to have it run without prompting.
You can track the progress of your job using the 'Monitor' tab of your project in the UI
Note that as --destination is a DNAnexus option and not a Nextflow one, it starts with '--' and does not have an '=' after it.
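Putting the options above together, a sketch of a full run command; the applet ID, project ID, folder names, and the -ioutdir input name are assumptions to adapt to your own setup:

```
dx run applet-xxxx \
  -ioutdir="./test_run_cli" \
  -inextflow_run_opts="-profile test,docker" \
  --destination "project-xxxx:/USERS/FOLDERNAME/outputs" \
  -y
```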
Let's change it to 20 for our nf-core Sarek run. Then the command would be
You can also set the queue size when building your own applets in the nextflow.config. To change the default from 5 to 20 for your applet at build time, add this line to your nextflow.config
or (equivalent)
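Assuming Nextflow's documented executor scope syntax, the line to add would be:

```
// nextflow.config: raise the default queue size from 5 to 20
executor {
    queueSize = 20
}
```

or, equivalently, the dot notation executor.queueSize = 20 at the top level of nextflow.config.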
However, you can change the queue size at runtime, regardless of whether it is set in your nextflow.config, by passing -queue-size X (where X is a number between 1 and 1000) in the Nextflow run options.
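At runtime this could look like the following; the applet ID is a placeholder:

```
dx run applet-xxxx -inextflow_run_opts="-profile test,docker -queue-size 20"
```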
To create a support ticket if there are technical issues:
Go to the Help header (same section where Projects and Tools are) inside the platform
Select "Contact Support"
Fill in the Subject and Message to submit a support ticket.
Some of the links on these pages will take the user to pages that are maintained by third parties. The accuracy and IP rights of the information on those third-party pages are the responsibility of those third parties.
We will run the test profile for Sarek, which should take 40 minutes to 1 hour to run. The test profile inputs are the Nextflow outdir and -profile test,docker.
More details about this are found in our Documentation.
It must be stored in a project on platform. For more information on this file see .
However, we saw the UI and CLI import jobs used dxpy v0.369.0, which is .
Once the run successfully completes, your results will be in the outputs/test_run_cli folder in your project, where test_run_cli is the folder on the head node of the Nextflow run that is copied to the 'outputs' folder in your project on the platform.
By default the DNAnexus executor will only run 5 subjobs in parallel. You can change this by passing the -queue-size flag in nextflow_run_opts with the number you require. There is a limit of 100 subjobs per user per project for most users, but you can give any number up to 1000 before it will give you an error, as noted in the . For example, if you know that you are passing 20 files to a run and that only a few processes can run on all 20 files at a time, you could set the queueSize to 60.