# Importing nf-core Pipelines

## Import via the User Interface (UI)

Here we will import the nf-core Sarek pipeline from GitHub to demonstrate the functionality, but you can import any Nextflow pipeline from GitHub, not just nf-core ones.

Go to a DNAnexus project. Click 'Add' and, in the drop-down menu, select 'Import Pipeline/Workflow'.

<figure><img src="https://1979569080-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FPtCOm9rXoRi4P9rh1ET8%2Fuploads%2Fgit-blob-852c45e77725e8f580538fb6256c9d977ba57727%2Fui_importer.png?alt=media" alt=""><figcaption></figcaption></figure>

Next, enter the required information (see below) and click 'Start Import'.

![](https://1979569080-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FPtCOm9rXoRi4P9rh1ET8%2Fuploads%2Fgit-blob-6ab202e81f051c91e7a0053b304691e4c2ddc782%2Fui_importer_screen.png?alt=media)

The GitHub URL is the URL of the Sarek GitHub repository page (not the URL shown under 'Clone' in the repo):

```
https://github.com/nf-core/sarek
```

Make sure there is no trailing slash after 'sarek' in the URL, as it will cause the importer to fail.
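If you paste the URL from a browser or from the 'Clone' dialog, it may carry a trailing slash or a `.git` suffix. The following is a minimal sketch of tidying the URL before you enter it. `normalize_repo_url` is a hypothetical helper name used for illustration and is not part of dx-toolkit.

```shell
# Hypothetical helper: tidy a pasted GitHub URL before giving it to the importer.
normalize_repo_url() {
  url=$1
  # Drop any trailing slashes, which cause the importer to fail.
  while [ "${url%/}" != "$url" ]; do
    url=${url%/}
  done
  # Drop the ".git" suffix that appears in the 'Clone' URL.
  url=${url%.git}
  printf '%s\n' "$url"
}

normalize_repo_url "https://github.com/nf-core/sarek/"
normalize_repo_url "https://github.com/nf-core/sarek.git"
```

Both calls print `https://github.com/nf-core/sarek`, which is the form shown above.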

![](https://1979569080-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FPtCOm9rXoRi4P9rh1ET8%2Fuploads%2Fgit-blob-5c866bda64bf3d2c953d4a8aa0f546091a405bd3%2Fsarek_github_url-01.png?alt=media)

Choose your own folder inside the USERS folder as the output location for the applet.

To see the available releases, click 'Tags' in the GitHub project. If you leave the tag blank, the importer will use the 'main' branch of the repo.

![](https://1979569080-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FPtCOm9rXoRi4P9rh1ET8%2Fuploads%2Fgit-blob-855d26cd50ac710a74580d565da6b4d271123bae%2Fgithub_version_tags.png?alt=media)

[Sarek release info](https://github.com/nf-core/sarek/releases)

Click the 'Monitor' tab in your project to see the running or finished import job.

<figure><img src="https://1979569080-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FPtCOm9rXoRi4P9rh1ET8%2Fuploads%2Fgit-blob-3fcb41dc53a179f569950aa757a76fcf9ef61baf%2Fmonitor_success.png?alt=media" alt=""><figcaption></figcaption></figure>

You should see your applet in the output folder that you specified in your project.

<figure><img src="https://1979569080-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FPtCOm9rXoRi4P9rh1ET8%2Fuploads%2Fgit-blob-c270a44c624fd6b8ac6eb47abcace75be6a1d929%2Fapplet.png?alt=media" alt=""><figcaption></figcaption></figure>

You can see the version of dxpy that the applet was built with by looking at the job log for the import job.

To do this, click 'View Log' on the right-hand side of the screen.

<figure><img src="https://1979569080-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FPtCOm9rXoRi4P9rh1ET8%2Fuploads%2Fgit-blob-9787b6bb54938f36a7bbda6fb87dc5bbb872a8b6%2Fview_log.png?alt=media" alt=""><figcaption></figcaption></figure>

The job log shows that the version of dxpy used here is v0.369.0.

![](https://1979569080-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FPtCOm9rXoRi4P9rh1ET8%2Fuploads%2Fgit-blob-3b50c86375bfd6943367e64846a65fb3deb262f9%2Fui_log_dxpy_version.png?alt=media)

## Test run the nf-core pipeline from the UI

We will run the test profile for Sarek, which should take 40 minutes to 1 hour. The inputs for the test run are the Nextflow `outdir` and `-profile test,docker` (see the [test profile configuration](https://github.com/nf-core/sarek/blob/3.4.0/conf/test.config#L8)).

1. Click one of the sarek applets that you created

![](https://1979569080-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FPtCOm9rXoRi4P9rh1ET8%2Fuploads%2Fgit-blob-85bda97728a46695dc8631bb0cb18b4b2c511cf9%2Fsarek_applet.png?alt=media)

2. Choose the platform output location for your results.

   Click on 'Output to', then make a folder or choose an existing folder. Here I chose the `outputs` folder.

<figure><img src="https://1979569080-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FPtCOm9rXoRi4P9rh1ET8%2Fuploads%2Fgit-blob-e78fabf5631c7699b547d7264b30699ca2684900%2Fplatform_outputs.png?alt=media" alt=""><figcaption></figcaption></figure>

3. Click 'Next'

**Output directory considerations**

4. Specify the nextflow output directory.

This is a directory local to the machine that Nextflow will be running on, *not a DNAnexus path*.

*The `outdir` path must start with `./` or have no leading slash* so that the executor can create this folder where it is running on the head node. For example, `./results` and `results` are both valid, but `/results` or paths like `dx://project-xx:/results` will not produce output in your project. Once the DNAnexus Nextflow executor detects that all files have been written to this folder (and thus all subjobs have completed), it copies the folder to the specified job destination on the platform. If the pipeline fails before completion, this folder is not written to the project.
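The rule above can be checked before launching a run. This is a minimal sketch under the assumptions stated in this section only (relative paths are accepted, absolute paths and URIs such as `dx://` are not). `is_valid_outdir` is a hypothetical helper name, not a dx-toolkit command.

```shell
# Hypothetical helper: succeed only for outdir values that are relative paths.
is_valid_outdir() {
  case "$1" in
    "")    return 1 ;;  # empty value
    /*)    return 1 ;;  # absolute path on the head node
    *://*) return 1 ;;  # URI such as dx://project-xx:/results
    *)     return 0 ;;  # ./results, results, ...
  esac
}

for candidate in ./results results /results dx://project-xx:/results; do
  if is_valid_outdir "$candidate"; then
    echo "ok:       $candidate"
  else
    echo "rejected: $candidate"
  fi
done
```

The loop reports the first two candidates as ok and the last two as rejected, matching the examples in the paragraph above.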

Here I have chosen to place the Nextflow output files in a directory named `./test` on the head node of the run. This creates an outdir called `test`.

![](https://1979569080-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FPtCOm9rXoRi4P9rh1ET8%2Fuploads%2Fgit-blob-6e6021a777f8e44ee380f4e5c4514165825bbf9e%2Fchoose_outdir.png?alt=media)

Thus, once this job completes, my results will be in `dx://project-xxx:/outputs/test`, where `test` is the folder that was copied from the head node of the Nextflow run to the destination that I specified for it on the platform.

More details are available in the [output path example](https://documentation.dnanexus.com/user/running-apps-and-workflows/running-nextflow-pipelines#can-i-have-an-example-of-how-to-construct-an-output-path-when-i-run-a-nextflow-pipeline-with-params) in our documentation.
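To make the relationship between the platform output location and the Nextflow `outdir` concrete, here is a small sketch that joins the two the same way as in the example above. `results_path` is a hypothetical helper for illustration; it only manipulates strings and does not contact the platform.

```shell
# Hypothetical helper: where an outdir ends up, given the job destination.
results_path() {
  destination=${1%/}   # e.g. project-xxx:/outputs
  outdir=${2#./}       # "./test" and "test" are equivalent
  outdir=${outdir%/}
  printf '%s/%s\n' "$destination" "$outdir"
}

results_path "project-xxx:/outputs" "./test"   # prints project-xxx:/outputs/test
```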

5. Scroll down and in **'Nextflow Options'**, **'Nextflow Run Options'**

   type `-profile test,docker`

   ![](https://1979569080-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FPtCOm9rXoRi4P9rh1ET8%2Fuploads%2Fgit-blob-aca937d502505e0bd200a4958a8f13e0b0ee8fd6%2Frun_opts.png?alt=media)

   You must use Docker for all Nextflow pipelines run on DNAnexus. Every nf-core pipeline has a Docker profile in its nextflow\.config file. Specify `-profile docker` in the Nextflow run options ('Nextflow Run Options' in the UI, `-inextflow_run_opts` in the CLI) so that each process runs in a Docker container.
6. Then click **'Start Analysis'.** You will be brought to this screen

   <figure><img src="https://1979569080-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FPtCOm9rXoRi4P9rh1ET8%2Fuploads%2Fgit-blob-6947697f283eb182ec6d3296496b33b220e702bf%2Freview_and_start.png?alt=media" alt=""><figcaption></figcaption></figure>
7. Click **'Launch Analysis'.**

Go to the Monitor tab to see your running job.

**Note: the estimated cost per hour is the cost to run the head node only. Each Nextflow process (subjob) runs on its own instance with its own cost.**

## Import via the CLI

Select a project to build the applet in

```
dx select  # press enter
```

and choose the number associated with your project.

Or select your project using its name or project ID

```
dx select project-ID
#or
dx select my_project_name
```

Replace `FOLDERNAME` with your folder name and `project-ID` with your project ID, then build:

```
dx build --nextflow --repository https://github.com/nf-core/sarek --repository-tag 3.4.0 --destination project-ID:/USERS/FOLDERNAME/sarek_v3.4.0_cli_import
```

This will place the `sarek` applet in a folder called `sarek_v3.4.0_cli_import` in the /USERS/FOLDERNAME folder in the project.
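If you import several pipelines or versions, it can help to assemble the command from variables. The sketch below only prints the command for review. The variable names are my own, and `project-ID` and `/USERS/FOLDERNAME` remain placeholders for you to replace.

```shell
# Placeholders: replace with your own project ID and folder.
PROJECT="project-ID"
FOLDER="/USERS/FOLDERNAME"
REPO="https://github.com/nf-core/sarek"
TAG="3.4.0"

# Derive the pipeline name from the URL (sarek) and build a versioned destination.
NAME=${REPO##*/}
DEST="${PROJECT}:${FOLDER}/${NAME}_v${TAG}_cli_import"

# Print the command for review; run it yourself once the placeholders are filled in.
echo dx build --nextflow --repository "$REPO" --repository-tag "$TAG" --destination "$DEST"
```

With the values shown, the printed command is the same as the one above.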

You can see the job running/completed in the Monitor tab of your project.

![](https://1979569080-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FPtCOm9rXoRi4P9rh1ET8%2Fuploads%2Fgit-blob-b225c64940dd67a9689a08487cf52d00c74bf681%2Fcli_import.png?alt=media)

If you are using a private GitHub repository, you can supply a git credentials file to `dx build` using the `--git-credentials` option. The git credentials file has the following format:

```
providers {
  github {
    user = 'username'
    password = 'ghp_xxxx'
  }
}
```

The file must be stored in a project on the platform. For more information on this file, see the [Import via CLI documentation](https://documentation.dnanexus.com/user/running-apps-and-workflows/running-nextflow-pipelines#import-via-cli).
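As a rough sketch of the two steps (upload the credentials file, then reference it at build time), the commands might look like the following. The repository URL, file name and folder are hypothetical placeholders, and passing a project path to `--git-credentials` is an assumption on my part, so check the linked documentation for the exact form. The commands are printed rather than executed.

```shell
# Placeholders throughout: project, folder, repository URL and file name are illustrative.
PROJECT="project-ID"
CREDS_LOCAL="./git_credentials"
CREDS_REMOTE="${PROJECT}:/USERS/FOLDERNAME/git_credentials"
PRIVATE_REPO="https://github.com/my-org/my-private-pipeline"

# Keep the local copy readable only by you, since it contains an access token.
if [ -f "$CREDS_LOCAL" ]; then
  chmod 600 "$CREDS_LOCAL"
fi

# Printed for review. The argument form of --git-credentials is an assumption.
echo dx upload "$CREDS_LOCAL" --path "$CREDS_REMOTE"
echo dx build --nextflow --repository "$PRIVATE_REPO" --git-credentials "$CREDS_REMOTE"
```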

### Build via the CLI from a Local Folder

Build the Nextflow pipeline from a folder on your local machine.

This approach is useful for turning pipelines that you have written yourself into Nextflow applets, and for pipelines that you do not have in a GitHub repository.

It is also useful if you need to alter something from a public repo locally (e.g. change some code in a file to fix a bug without fixing it in the public repo) and want to build using the locally updated directory instead of the git repo.

Additionally, if you want to use the most up-to-date dxpy version, you will need to use this approach. The workers executing the remote repository builds can sometimes be a version or two behind the latest release of dxpy. You may want the latest version of dxpy if, for instance, the Nextflow executor bundled with an older dxpy version has a bug that you do not want to run into.

For example, running `dx --version` shows that I am using dx v0.370.2, which is the version that will be used for the applet we build with this approach.

```
dx --version
#dx v0.370.2
```

However, we saw that the UI and CLI import jobs used dxpy v0.369.0, which is [2 versions behind this version](https://github.com/dnanexus/dx-toolkit/blob/master/CHANGELOG.md#3690---202425).
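If you want to script this comparison, the following sketch parses a version line of the form `dx v0.370.2` and compares it with the version read from the import job log. The helper names are hypothetical, and the comparison relies on `sort -V`, which I am assuming is available in your environment (it is in GNU coreutils).

```shell
# Hypothetical helper: extract "0.370.2" from a line such as "dx v0.370.2".
dx_version_number() {
  printf '%s\n' "$1" | sed -n 's/^dx v\([0-9][0-9.]*\).*$/\1/p'
}

# Succeeds when version $1 is older than version $2 (relies on `sort -V`).
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

local_version=$(dx_version_number "dx v0.370.2")   # or pass "$(dx --version)"
import_version="0.369.0"                           # read from the import job log

if version_lt "$import_version" "$local_version"; then
  echo "Import job used an older dxpy ($import_version) than the local toolkit ($local_version)"
fi
```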

Clone the git repository

```
git clone --branch 3.4.0 https://github.com/nf-core/sarek.git
# Here I change the folder name to something with the version in it to help me keep track of different versions of sarek
mv sarek sarek_v3.4.0_cli
```

Once you have selected the project to build in using `dx select`, build using the `--nextflow` flag:

```
dx build --nextflow sarek_v3.4.0_cli --destination project-ID:/USERS/FOLDERNAME/sarek_v3.4.0_cli
```

You should see an applet ID if it has built successfully.

```
applet-xxx
```

**Note that this approach does not generate a job log, and it uses the version of dxpy on your local machine. So if you are using dxpy v0.370.2, the applet will be packaged with that version of dxpy and its corresponding version of Nextflow (23.10.0 in this case).**

## Test run the nf-core pipeline from the CLI

To see the help command for the applet:

Use `dx run <applet-name/applet-ID> -h`

```
dx run sarek_v3.4.0_ui -h 
```

or use its applet ID (useful when multiple versions of the applet share the same name, as each version has its own ID). You can also run an applet by its ID from anywhere in the project, but if you use its name you must first `dx cd` to its folder.

```
dx run applet-ID -h
```

Excerpt of the help output:

```
usage: dx run sarek_v3.4.0_ui [-iINPUT_NAME=VALUE ...]

Applet: sarek

sarek

Inputs:
  outdir: [-ioutdir=(string)]
        (Nextflow pipeline required)

  step: [-istep=(string)]
        (Nextflow pipeline required) Default value:mapping The pipeline starts
        from this step and then runs through the possible subsequent steps.

  input: [-iinput=(file)]
        (Nextflow pipeline optional) A design file with information about the
        samples in your experiment. Use this parameter to specify the location
        of the input files. It has to be a comma-separated file with a header
        row. See [usage docs](https://nf-co.re/sarek/usage#input).  If no
        input file is specified, sarek will attempt to locate one in the
        `{outdir}` directory. If no input should be supplied, i.e. when --step
        is supplied or --build_from_index, then set --input false
...
```

Run command

```
dx run sarek_v3.4.0_ui -ioutdir='./test_run_cli' -inextflow_run_opts='-profile test,docker' --destination 'project-ID:/USERS/FOLDERNAME'
```

To run this, copy the command to your terminal and replace 'USERS/FOLDERNAME' with your folder name.

Then press `Enter`.

You should see

```
 
Using input JSON:
{
    "outdir": "./test_run_cli",
    "nextflow_run_opts": "-profile test,docker"
}
Confirm running the executable with this input [Y/n]:
```

Type `y` to proceed.

You can also add '-y' to the run command to get it to run without prompting e.g.,

```
dx run sarek_v3.4.0_ui -ioutdir='./test_run_cli' -inextflow_run_opts='-profile test,docker' --destination 'project-ID:/USERS/FOLDERNAME' -y
```
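For repeated test runs, I find it convenient to wrap the launch in a small script that captures the job ID. This is a sketch, not a definitive workflow. `SUBMIT` is a hypothetical environment variable that I use here as a safety switch, and the applet name and destination are the placeholders from the command above. It uses `--brief` so that `dx run` prints only the job ID, and `dx watch` to follow the job log.

```shell
# Placeholders: replace the applet name and destination with your own.
APPLET="sarek_v3.4.0_ui"
OUTDIR="./test_run_cli"
DEST="project-ID:/USERS/FOLDERNAME"

launch() {
  # -y skips the confirmation prompt; --brief prints only the job ID.
  dx run "$APPLET" \
    -ioutdir="$OUTDIR" \
    -inextflow_run_opts='-profile test,docker' \
    --destination "$DEST" -y --brief
}

# Only launch when the dx client is installed and SUBMIT=1 is set explicitly,
# because launching a job incurs cost.
if command -v dx >/dev/null 2>&1 && [ "${SUBMIT:-0}" = "1" ]; then
  job_id=$(launch)
  echo "Launched $job_id"
  dx watch "$job_id"   # stream the job log to the terminal
else
  echo "Dry run: set SUBMIT=1 to launch $APPLET into $DEST"
fi
```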

You can track the progress of your job using the 'Monitor' tab of your project in the UI.

* Once the run completes successfully, your results will be in `dx://project-xxx:/USERS/FOLDERNAME/test_run_cli`, where `test_run_cli` is the folder on the head node of the Nextflow run that is copied to the destination folder (`/USERS/FOLDERNAME`) in your project on the platform.

**Note that `--destination` is a `dx run` option and not a Nextflow pipeline input, so it starts with `--` and is not followed by `=`.**

## Controlling the number of parallel subjobs

### In the CLI

By default, the DNAnexus executor runs only 5 subjobs in parallel. You can change this by passing the `-queue-size` flag in `nextflow_run_opts` with the number you require. Most users are limited to 100 subjobs per user per project, but any value up to 1000 is accepted before an error is raised, as noted in the [Queue Size Configuration Documentation](https://documentation.dnanexus.com/user/running-apps-and-workflows/running-nextflow-pipelines#queue-size-configuration). For example, if you are passing 20 files to a run and know that only a few processes (say, three) can run on all 20 files at the same time, you could set the queue size to 60.

Let's change it to 20 for our nf-core Sarek run. The command would then be:

```
dx run sarek_v3.4.0_ui -ioutdir='./test_run_cli_qs' -inextflow_run_opts='-profile test,docker -queue-size 20' --destination 'project-ID:/USERS/FOLDERNAME'
```
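Because values outside the accepted range cause an error, you may want to check the number before launching. Here is a minimal sketch that builds the `nextflow_run_opts` string for the Sarek test run and rejects values outside 1 to 1000. `run_opts_with_queue_size` is a hypothetical helper name.

```shell
# Hypothetical helper: build the nextflow_run_opts string for a given queue size.
run_opts_with_queue_size() {
  size=$1
  case "$size" in
    ''|*[!0-9]*)
      echo "queue size must be a whole number" >&2
      return 1 ;;
  esac
  if [ "$size" -lt 1 ] || [ "$size" -gt 1000 ]; then
    echo "queue size must be between 1 and 1000" >&2
    return 1
  fi
  printf '%s\n' "-profile test,docker -queue-size $size"
}

run_opts_with_queue_size 20   # prints -profile test,docker -queue-size 20
```

The resulting string can be passed as `-inextflow_run_opts="$(run_opts_with_queue_size 20)"`.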

### In the UI

In the UI, the string would look as below:

<figure><img src="https://1979569080-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FPtCOm9rXoRi4P9rh1ET8%2Fuploads%2Fgit-blob-36117d9c2f90229d51eb7b81c9ff6680f74a142b%2Fqueue-size.png?alt=media" alt=""><figcaption></figcaption></figure>

### To change the Queue Size for your Applet at Build Time

You can also set the queue size when building your own applets in the nextflow\.config. To change the default from 5 to 20 for your applet at build time, add this line to your nextflow\.config

```
executor.queueSize = 20 
```

or (equivalent)

```
executor {
    queueSize = 20 
}
```

However, you can change the queue size at runtime, regardless of whether it is set in your nextflow\.config, by passing `-queue-size X` to the Nextflow run options, where X is a number between 1 and 1000.

## Resources

[Full Documentation](https://documentation.dnanexus.com/)

To create a support ticket if there are technical issues:

1. Go to the Help header (same section where Projects and Tools are) inside the platform
2. Select "Contact Support"
3. Fill in the Subject and Message to submit a support ticket.

*Some of the links on these pages will take the user to pages that are maintained by third parties. The accuracy and IP rights of the information on these third-party pages are the responsibility of those third parties.*
