Introduction to CLI

Last updated 4 months ago

Overview of Interacting with the Platform

Users can interact with the platform in a variety of ways. This section is for those who want to learn how to interact with it using the command-line interface (CLI).

Terms

The CLI interacts with the platform in the following way:

  • The CLI (command line interface) is run locally on your own machine.

  • On your local machine, you will install the SDK (software development kit), which we also call dx-toolkit. Information on downloading it and other requirements is found in the Getting Started Guide. Once set up, it allows you to log into the platform, explore your data and projects, create apps and workflows, and launch analyses.

Installation

Please ensure that you are running Python 3 before starting this install.

To install:

pip3 install dxpy

To upgrade dxpy:

pip3 install --upgrade dxpy

Introducing dx-toolkit

The dx command will be your most used utility for interacting with the DNAnexus platform. You can run the command with no arguments or with the -h or --help flags to see the usage:

usage: dx [-h] [--version] command ...

DNAnexus Command-Line Client, API v1.0.0, client v0.346.0

dx is a command-line client for interacting with the DNAnexus platform.  You
can log in, navigate, upload, organize and share your data, launch analyses,
and more.  For a quick tour of what the tool can do, see

  https://documentation.dnanexus.com/getting-started/tutorials/cli-quickstart#q>

For a breakdown of dx commands by category, run "dx help".

dx exits with exit code 3 if invalid input is provided or an invalid operation
is requested, and exit code 1 if an internal error is encountered.  The latter
usually indicate bugs in dx; please report them at

  https://github.com/dnanexus/dx-toolkit/issues

options:
  -h, --help  show this help message and exit
  --env-help  Display help message for overriding environment
              variables
  --version   show program's version number and exit

Sometimes the usage may occupy your entire terminal, in which case you may see (END) to show that you are at the end of the documentation. Press q to quit the usage, or use the universal Ctrl-C to send an interrupt signal to the process to kill it.

Run dx help to read about the categories of commands you can run:

$ dx help
usage: dx help [-h] [command_or_category] [subcommand]

Displays the help message for the given command (and subcommand if given), or
displays the list of all commands in the given category.

CATEGORIES

  all       All commands
  session   Manage your login session
  fs        Navigate and organize your projects and files
  data      View, download, and upload data
  metadata  View and modify metadata for projects, data, and executions
  workflow  View and modify workflows
  exec      Manage and run apps, applets, and workflows
  org       Administer and operate on orgs
  other     Miscellaneous advanced utilities

Logging Into the Platform

Let's start by using dx login to gain access to the DNAnexus platform from the command line. All dx commands will respond to -h|--help, so run the command with one of these flags to read the usage:

$ dx login -h
usage: dx login [-h] [--env-help] [--token TOKEN] [--noprojects] [--save]
                [--timeout TIMEOUT]

Log in interactively and acquire credentials. Use "--token" to log in with an
existing API token.

options:
  -h, --help         show this help message and exit
  --env-help         Display help message for overriding environment variables
  --token TOKEN      Authentication token to use
  --noprojects       Do not print available projects
  --save             Save token and other environment variables for future
                     sessions
  --timeout TIMEOUT  Timeout for this login token (in seconds, or use suffix
                     s, m, h, d, w, M, y)

The help documentation is often called the usage because that is often the first word of the output. In the previous output, notice that all the arguments are enclosed in square brackets, e.g., [--token TOKEN]. This is a common convention in Unix documentation to indicate that the argument is optional. The lack of such square brackets means the argument is required.

Some of the arguments require a value to follow. For example, --token TOKEN means the argument --token must be followed by the string value for the token. Arguments like --save are known as flags. They are either present or not and often represent a Boolean value, usually "True" when present and "False" when absent.
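
The same conventions can be reproduced with Python's standard argparse module, which prints usage strings in this style. The sketch below is only an illustration: the parser is a hypothetical stand-in that models two of the options shown above, not the actual dx source.

```python
import argparse

# Hypothetical stand-in for "dx login"; only two of its options are modeled.
parser = argparse.ArgumentParser(prog="dx login")
parser.add_argument("--token", help="Authentication token to use")  # takes a value
parser.add_argument("--save", action="store_true",
                    help="Save token for future sessions")  # flag: True when present

usage = parser.format_usage()
print(usage)  # optional arguments appear inside square brackets

args = parser.parse_args(["--token", "xxxxxxxxxxx", "--save"])
print(args.token, args.save)

defaults = parser.parse_args([])
print(defaults.token, defaults.save)  # the value option is None, the flag is False
```

An option that takes a value is stored as a string, while a flag is stored as a Boolean that depends only on whether it was present.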

The most basic usage for login is to enter your username and password when prompted:

$ dx login
Acquiring credentials from https://auth.dnanexus.com
Username: XXXXXXXX
Password: XXXXXXXX

You may also generate a token in the web UI for use on the command line:

$ dx login --token xxxxxxxxxxx

Use dx logout to log out of the platform. This invalidates a token.

If you are ever in doubt of your username, use dx whoami to see your identity.

  • When you ssh into a cloud workstation, you will be your normal DNAnexus user.

  • When running the ttyd app to access a cloud workstation through the UI, you will be the privileged Unix user root.

  • When you ssh into a running job, you will be the user dnanexus.

Working with Projects and Users

A project is the smallest unit of sharing in DNAnexus, and you must always work in the context of a project. Upon login, you will be prompted to select a project. To change projects, use dx select. Use -h|--help to view the usage:

$ dx select -h
usage: dx select [-h] [--env-help] [--name NAME]
                 [--level {VIEW,UPLOAD,CONTRIBUTE,ADMINISTER}] [--public]
                 [project]

Interactively list and select a project to switch to. By default, only lists
projects for which you have at least CONTRIBUTE permissions. Use --public to
see the list of public projects.

positional arguments:
  project               Name or ID of a project to switch to; if not provided
                        a list will be provided for you

options:
  -h, --help            show this help message and exit
  --env-help            Display help message for overriding environment
                        variables
  --name NAME           Name of the project (wildcard patterns supported)
  --level {VIEW,UPLOAD,CONTRIBUTE,ADMINISTER}
                        Minimum level of permissions expected
  --public              Include ONLY public projects (will automatically set
                        --level to VIEW)

When run with no options, you will be presented with a list of your projects and your permission level for each:

$ dx select

Note: Use dx select --level VIEW or dx select --public to
select from projects for which you only have VIEW permissions.

Available projects (CONTRIBUTE or higher):
0) App Dev (ADMINISTER)
1) Methylation (ADMINISTER)
2) Genomes (ADMINISTER)
3) WTS (ADMINISTER)
4) WGS (ADMINISTER)
5) Exome (ADMINISTER)
6) QC (ADMINISTER)
7) Collaborators (ADMINISTER)
8) Pipeline Dev (ADMINISTER)
9) WDL Test (ADMINISTER)
m) More options not shown...

Pick a numbered choice or "m" for more options [0]:

Press Enter to choose the first project, or select a number 0-9 to choose a project or m for "more" options. You can also provide a project name or ID as the first argument:

$ dx select project-XXXXXXXXXXXXXXXXXXXXXXXX
$ dx select "Pipeline Dev"

Use the --level option to specify only projects where you have a particular permission. For instance, dx select --level ADMINISTER will show only projects where you are an administrator.
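
Because --level sets a minimum, it can help to think of the four permission levels as an ordered scale. Here is a minimal sketch of that filtering, assuming the order in which the usage lists the levels is ascending. The project names and the helper meets_minimum are made up for illustration.

```python
# Permission levels in ascending order, as listed in the usage above.
LEVELS = ["VIEW", "UPLOAD", "CONTRIBUTE", "ADMINISTER"]

def meets_minimum(level: str, minimum: str) -> bool:
    """Return True if `level` grants at least as much access as `minimum`."""
    return LEVELS.index(level) >= LEVELS.index(minimum)

# Hypothetical projects and the permission held on each.
projects = {"App Dev": "ADMINISTER", "Shared Refs": "VIEW", "Uploads": "UPLOAD"}

# Mirrors the default listing, which shows CONTRIBUTE or higher.
visible = [name for name, level in projects.items()
           if meets_minimum(level, "CONTRIBUTE")]
print(visible)
```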

Normally, projects are private to your organization, but the --public option will display the public projects that DNAnexus uses to share common resources like sequence files or indexes for reference genomes:

$ dx select --public

Available public projects:
0) Reference Genome Files: Azure US (West) (VIEW)
1) App_Assets_Europe(London)_Internal (VIEW)
2) Reference Genome Files: Azure Amsterdam (VIEW)
3) Reference Genome Files: AWS Germany (VIEW)
4) Reference Genome Files: AWS US (East) (VIEW)
5) Reference Genome Files: AWS Europe (London) (VIEW)
6) App and Applet Assets Azure (VIEW)
7) dxCompiler_Europe_London (VIEW)
8) dxCompiler_Sydney (VIEW)
9) dxCompiler_Berlin (VIEW)
m) More options not shown...

Pick a numbered choice or "m" for more options:

Press Ctrl-C to exit the program without making a selection.

If you are ever in doubt as to your current project, run dx pwd (print working directory):

$ dx pwd
Pipeline Dev:/

Alternatively, you can run dx env to see your current environment:

$ dx env
Auth token used         XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
API server protocol     https
API server host         api.dnanexus.com
API server port         443
Current workspace       project-XXXXXXXXXXXXXXXXXXXXXXXX
Current workspace name  "Pipeline Dev"
Current folder          /
Current user            test_user

If I wanted to share some data with a collaborator, I would use dx new project to create a new project to hold select data and apps. Following is the usage:

$ dx new project -h
usage: dx new project [-h] [--brief | --verbose] [--env-help]
                      [--region REGION] [-s] [--bill-to BILL_TO] [--phi]
                      [--database-ui-view-only]
                      [name]

Create a new project

positional arguments:
  name                  Name of the new project

options:
  -h, --help            show this help message and exit
  --brief               Display a brief version of the return value; for most
                        commands, prints a DNAnexus ID per line
  --verbose             If available, displays extra verbose output
  --env-help            Display help message for overriding environment
                        variables
  --region REGION       Region affinity of the new project
  -s, --select          Select the new project as current after creating
  --bill-to BILL_TO     ID of the user or org to which the project will be
                        billed. The default value is the billTo of the
                        requesting user.
  --phi                 Add PHI protection to project
  --database-ui-view-only
                        Viewers on the project cannot access database data
                        directly

For example, to create a project in the aws:us-east-1 region:

$ dx new project --region aws:us-east-1 demo_project
Created new project called "demo_project" (project-GXZ90x00fF6F4fy1K20x4gv9)
Switch to new project now? [y/N]: y

Next, I would use dx invite <user-id> to invite users to the project. Start with the usage to see how to call the command:

$ dx invite -h
usage: dx invite [-h] [--env-help] [--no-email]
                 invitee [project] [{VIEW,UPLOAD,CONTRIBUTE,ADMINISTER}]

Invite a DNAnexus entity to a project. If the invitee is not recognized as a
DNAnexus ID, it will be treated as a username, i.e. "dx invite alice : VIEW"
is equivalent to inviting the user with user ID "user-alice" to view your
current default project.

positional arguments:
  invitee               Entity to invite
  project               Project to invite the invitee to
  {VIEW,UPLOAD,CONTRIBUTE,ADMINISTER}
                        Permissions level the new member should have

options:
  -h, --help            show this help message and exit
  --env-help            Display help message for overriding environment
                        variables
  --no-email            Disable email notifications to invitee

The usage shows that this command takes three positional arguments, the first of which (invitee) is required and the other two (project, permissions) are optional. Your currently selected project is the default project, and "VIEW" is the default permission. Because positional arguments are matched in order, you must specify the project first if you wish to indicate some permission other than "VIEW."
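
The ordering rule can be sketched as a small helper that builds the argument list. The function invite_command is hypothetical and only assembles the command without running it. It relies on the usage text's own example, "dx invite alice : VIEW", in which a bare colon stands for the current project.

```python
def invite_command(invitee, level="VIEW", project=":"):
    """Build the argument list for `dx invite`.

    The project comes before the permission level, so it is always filled
    in when a level is given. A bare ":" stands for the current project.
    """
    allowed = ("VIEW", "UPLOAD", "CONTRIBUTE", "ADMINISTER")
    if level not in allowed:
        raise ValueError(f"unknown permission level: {level}")
    return ["dx", "invite", invitee, project, level]

cmd = invite_command("alice", level="CONTRIBUTE")
print(" ".join(cmd))  # dx invite alice : CONTRIBUTE
```

On a machine where you are logged in, the resulting list could be handed to subprocess.run.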

Use dx uninvite <user-id> to revoke a user's access to a project:

$ dx uninvite -h
usage: dx uninvite [-h] [--env-help] entity [project]

Revoke others' permissions on a project you administer. If the entity is not
recognized as a DNAnexus ID, it will be treated as a username, i.e. "dx
uninvite alice :" is equivalent to revoking the permissions of the user with
user ID "user-alice" to your current default project.

positional arguments:
  entity      Entity to uninvite
  project     Project to revoke permissions from

options:
  -h, --help  show this help message and exit
  --env-help  Display help message for overriding environment variables

Data Exploration

Earlier, I introduced dx pwd (print working directory) to find my currently selected project. Here is its usage:

$ dx pwd -h
usage: dx pwd [-h] [--env-help]

Print current working directory

options:
  -h, --help  show this help message and exit
  --env-help  Display help message for overriding environment variables

Notice that the output shows the project name and the directory /, which is the root directory of the project:

$ dx pwd
demo_project:/
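
If you capture this output in a script, the project and folder can be separated at the colon. This is a minimal sketch with a hypothetical helper, split_dx_path, and it assumes the project name itself contains no colon.

```python
def split_dx_path(dx_path):
    """Split "project:/folder" into (project, folder) at the first colon."""
    project, sep, folder = dx_path.partition(":")
    if not sep:
        # No colon: treat the whole string as a folder in the current project.
        return None, dx_path
    return project, folder

print(split_dx_path("demo_project:/"))
print(split_dx_path("Pipeline Dev:/"))
```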

The command dx ls will list the contents of a directory. Notice in the usage that the directory name is optional, in which case it will use the current working directory:

$ dx ls -h
usage: dx ls [-h] [--color {off,on,auto}] [--delimiter [DELIMITER]]
             [--env-help] [--brief | --verbose] [-a] [-l] [--obj] [--folders]
             [--full]
             [path]

List folders and/or objects in a folder

positional arguments:
  path                  Folder (possibly in another project) to list the
                        contents of, default is the current directory in the
                        current project. Syntax: projectID:/folder/path

There is nothing to list because I just created this project, so I'll add some data next.

Copying and Moving Files

I will use the command dx cp to copy a small file from one of the public projects into my project. I'll start with the usage:

$ dx cp -h
usage: dx cp [-h] [--env-help] [-a] source [source ...] destination

Copy objects and/or folders between different projects.  Folders will
automatically be copied recursively.  To specify which project to use as a
source or destination, prepend the path or ID of the object/folder with the
project ID or name and a colon.

EXAMPLES

  The first example copies a file in a project called "FirstProj" to the
  current directory of the current project.  The second example copies the
  object named "reads.fq.gz" in the current directory to the folder
  /folder/path in the project with ID "project-B0VK6F6gpqG6z7JGkbqQ000Q",
  and finally renaming it to "newname.fq.gz".

  $ dx cp FirstProj:file-B0XBQFygpqGK8ZPjbk0Q000q .
  $ dx cp reads.fq.gz project-B0VK6F6gpqG6z7JGkbqQ000Q:/folder/path/newname.fq.>

positional arguments:
  source       Objects and/or folder names to copy
  destination  Folder into which to copy the sources or new pathname (if only
               one source is provided).  Must be in a different
               project/container than all source paths.

options:
  -h, --help   show this help message and exit
  --env-help   Display help message for overriding environment
               variables
  -a, --all    Apply to all results with the same name without
               prompting

The usage shows source [source ...], which is another Unix convention to indicate that the argument may be repeated. This means you can indicate several source files or directories to be copied to the final destination.
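
Python's standard argparse module prints the same notation for an argument that accepts one or more values. The parser below is a hypothetical stand-in used only to show how repeated sources and a single destination are told apart; it is not the dx source, and the file names are placeholders.

```python
import argparse

# Hypothetical stand-in for "dx cp"; not the real dx implementation.
parser = argparse.ArgumentParser(prog="dx cp")
parser.add_argument("source", nargs="+", help="one or more things to copy")
parser.add_argument("destination", help="where to copy them")

usage = parser.format_usage()
print(usage)

# Placeholder names, not real platform objects.
args = parser.parse_args(["a.txt", "b.txt", "/backup"])
print(args.source, args.destination)
```

The last value is always taken as the destination, and everything before it is collected into the list of sources.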

I'll copy the file hs38DH.dict from the project "Reference Genome Files: AWS US (East)" into the root directory of my new project. The command will only produce output on error:

$ dx cp project-BQpp3Y804Y0xbyG4GJPQ01xv:file-GFz5xf00Bqx2j79G4q4F5jXV /

I must specify the source file using the project and file ID. When you refer to files inside your current project, it's only necessary to use the file ID.

Now I can list the one file:

$ dx ls
hs38DH.dict

Often you'll want to use the file ID, which you can view using the -l|--long flag to see the long listing that includes more metadata:

$ dx ls -l
Project: demo_project (project-GXZ90x00fF6F4fy1K20x4gv9)
Folder : /
State   Last modified       Size      Name (ID)
closed  2023-07-07 16:11:56 334.68 KB hs38DH.dict (file-GFz5xf00Bqx2j79G4q4F5jXV)

I've decided I want to create a data directory to hold files such as this, so I will use dx mkdir data. The command will produce no output on success. A new listing shows data/ where the trailing slash indicates this is a directory:

$ dx ls
data/
hs38DH.dict

To move the hs38DH.dict into the data directory, I can either use the file name or ID:

dx mv file-GFz5xf00Bqx2j79G4q4F5jXV data
dx mv hs38DH.dict data

A new listing shows that the file is no longer in the root directory:

$ dx ls
data/

I can specify the data directory to view the contents:

$ dx ls data
hs38DH.dict

Alternatively, I can use dx cd data to change directories. The command dx pwd will verify that I'm in the new folder:

$ dx pwd
demo_project:/data

If I execute dx ls now, I'll see the contents of the data directory:

$ dx ls -l
Project: demo_project (project-GXZ90x00fF6F4fy1K20x4gv9)
Folder : /data
State   Last modified       Size      Name (ID)
closed  2023-07-07 16:11:56 334.68 KB hs38DH.dict (file-GFz5xf00Bqx2j79G4q4F5jXV)

Return to the root directory of the project by running dx cd or dx cd /.

Another way to inspect the structure of a project is using dx tree:

$ dx tree -h
usage: dx tree [-h] [--color {off,on,auto}] [--env-help] [-a] [-l] [path]

List folders and objects in a tree

positional arguments:
  path                  Folder (possibly in another project) to list the
                        contents of, default is the current directory in the
                        current project. Syntax: projectID:/folder/path

options:
  -h, --help            show this help message and exit
  --color {off,on,auto}
                        Set when color is used (color=auto is used when stdout
                        is a TTY)
  --env-help            Display help message for overriding environment
                        variables
  -a, --all             show hidden files
  -l, --long            use a long listing format

With no options, you will see a tree structure of the project:

$ dx tree
.
└─ data
    └─ hs38DH.dict

This command will also show the long listing with -l|--long:

$ dx tree -l
.
└─ data
    └─ closed  2023-07-07 16:11:56 334.68 KB hs38DH.dict
               (file-GFz5xf00Bqx2j79G4q4F5jXV)

Uploading Data

I want to create a local file on my computer and add it to the project. I'll use the echo command to redirect some text into a file:

$ echo hello > hello.txt

I'll use the dx upload command. The usage shows that filename is required and may be repeated.

$ dx upload -h
usage: dx upload [-h] [--visibility {hidden,visible}] [--property KEY=VALUE]
                 [--type TYPE] [--tag TAG] [--details DETAILS] [-p]
                 [--brief | --verbose] [--env-help] [--path [PATH]] [-r]
                 [--wait] [--no-progress] [--buffer-size WRITE_BUFFER_SIZE]
                 [--singlethread]
                 filename [filename ...]

Upload local file(s) or directory. If "-" is provided, stdin will be used
instead. By default, the filename will be used as its new name. If
--path/--destination is provided with a path ending in a slash, the filename
will be used, and the folder path will be used as a destination. If it does not
end in a slash, then it will be used as the final name.

positional arguments:
  filename              Local file or directory to upload ("-" indicates stdin
                        input); provide multiple times to upload multiple files
                        or directories

There are many options to the command, and here are a few to highlight:

  • --brief: Display a brief version of the return value; for most commands, prints a DNAnexus ID per line

  • -r, --recursive: Upload directories recursively

  • --path [PATH], --destination [PATH]: DNAnexus path to upload file(s) to (default uses current project and folder if not provided)

Run dx upload hello.txt and see that the new file exists in the root directory of your current project:

$ dx ls
data/
hello.txt

You can also upload data using the UI. Under the "Add" menu, you will find the following:

  • Upload Data: Use your browser to add files to the project. This is the same as using dx upload.

  • Copy Data From Project: Add data from existing projects on the platform. This is the same as dx cp.

I would like to check the new file on the platform. Like the Unix cat (concatenate) command, dx cat prints the entire contents of a file to the console:

$ dx cat -h
usage: dx cat [-h] [--env-help] [--unicode] path [path ...]

positional arguments:
  path        File ID or name(s) to print to stdout

options:
  -h, --help  show this help message and exit
  --env-help  Display help message for overriding environment variables
  --unicode   Display the characters as text/unicode when writing to stdout

I can use this to verify that the file was correctly uploaded:

$ dx cat hello.txt
hello

You might expect the following command to upload hello.txt into the data directory:

$ dx upload hello.txt --path data

Unfortunately, this will create a file called data alongside a directory called data:

$ dx ls
data/
data
hello.txt

I can verify that the data file contains "hello":

$ dx cat data
hello

Note this important part of upload's usage:

If --path/--destination is provided with a path ending in a slash, the
filename will be used, and the folder path will be used as a destination.
If it does not end in a slash, then it will be used as the final name.
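
The rule is easy to misread, so here is a minimal sketch of it in Python. The helper upload_target is hypothetical and only restates the quoted rule; it is not how dx is implemented. An empty folder in the result stands for the current folder.

```python
import posixpath

def upload_target(local_filename, path):
    """Return (folder, name) for a --path value, following the quoted rule."""
    if path.endswith("/"):
        # Trailing slash: the path is a folder; keep the local file's name.
        return path, posixpath.basename(local_filename)
    # No trailing slash: the last path component becomes the file's name.
    folder, name = posixpath.split(path)
    return folder, name

print(upload_target("hello.txt", "data"))   # ('', 'data')
print(upload_target("hello.txt", "data/"))  # ('data/', 'hello.txt')
```

By this rule, dx upload hello.txt --path data/ (with the trailing slash) is the form that places hello.txt inside the data directory.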

This raises an important point: file names are not unique on the DNAnexus platform. The only unique identifier is the file ID, so this is always the best way to refer to a file. To rectify the duplication, I will get the file ID:

$ dx ls -l
Project: demo_project (project-GXZ90x00fF6F4fy1K20x4gv9)
Folder : /
data/
State   Last modified       Size      Name (ID)
closed  2023-07-07 16:34:31 6 bytes   data (file-GXZB2180fF65j2G1197pP7By)
closed  2023-07-07 16:34:10 6 bytes   hello.txt (file-GXZB1v80fF6BXJ8p7PvZPy1v)

I can remove the file using dx rm file-GXZB2180fF65j2G1197pP7By.

If I run dx upload hello.txt again, I will not overwrite the existing file. Rather, another copy of the file will be created with a new file ID:

$ dx ls -l
Project: demo_project (project-GXZ90x00fF6F4fy1K20x4gv9)
Folder : /
data/
State   Last modified       Size      Name (ID)
closed  2023-07-07 17:01:20 6 bytes   hello.txt (file-GXZBKYQ0fF6Pf2ZKPBF7G7j9)
closed  2023-07-07 16:34:10 6 bytes   hello.txt (file-GXZB1v80fF6BXJ8p7PvZPy1v)

I cannot remove the file by filename as it's not unique, so I'm prompted to select which file I want:

$ dx rm hello.txt
The given path "hello.txt" resolves to the following data objects:
0) closed  2023-07-07 17:01:20 6 bytes   hello.txt (file-GXZBKYQ0fF6Pf2ZKPBF7G7j9)
1) closed  2023-07-07 16:34:10 6 bytes   hello.txt (file-GXZB1v80fF6BXJ8p7PvZPy1v)

Pick a numbered choice or "*" for all: 0
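
Duplicate names can also be spotted programmatically by grouping file IDs under each name. The sketch below parses two rows copied from the long listing shown earlier. The regular expression assumes each row ends with the name followed by the ID in parentheses, and it does not handle names that contain spaces.

```python
import re
from collections import defaultdict

# Two rows copied from the `dx ls -l` listing above.
listing = """\
closed  2023-07-07 17:01:20 6 bytes   hello.txt (file-GXZBKYQ0fF6Pf2ZKPBF7G7j9)
closed  2023-07-07 16:34:10 6 bytes   hello.txt (file-GXZB1v80fF6BXJ8p7PvZPy1v)
"""

# Assumed row shape: "... <name> (<file ID>)" at the end of the line.
row = re.compile(r"(\S+) \((file-\w+)\)\s*$")

ids_by_name = defaultdict(list)
for line in listing.splitlines():
    match = row.search(line)
    if match:
        name, file_id = match.groups()
        ids_by_name[name].append(file_id)

# Keep only the names that map to more than one file ID.
duplicates = {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}
print(duplicates)
```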

I used dx cat hello.txt to read the contents of the entire file because I knew the file had only one line. It's far safer to use dx head to look at just the first few lines (the default is 10):

$ dx head -h
usage: dx head [-h] [--color {off,on,auto}] [--env-help] [-n N] path

Print the first part of a file. By default, prints the first 10 lines.

positional arguments:
  path                  File ID or name to access

options:
  -h, --help            show this help message and exit
  --color {off,on,auto}
                        Set when color is used (color=auto is used when stdout
                        is a TTY)
  --env-help            Display help message for overriding environment
                        variables
  -n N, --lines N       Print the first N lines (default 10)

For instance, I can peek at the data/hs38DH.dict file:

$ dx head data/hs38DH.dict
@HD VN:1.6
@SQ SN:chr1 LN:248956422    M5:6aef897c3d6ff0c78aff06ac189178dd UR:file:/home/hs38DH.fa.gz
@SQ SN:chr2 LN:242193529    M5:f98db672eb0993dcfdabafe2a882905c UR:file:/home/hs38DH.fa.gz
@SQ SN:chr3 LN:198295559    M5:76635a41ea913a405ded820447d067b0 UR:file:/home/hs38DH.fa.gz
@SQ SN:chr4 LN:190214555    M5:3210fecf1eb92d5489da4346b3fddc6e UR:file:/home/hs38DH.fa.gz
@SQ SN:chr5 LN:181538259    M5:a811b3dc9fe66af729dc0dddf7fa4f13 UR:file:/home/hs38DH.fa.gz
@SQ SN:chr6 LN:170805979    M5:5691468a67c7e7a7b5f2a3a683792c29 UR:file:/home/hs38DH.fa.gz
@SQ SN:chr7 LN:159345973    M5:cc044cc2256a1141212660fb07b6171e UR:file:/home/hs38DH.fa.gz
@SQ SN:chr8 LN:145138636    M5:c67955b5f7815a9a1edfaa15893d3616 UR:file:/home/hs38DH.fa.gz
@SQ SN:chr9 LN:138394717    M5:6c198acf68b5af7b9d676dfdd531b5de UR:file:/home/hs38DH.fa.gz

Another option to check the file is to download it:

$ dx download file-GFz5xf00Bqx2j79G4q4F5jXV
[===========================================================>]
Downloaded 342,714
[===========================================================>]
Completed 342,714 of 342,714 bytes (100%) /Users/kyclark@dnanexus.com/work/academy/hs38DH.dict
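
A quick sanity check after a download is to compare the byte count with the size shown in the long listing. The arithmetic below assumes that KB in the listing means 1024 bytes, which is inferred from these two numbers rather than stated anywhere above.

```python
downloaded_bytes = 342_714          # from the `dx download` progress output
size_kb = downloaded_bytes / 1024   # assumption: 1 KB = 1024 bytes
print(f"{size_kb:.2f} KB")          # 334.68 KB, matching the `dx ls -l` listing
```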

Inspecting Object Metadata

Every data object on the platform has a unique identifier prefixed with the type of object such as "file-," "record-," or "applet-." Earlier, I saw that hello.txt has the ID file-GXZB1v80fF6BXJ8p7PvZPy1v. I can use the dx describe command to view the metadata:

$ dx describe -h
usage: dx describe [-h] [--json] [--color {off,on,auto}]
                   [--delimiter [DELIMITER]] [--env-help] [--details]
                   [--verbose] [--name] [--multi]
                   path

Describe a DNAnexus entity.  Use this command to describe data objects by name
or ID, jobs, apps, users, organizations, etc.  If using the "--json" flag, it
will thrown an error if more than one match is found (but if you would like a
JSON array of the describe hashes of all matches, then provide the "--multi"
flag).  Otherwise, it will always display all results it finds.

NOTES:

- The project found in the path is used as a HINT when you are using an object ID;
you may still get a result if you have access to a copy of the object in some
other project, but if it exists in the specified project, its description will
be returned.

- When describing apps or applets, options marked as advanced inputs will be
hidden unless --verbose is provided

positional arguments:
  path                  Object ID or path to an object (possibly in another
                        project) to describe.

options:
  -h, --help            show this help message and exit
  --json                Display return value in JSON
  --color {off,on,auto}
                        Set when color is used (color=auto is used when stdout
                        is a TTY)
  --delimiter [DELIMITER], --delim [DELIMITER]
                        Always use exactly one of DELIMITER to separate fields
                        to be printed; if no delimiter is provided with this
                        flag, TAB will be used
  --env-help            Display help message for overriding environment
                        variables
  --details             Include details of data objects
  --verbose             Include additional metadata
  --name                Only print the matching names, one per line
  --multi               If the flag --json is also provided, then returns a JSON
                        array of describe hashes of all matching results

I could use the filename, if it's unique, but it's always best practice to use the file ID:

$ dx describe file-GXZB1v80fF6BXJ8p7PvZPy1v
Result 1:
ID                          file-GXZB1v80fF6BXJ8p7PvZPy1v
Class                       file
Project                     project-GXZ90x00fF6F4fy1K20x4gv9
Folder                      /
Name                        hello.txt
State                       closed
Visibility                  visible
Types                       -
Properties                  -
Tags                        -
Outgoing links              -
Created                     Fri Jul  7 16:34:09 2023
Created by                  kyclark
Last modified               Fri Jul  7 16:34:10 2023
Media type                  text/plain
archivalState               "live"
Size                        6 bytes
cloudAccount                "cloudaccount-dnanexus"

As shown in the usage, the --delim option causes the output table to use whatever delimiter you want between the columns. This could be useful if you wish to parse the output programmatically. If you pass the option without a value, the tab character is used, but I can use a comma like so:

$ dx describe file-GXZB1v80fF6BXJ8p7PvZPy1v --delim ,
Result 1:
ID,file-GXZB1v80fF6BXJ8p7PvZPy1v
Class,file
Project,project-GXZ90x00fF6F4fy1K20x4gv9
Folder,/
Name,hello.txt
State,closed
Visibility,visible
Types,-
Properties,-
Tags,-
Outgoing links,-
Created,Fri Jul  7 16:34:09 2023
Created by,kyclark
Last modified,Fri Jul  7 16:34:10 2023
Media type,text/plain
archivalState,"live"
Size,6 bytes
cloudAccount,"cloudaccount-dnanexus"
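To illustrate the point about parsing the output programmatically, here is a minimal Python sketch that turns the comma-delimited text into a dictionary. The sample is an abridged copy of the output above, and the function name parse_describe is my own invention, not part of the dx-toolkit:

```python
# Abridged copy of the `dx describe ... --delim ,` output shown above.
SAMPLE = """\
Result 1:
ID,file-GXZB1v80fF6BXJ8p7PvZPy1v
Class,file
Name,hello.txt
State,closed
Size,6 bytes
archivalState,"live"
"""


def parse_describe(text, delimiter=","):
    """Turn delimited describe output into a dict of field -> value.

    Hypothetical helper for illustration; not part of the dx-toolkit.
    """
    fields = {}
    for line in text.splitlines():
        # Skip blank lines and the "Result N:" header, which has no delimiter.
        if not line.strip() or delimiter not in line:
            continue
        # Split on the first delimiter only, in case a value contains one.
        key, _, value = line.partition(delimiter)
        fields[key.strip()] = value.strip().strip('"')
    return fields


record = parse_describe(SAMPLE)
print(record["ID"], record["Name"], record["Size"])
```

Splitting on only the first delimiter keeps values intact even if they happen to contain a comma, which is one reason to consider a less common delimiter or the --json flag for anything more involved.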

The --json flag returns the same data in JavaScript Object Notation (JSON), which we'll discuss in a later chapter:

$ dx describe file-GXZB1v80fF6BXJ8p7PvZPy1v --json
{
    "id": "file-GXZB1v80fF6BXJ8p7PvZPy1v",
    "project": "project-GXZ90x00fF6F4fy1K20x4gv9",
    "class": "file",
    "sponsored": false,
    "name": "hello.txt",
    "types": [],
    "state": "closed",
    "hidden": false,
    "links": [],
    "folder": "/",
    "tags": [],
    "created": 1688772849000,
    "modified": 1688772850572,
    "createdBy": {
        "user": "user-kyclark"
    },
    "properties": {},
    "details": {},
    "media": "text/plain",
    "archivalState": "live",
    "size": 6,
    "cloudAccount": "cloudaccount-dnanexus"
}
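The "created" and "modified" values in the JSON look like milliseconds since the Unix epoch. That is my assumption, based on comparing them with the dates in the table output. A short Python sketch, using an abridged copy of the JSON above, shows how I might load the record and convert the timestamp:

```python
import json
from datetime import datetime, timedelta, timezone

# Abridged copy of the `dx describe ... --json` output shown above.
SAMPLE_JSON = """
{
    "id": "file-GXZB1v80fF6BXJ8p7PvZPy1v",
    "name": "hello.txt",
    "created": 1688772849000,
    "size": 6
}
"""

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(ms):
    """Convert milliseconds since the Unix epoch to a UTC datetime.

    Assumes the platform reports timestamps in epoch milliseconds.
    """
    return EPOCH + timedelta(milliseconds=ms)


meta = json.loads(SAMPLE_JSON)
created = ms_to_utc(meta["created"])
print(meta["name"], meta["size"], created.isoformat())
```

The result is 23:34:09 UTC on 7 July 2023, which agrees with the 16:34:09 shown in the table if that time was printed in a time zone seven hours behind UTC.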

I can use dx describe to view the metadata associated with any object identifier on the platform. For instance, I'll use head to view the first few lines of the project's metadata:

$ dx describe project-GXZ90x00fF6F4fy1K20x4gv9 | head
Result 1:
ID                          project-GXZ90x00fF6F4fy1K20x4gv9
Class                       project
Name                        demo_project
Summary
Billed to                   org-sos
Access level                ADMINISTER
Region                      aws:us-east-1
Protected                   false
Restricted                  false

Find another entity ID, such as your billing org, to use with the command.

Copying and Moving Files

I can use dx mv to move a file or directory within a project:

$ dx mv -h
usage: dx mv [-h] [--env-help] [-a] source [source ...] destination

Move or rename data objects and/or folders inside a single project.  To copy
data between different projects, use 'dx cp' instead.

positional arguments:
  source       Objects and/or folder names to move
  destination  Folder into which to move the sources or new pathname (if only
               one source is provided).  Must be in the same project/container
               as all source paths.

options:
  -h, --help   show this help message and exit
  --env-help   Display help message for overriding environment
               variables
  -a, --all    Apply to all results with the same name without
               prompting

For instance, I can rename hello.txt to goodbye.txt with the command dx mv hello.txt goodbye.txt. The file ID remains the same:

$ dx ls -l
Project: demo_project (project-GXZ90x00fF6F4fy1K20x4gv9)
Folder : /
data/
State   Last modified       Size      Name (ID)
closed  2023-07-10 10:11:31 6 bytes   goodbye.txt (file-GXZB1v80fF6BXJ8p7PvZPy1v)

I can also move goodbye.txt to the data directory and rename it back to hello.txt. Again, the file ID remains the same because I have only changed some of the file's metadata:

$ dx mv file-GXZB1v80fF6BXJ8p7PvZPy1v data/hello.txt
$ dx tree -l
.
└── data
    ├── closed  2023-07-10 10:13:31 6 bytes   hello.txt (file-GXZB1v80fF6BXJ8p7PvZPy1v)
    └── closed  2023-07-07 16:11:56 334.68 KB hs38DH.dict (file-GFz5xf00Bqx2j79G4q4F5jXV)

As noted in the preceding usage, I should use dx cp to copy data from one project to another. If I attempt to copy a file within a project, I will get an error:

$ dx cp hello.txt data/hello_copy.txt
dxpy.exceptions.DXCLIError: A source path and the destination path resolved
to the same project or container. Please specify different source and
destination containers, e.g.
dx cp source-project:source-id-or-path dest-project:dest-path

The only way to make an actual copy of a file is to upload it again as I did earlier when I added the hello.txt file a second time.

Data objects on the platform exist as bits in AWS or Azure storage, and the associated metadata is stored in a DNAnexus database. If two projects are in the same region such as AWS US-East-1, then dx cp doesn't actually copy the bits but rather creates a new database entry pointing to the object. This means you don't pay for additional storage. Copying between regions, however, does make a physical copy of the bits and will cost money for data egress and storage. When in doubt, use dx describe <project-id> to see a project's "Region" attribute or check the "Settings" in the project view UI.
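The region rule above can be summarized in a few lines of Python. This is only a sketch of the reasoning, not how the platform implements copying. The function name copy_kind is hypothetical, and the region strings follow the format shown in the "Region" attribute of dx describe:

```python
def copy_kind(source_region, dest_region):
    """Classify a cross-project copy based on the two project regions.

    Hypothetical helper: same region means only a new database entry is
    created; different regions mean the bits are physically copied.
    """
    if source_region == dest_region:
        return "metadata-only (no additional storage cost)"
    return "physical copy (egress and storage costs apply)"


print(copy_kind("aws:us-east-1", "aws:us-east-1"))
print(copy_kind("aws:us-east-1", "azure:westus"))
```

The second region name is just an example value; substitute whatever dx describe reports for your own projects.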

Finding Data

The dx find command will help you search for entities including:

  • apps

  • globalworkflows

  • jobs

  • data

  • projects

  • orgs

  • org members

  • org projects

  • org apps

I can use the dx find data command to search data objects such as files and applets. I'll display the first part of the usage as it's rather long:

usage: dx find data [-h] [--brief | --verbose] [--json]
                    [--color {off,on,auto}] [--delimiter [DELIMITER]]
                    [--env-help] [--property KEY[=VALUE]] [--tag TAG]
                    [--class {record,file,applet,workflow,database}]
                    [--state {open,closing,closed,any}]
                    [--visibility {hidden,visible,either}] [--name NAME]
                    [--type TYPE] [--link LINK] [--all-projects]
                    [--path PROJECT:FOLDER] [--norecurse]
                    [--created-after CREATED_AFTER]
                    [--created-before CREATED_BEFORE] [--mod-after MOD_AFTER]
                    [--mod-before MOD_BEFORE] [--region REGION]

Finds data objects subject to the given search parameters. By default,
restricts the search to the current project if set. To search over all
projects (excluding public projects), use --all-projects (overrides --path and
--norecurse).

Run the command in the current project to see the two files:

$ dx find data
closed  2023-07-10 10:13:31 6 bytes   /data/hello.txt (file-GXZB1v80fF6BXJ8p7PvZPy1v)
closed  2023-07-07 16:11:56 334.68 KB /data/hs38DH.dict (file-GFz5xf00Bqx2j79G4q4F5jXV)

I can use the --name option to look for a file by name:

$ dx find data --name hs38DH.dict
closed  2023-07-07 16:11:56 334.68 KB /data/hs38DH.dict (file-GFz5xf00Bqx2j79G4q4F5jXV)

I can also specify a Unix file glob pattern, such as all files that begin with h:

$ dx find data --name "h*"
closed  2023-07-10 10:13:31 6 bytes   /data/hello.txt (file-GXZB1v80fF6BXJ8p7PvZPy1v)
closed  2023-07-07 16:11:56 334.68 KB /data/hs38DH.dict (file-GFz5xf00Bqx2j79G4q4F5jXV)

Or all files that end with .dict. Note in this example that the asterisk is escaped with a backslash to prevent my shell from expanding it locally, as I want the literal star to be passed as the argument:

$ dx find data --name \*.dict
closed  2023-07-07 16:11:56 334.68 KB /data/hs38DH.dict (file-GFz5xf00Bqx2j79G4q4F5jXV)
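To get a feel for how these glob patterns select names, I can try them locally with Python's fnmatch module. This is only an approximation for illustration: the platform's matching may differ in details, and the list of names is simply copied from the listing above:

```python
from fnmatch import fnmatchcase

# File names taken from the `dx find data` listing above.
names = ["hello.txt", "hs38DH.dict"]


def match_names(pattern, candidates):
    """Return the candidates that match a Unix-style glob pattern."""
    return [name for name in candidates if fnmatchcase(name, pattern)]


print(match_names("h*", names))      # both names begin with "h"
print(match_names("*.dict", names))  # only the .dict file matches
```

Inside Python there is no shell to expand the asterisk, so no quoting or backslash is needed here. On the command line, either "h*" in quotes or \*.dict with a backslash keeps the pattern intact.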

The --brief flag will return only the file ID, prefixed by the project ID:

$ dx find data --name \*.dict --brief
project-GXZ90x00fF6F4fy1K20x4gv9:file-GFz5xf00Bqx2j79G4q4F5jXV

This is useful, for instance, for downloading a file:

$ dx download $(dx find data --name \*.dict --brief)
[=======================>] Completed 342,714 of 342,714 bytes (100%)
                           /Users/kyclark@dnanexus.com/work/academy/hs38DH.dict
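If I capture the brief output in a script, I may want the project ID and the file ID separately. A small Python sketch follows; the helper name split_brief is my own, and the sample line is copied from the output above:

```python
def split_brief(line):
    """Split a 'project-xxxx:file-yyyy' line into (project_id, object_id).

    Hypothetical helper. If there is no project prefix, the project is None.
    """
    project_id, sep, object_id = line.strip().partition(":")
    if not sep:
        return None, project_id
    return project_id, object_id


brief = "project-GXZ90x00fF6F4fy1K20x4gv9:file-GFz5xf00Bqx2j79G4q4F5jXV\n"
project_id, file_id = split_brief(brief)
print(project_id)
print(file_id)
```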

The --json flag will return the results in JSON format. In the JSON chapter, you will learn how to parse these results for more advanced querying and data manipulation:

$ dx find data --name \*.dict --json
[
    {
        "project": "project-GXZ90x00fF6F4fy1K20x4gv9",
        "id": "file-GFz5xf00Bqx2j79G4q4F5jXV",
        "describe": {
            "id": "file-GFz5xf00Bqx2j79G4q4F5jXV",
            "project": "project-GXZ90x00fF6F4fy1K20x4gv9",
            "class": "file",
            "name": "hs38DH.dict",
            "state": "closed",
            "folder": "/data",
            "modified": 1688771516882,
            "size": 342714
        }
    }
]
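As a small preview of working with these results, the following Python sketch loads the JSON above and prints each object's path, ID, and size. The helper human_size is hypothetical. It assumes sizes are reported in bytes and that 1 KB is 1024 bytes, which reproduces the 334.68 KB shown in the listings:

```python
import json

# Copy of the `dx find data --name \*.dict --json` output shown above.
SAMPLE = """
[
    {
        "project": "project-GXZ90x00fF6F4fy1K20x4gv9",
        "id": "file-GFz5xf00Bqx2j79G4q4F5jXV",
        "describe": {
            "id": "file-GFz5xf00Bqx2j79G4q4F5jXV",
            "project": "project-GXZ90x00fF6F4fy1K20x4gv9",
            "class": "file",
            "name": "hs38DH.dict",
            "state": "closed",
            "folder": "/data",
            "modified": 1688771516882,
            "size": 342714
        }
    }
]
"""


def human_size(num_bytes):
    """Format a byte count using 1024-based units (an assumption)."""
    units = ["bytes", "KB", "MB", "GB", "TB"]
    size = float(num_bytes)
    for unit in units:
        if size < 1024 or unit == units[-1]:
            if unit == "bytes":
                return f"{int(size)} bytes"
            return f"{size:.2f} {unit}"
        size /= 1024


def summarize(results):
    """Build one 'path (id) size' line per search result."""
    lines = []
    for entry in results:
        desc = entry["describe"]
        path = desc["folder"].rstrip("/") + "/" + desc["name"]
        lines.append(f"{path} ({entry['id']}) {human_size(desc['size'])}")
    return lines


for line in summarize(json.loads(SAMPLE)):
    print(line)
```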

The --class option accepts the following values:

  • applet

  • database

  • file

  • record

  • workflow

The --state option accepts the following values:

  • open: A file that is currently being uploaded

  • closing: A file that is done uploading but is still being validated

  • closed: A file that is uploaded and validated

  • any: any of the above

There are many more options for finding data and other entities on the platform that will be covered in later chapters.

Running Jobs

It's time to run an app, but which one? I'd like to have a FASTQ file to work with, so I'll start by using the SRA FASTQ Importer. I can never quite remember the name of the app, so I'll search for it using a wildcard:

$ dx find apps --name "sra*"
x SRA FASTQ Importer (sra_fastq_importer), v4.0.0

The "x" in the first column indicates this is an app supported by DNAnexus.

I can find information about the inputs and outputs to the app using either of these commands:

  • dx describe sra_fastq_importer

  • dx run sra_fastq_importer -h

I prefer the output from the second command:

$ dx run sra_fastq_importer -h
usage: dx run sra_fastq_importer [-iINPUT_NAME=VALUE ...]

App: SRA FASTQ Importer

Version: 4.0.0 (published)

Download SE or PE reads in FASTQ or FASTA format from SRA using SRR accessions

See the app page for more information:
  https://platform.dnanexus.com/app/sra_fastq_importer

Inputs:
  dbGaP Repository key: [-ingc_key=(file)]
        (Optional) Security token required for configuring NCBI SRA toolkit and decryption tools.

  SRR Accession: -iaccession=(string)
        Single SRR accession to fetch.
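
Only the accession is required, so I'll launch the app with a single SRR accession and answer "yes" to both prompts:

$ dx run sra_fastq_importer -iaccession=SRR070372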

Using input JSON:
{
    "accession": "SRR070372"
}

Confirm running the executable with this input [Y/n]: y
Calling app-G49BFZ093qKvjFYgF8fyv6Z7 with output destination project-GXY0PK0071xJpG156BFyXpJF:/

Job ID: job-GXf8Qg8071xBJJg417YVYJX3
Watch launched job now? [Y/n] y

The equal sign in -iaccession=SRR070372 is required.
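When launching jobs from a script, I find it helpful to assemble the arguments from a dictionary so that the -iNAME=VALUE form is always written correctly. The sketch below only builds and prints the command; it does not run it. The function build_run_command is hypothetical, while the -y and --brief flags are the dx run flags used elsewhere in this chapter:

```python
import shlex


def build_run_command(app, inputs, confirm=True, brief=False):
    """Build the argument list for a `dx run` call (nothing is executed).

    Hypothetical helper; each input becomes one -iNAME=VALUE argument.
    """
    args = ["dx", "run", app]
    for name, value in inputs.items():
        args.append(f"-i{name}={value}")
    if confirm:
        args.append("-y")       # skip the confirmation prompt
    if brief:
        args.append("--brief")  # print only the job ID
    return args


cmd = build_run_command("sra_fastq_importer", {"accession": "SRR070372"})
print(shlex.join(cmd))
```

The resulting list could be handed to subprocess.run on a machine where the dx-toolkit is installed and logged in.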

The output of watching is the same as you would see from the UI if you click the "MONITOR" tab in the project view and then "View Log" while the app is running. The end of the watch shows the app ran successfully and that a new file was created in my project:

* SRA FASTQ Importer (sra_fastq_importer:main) (done)
  job-GXf8Qg8071xBJJg417YVYJX3
  kyclark 2023-07-10 15:38:21 (runtime 0:02:36)
  Output: single_reads_fastq = [ file-GXf8VgQ09bzK5q1XV5z1gx7j ]

I can find the size of the file with dx ls:

$ dx ls -l file-GXf8VgQ09bzK5q1XV5z1gx7j
closed  2023-07-10 15:41:38 206.59 MB SRR070372.fastq.gz (file-GXf8VgQ09bzK5q1XV5z1gx7j)

Now I'd like to run this file through FastQC. I'll search for the app by name just to be sure, and, yes, it's called "fastqc":

$ dx find apps --name fastqc
x FastQC Reads Quality Control (fastqc), v3.0.3

Again, I use either dx describe or dx run -h to see that the app requires a file of reads as input:

usage: dx run fastqc [-iINPUT_NAME=VALUE ...]

App: FastQC Reads Quality Control

Version: 3.0.3 (published)

Generates a QC report on reads data

See the app page for more information:
  https://platform.dnanexus.com/app/fastqc

Inputs:
  Reads: -ireads=(file)
        A file containing the reads to be checked. Accepted formats are
        gzipped-FASTQ and BAM.

I will use the new file's ID as the input to FastQC, and I'll run it using the additional flags -y to confirm launching and --watch to immediately start watching the job:

$ dx run fastqc -ireads=file-GXf8P880FjgZGJQqx8Bf30YK -y --watch

Using input JSON:
{
    "reads": {
        "$dnanexus_link": "file-GXf8P880FjgZGJQqx8Bf30YK"
    }
}

Calling app-G81jg5j9jP7qxb310vg2xQkX with output destination project-GXY0PK0071xJpG156BFyXpJF:/

Job ID: job-GXf8fJQ071x00P5bQzQ62gjY

Notice that the confirmation shows "Using input JSON". If you like, you can save that to a file called, for example, input.json:

$ cat input.json
{
    "reads": {
        "$dnanexus_link": "file-GXf8P880FjgZGJQqx8Bf30YK"
    }
}

I can then launch the job using the -f|--input-json-file argument along with the --brief flag to show only the resulting job ID:

$ dx run fastqc -f input.json -y --brief
job-GXf930j071xJfYqfJ2kkvk8v
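The input file shown earlier can also be generated rather than typed by hand. Here is a minimal Python sketch that reproduces the same structure; the helper make_file_link is my own naming, and the file ID is the one used in the example above:

```python
import json


def make_file_link(file_id):
    """Wrap a file ID in the link structure shown in the input JSON."""
    return {"$dnanexus_link": file_id}


inputs = {"reads": make_file_link("file-GXf8P880FjgZGJQqx8Bf30YK")}
text = json.dumps(inputs, indent=4)
print(text)
```

Writing the text to input.json, for example with pathlib.Path("input.json").write_text(text), would produce a file suitable for the -f argument.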

Since the output will be the same, I can kill the job using dx terminate job-GXf930j071xJfYqfJ2kkvk8v.

The end of the watch shows that the job finishes successfully:

* FastQC Reads Quality Control (fastqc:main) (done) job-GXf8fgj071x3KV4qyyKGZQVY
  kyclark 2023-07-10 15:51:11 (runtime 0:02:01)
  Output: report_html = file-GXf8gbQ06GxZ38zFXB46XYYj
          stats_txt = file-GXf8gbj06Gxy9F8P66pJG7J3

I would like to get a feel for the output, so I'll use dx head on the stats_txt output file ID:

$ dx head file-GXf8gbj06Gxy9F8P66pJG7J3
##FastQC    0.11.9
>>Basic Statistics    pass
#Measure    Value
Filename    SRR070372.fastq.gz
File type   Conventional base calls
Encoding    Sanger / Illumina 1.9
Total Sequences 498843
Sequences flagged as poor quality   0
Sequence length 48-2044
%GC 39
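The statistics are easy to read by eye, but they can also be loaded into a dictionary for further checks. The Python sketch below assumes the fields are separated by tab characters, as in FastQC's data file; the sample is retyped from the output above with explicit tabs, and parse_basic_stats is a hypothetical helper:

```python
# Lines retyped from the `dx head` output above, with explicit tab separators.
SAMPLE = (
    "##FastQC\t0.11.9\n"
    ">>Basic Statistics\tpass\n"
    "#Measure\tValue\n"
    "Filename\tSRR070372.fastq.gz\n"
    "File type\tConventional base calls\n"
    "Encoding\tSanger / Illumina 1.9\n"
    "Total Sequences\t498843\n"
    "Sequences flagged as poor quality\t0\n"
    "Sequence length\t48-2044\n"
    "%GC\t39\n"
)


def parse_basic_stats(text):
    """Collect 'measure<TAB>value' lines, skipping header and module lines."""
    stats = {}
    for line in text.splitlines():
        # Lines starting with "#" or ">>" are headers, not measurements.
        if not line or line.startswith(("#", ">>")):
            continue
        key, _, value = line.partition("\t")
        stats[key] = value
    return stats


stats = parse_basic_stats(SAMPLE)
print(stats["Filename"], int(stats["Total Sequences"]), stats["Sequence length"])
```

Note that "%GC" is kept because it does not begin with "#" or ">>", and all values are left as strings so that ranges such as 48-2044 survive unchanged.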

Review

You are now able to:

  • List the advantages of interacting with the platform via the command line interface

  • List the functions of the SDK and the API

  • Describe the purpose of the dx-toolkit

  • Apply frequently used dx-toolkit commands to execute common use cases, applicable to a broad audience of users

Resources

To create a support ticket if there are technical issues:

  1. Go to the Help header (same section where Projects and Tools are) inside the platform

  2. Select "Contact Support"

  3. Fill in the Subject and Message to submit a support ticket.

The API (application programming interface) servers let us interact with the platform using HTTP requests. The arguments for a request are fields in a JSON file. For more details on this structure, see the DNAnexus API documentation.

Further details can be found in our Documentation if you need them.

Information on setting up tokens can be found in the Using Tokens section of our Documentation.

I will use this command to create a new project in the AWS US-East-1 region. See the documentation for a list of all available regions. The command displays the new project ID and prompts to switch into the new project:

Add Data From Server: Add data from any publicly accessible URL such as an HTTP or FTP site. This is the same as running the URL Fetcher app.

Import From AWS S3: Add data from an S3 bucket. This is the same as running the AWS S3 Importer app.

In addition, we offer an SRA FASTQ Importer app.

The concept of immutability was covered in "Course 101 Overview of the DNAnexus Platform User Interface": Remember the crucially important fact that data objects on the DNAnexus platform are immutable. They can only be created (e.g., by uploading them) or removed, but they can never be overwritten. A given object ID always points to the same collection of bits, which leads to downstream benefits like reusing the outputs of jobs that share the same executable and input IDs (smart reuse).

Looking at the usage for the app, I see that only the -iaccession argument is required, as all the others are shown enclosed in square brackets, e.g., [-ingc_key=(file)]. I can run the app with the SRA accession SRR070372 (C. elegans), answering "yes" to both launching and watching the app:

Full Documentation