# Introduction to CLI

## Overview of Interacting with the Platform

You can interact with the platform in a variety of ways (shown below). This section is for those who want to learn how to interact with it using the command line interface, or CLI.

![](/files/NSfvCffS1Wb9QOYZjXSC)

## Terms

The CLI interacts with the platform in the following way:

![](/files/b1Sglv9kEe2IBiaQWtmf)

* The CLI (command line interface) is run locally on your own machine.
* On your local machine, you will download the SDK (software development kit), which we also call dx-toolkit. Information on downloading it and other requirements is found in the Getting Started Guide. Once set up, it allows you to log in to the platform, explore your data and projects, create apps and workflows, and launch analyses.
* API (application programming interface) servers let us interact with the platform using HTTP requests. The arguments for each request are fields in a JSON object. For more details on this structure, see [DNAnexus API](https://documentation.dnanexus.com/developer/api).
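
To make the last point concrete, here is a minimal Python sketch of what such a request might look like. It only builds the request and never sends it. The route `/system/whoami`, the bearer-token header, the placeholder token, and the helper name `build_api_request` are my assumptions for illustration; the API documentation linked above is the authoritative source.

```python
import json
import urllib.request

API_SERVER = "https://api.dnanexus.com"  # the host that `dx env` reports


def build_api_request(route, fields, token):
    """Build (but do not send) an HTTP POST whose body is a JSON object."""
    body = json.dumps(fields).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Assumption: the token is passed as a bearer token.
        "Authorization": f"Bearer {token}",
    }
    return urllib.request.Request(
        API_SERVER + route, data=body, headers=headers, method="POST"
    )


# Hypothetical route and placeholder token, for illustration only.
request = build_api_request("/system/whoami", {}, "XXXXXXXX")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

In day-to-day use, the `dx` client assembles requests like this for you, so you rarely need to write them by hand.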

## Installation

Please ensure that you are running Python 3 before starting this install.

To install:

```
pip3 install dxpy
```

To upgrade dxpy:

```
pip3 install --upgrade dxpy
```

Further details can be found in our [Documentation](https://documentation.dnanexus.com/downloads).
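
If you want to confirm the prerequisites from Python itself, a small check like the following may help. This is only a sketch using the standard library; the helper name `installed_version` is my own, and it reports `None` when a package is not installed.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("Running Python 3:", sys.version_info.major == 3)
print("dxpy version:", installed_version("dxpy"))
```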

## Introducing dx-toolkit

The `dx` command will be your most used utility for interacting with the DNAnexus platform. You can run the command with no arguments or with the `-h` or `--help` flags to see the usage:

```
usage: dx [-h] [--version] command ...

DNAnexus Command-Line Client, API v1.0.0, client v0.346.0

dx is a command-line client for interacting with the DNAnexus platform.  You
can log in, navigate, upload, organize and share your data, launch analyses,
and more.  For a quick tour of what the tool can do, see

  https://documentation.dnanexus.com/getting-started/tutorials/cli-quickstart#q>

For a breakdown of dx commands by category, run "dx help".

dx exits with exit code 3 if invalid input is provided or an invalid operation
is requested, and exit code 1 if an internal error is encountered.  The latter
usually indicate bugs in dx; please report them at

  https://github.com/dnanexus/dx-toolkit/issues

options:
  -h, --help  show this help message and exit
  --env-help  Display help message for overriding environment
              variables
  --version   show program's version number and exit
```

Sometimes the usage may occupy your entire terminal, in which case you may see `(END)` to show that you are at the end of the documentation. Press `q` to *quit* the usage, or use the universal `Ctrl-C` to send an interrupt signal to the process to kill it.

Run **`dx help`** to read about the categories of commands you can run:

```
$ dx help
usage: dx help [-h] [command_or_category] [subcommand]

Displays the help message for the given command (and subcommand if given), or
displays the list of all commands in the given category.

CATEGORIES

  all       All commands
  session   Manage your login session
  fs        Navigate and organize your projects and files
  data      View, download, and upload data
  metadata  View and modify metadata for projects, data, and executions
  workflow  View and modify workflows
  exec      Manage and run apps, applets, and workflows
  org       Administer and operate on orgs
  other     Miscellaneous advanced utilities
```

## Logging Into the Platform

Let's start by using **`dx login`** to gain access to the DNAnexus platform from the command line. All `dx` commands will respond to `-h|--help`, so run the command with one of these flags to read the usage:

```
$ dx login -h
usage: dx login [-h] [--env-help] [--token TOKEN] [--noprojects] [--save]
                [--timeout TIMEOUT]

Log in interactively and acquire credentials. Use "--token" to log in with an
existing API token.

options:
  -h, --help         show this help message and exit
  --env-help         Display help message for overriding environment variables
  --token TOKEN      Authentication token to use
  --noprojects       Do not print available projects
  --save             Save token and other environment variables for future
                     sessions
  --timeout TIMEOUT  Timeout for this login token (in seconds, or use suffix
                     s, m, h, d, w, M, y)
```

The help documentation is often called the *usage* because that is often the first word of the output. In the previous output, notice that all the arguments are enclosed in square brackets, e.g., `[--token TOKEN]`. This is a common convention in Unix documentation to indicate that the argument is optional. The lack of such square brackets means the argument is required.

Some of the arguments require a value to follow. For example, `--token TOKEN` means the argument `--token` must be followed by the string value for the token. Arguments like `--save` are known as *flags*. They are either present or not and often represent a Boolean value, usually "True" when present and "False" when absent.
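
The difference between an option that takes a value and a flag can be sketched with Python's standard `argparse` module. The parser below is hypothetical: it borrows three option names from the `dx login` usage purely for illustration and says nothing about how `dx` itself is implemented.

```python
import argparse

# A hypothetical parser that mirrors a few `dx login` options.
parser = argparse.ArgumentParser(prog="login_demo")
parser.add_argument("--token", help="option that must be followed by a value")
parser.add_argument("--save", action="store_true", help="flag: True when present")
parser.add_argument("--noprojects", action="store_true", help="flag: True when present")

# Simulate the command line: login_demo --token XXXXXXXX --save
args = parser.parse_args(["--token", "XXXXXXXX", "--save"])
print(args.token, args.save, args.noprojects)  # XXXXXXXX True False
```

Because `--noprojects` was absent from the simulated command line, its value is `False`, while `--save` is `True`.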

The most basic usage for login is to enter your username and password when prompted:

```
$ dx login
Acquiring credentials from https://auth.dnanexus.com
Username: XXXXXXXX
Password: XXXXXXXX
```

You may also generate a token in the web UI for use on the command line:

```
$ dx login --token xxxxxxxxxxx
```

Information on setting up tokens can be found in the [Using Tokens](https://documentation.dnanexus.com/user/login-and-logout#using-tokens) section of our Documentation.

Use **`dx logout`** to log out of the platform. *This invalidates a token.*

If you are ever in doubt about your username, use **`dx whoami`** to see your identity.

* When you ssh into a cloud workstation, you will be your normal DNAnexus user.
* When running the `ttyd` app to access a cloud workstation through the UI, you will be the privileged Unix user *root*.
* When you ssh into a running job, you will be the user *dnanexus*.

## Working with Projects and Users

A *project* is the smallest unit of sharing in DNAnexus, and you must always work in the context of a project. Upon login, you will be prompted to select a project. To change projects, use **`dx select`**. Use `-h|--help` to view the usage:

```
$ dx select -h
usage: dx select [-h] [--env-help] [--name NAME]
                 [--level {VIEW,UPLOAD,CONTRIBUTE,ADMINISTER}] [--public]
                 [project]

Interactively list and select a project to switch to. By default, only lists
projects for which you have at least CONTRIBUTE permissions. Use --public to
see the list of public projects.

positional arguments:
  project               Name or ID of a project to switch to; if not provided
                        a list will be provided for you

options:
  -h, --help            show this help message and exit
  --env-help            Display help message for overriding environment
                        variables
  --name NAME           Name of the project (wildcard patterns supported)
  --level {VIEW,UPLOAD,CONTRIBUTE,ADMINISTER}
                        Minimum level of permissions expected
  --public              Include ONLY public projects (will automatically set
                        --level to VIEW)
```

When run with no options, the command presents a list of your projects and your permission level for each:

```
$ dx select

Note: Use dx select --level VIEW or dx select --public to
select from projects for which you only have VIEW permissions.

Available projects (CONTRIBUTE or higher):
0) App Dev (ADMINISTER)
1) Methylation (ADMINISTER)
2) Genomes (ADMINISTER)
3) WTS (ADMINISTER)
4) WGS (ADMINISTER)
5) Exome (ADMINISTER)
6) QC (ADMINISTER)
7) Collaborators (ADMINISTER)
8) Pipeline Dev (ADMINISTER)
9) WDL Test (ADMINISTER)
m) More options not shown...

Pick a numbered choice or "m" for more options [0]:
```

Press Enter to choose the first project, or select a number 0-9 to choose a project or *m* for "more" options. You can also provide a project name or ID as the first argument:

```
$ dx select project-XXXXXXXXXXXXXXXXXXXXXXXX
$ dx select "Pipeline Dev"
```

Use the `--level` option to specify only projects where you have a particular permission. For instance, **`dx select --level ADMINISTER`** will show only projects where you are an administrator.
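
The idea of a *minimum* permission level can be sketched in a few lines of Python. I am assuming the levels are ordered from lowest to highest as they appear in the usage (`VIEW`, `UPLOAD`, `CONTRIBUTE`, `ADMINISTER`); the project names and permissions below are made up for illustration.

```python
LEVELS = ["VIEW", "UPLOAD", "CONTRIBUTE", "ADMINISTER"]  # assumed lowest to highest


def at_least(projects, minimum):
    """Keep the names of (name, level) pairs at or above the minimum level."""
    floor = LEVELS.index(minimum)
    return [name for name, level in projects if LEVELS.index(level) >= floor]


# Hypothetical projects and permissions, for illustration only.
projects = [
    ("Pipeline Dev", "ADMINISTER"),
    ("Shared Results", "CONTRIBUTE"),
    ("Reference Data", "VIEW"),
]
print(at_least(projects, "CONTRIBUTE"))  # ['Pipeline Dev', 'Shared Results']
print(at_least(projects, "ADMINISTER"))  # ['Pipeline Dev']
```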

Normally, projects are private to your organization, but the `--public` option will display the public projects that DNAnexus uses to share common resources like sequence files or indexes for reference genomes:

```
$ dx select --public

Available public projects:
0) Reference Genome Files: Azure US (West) (VIEW)
1) App_Assets_Europe(London)_Internal (VIEW)
2) Reference Genome Files: Azure Amsterdam (VIEW)
3) Reference Genome Files: AWS Germany (VIEW)
4) Reference Genome Files: AWS US (East) (VIEW)
5) Reference Genome Files: AWS Europe (London) (VIEW)
6) App and Applet Assets Azure (VIEW)
7) dxCompiler_Europe_London (VIEW)
8) dxCompiler_Sydney (VIEW)
9) dxCompiler_Berlin (VIEW)
m) More options not shown...

Pick a numbered choice or "m" for more options:
```

Press `Ctrl-C` to exit the program *without* making a selection.

If you are ever in doubt as to your current project, run **`dx pwd`** (*print working directory*):

```
$ dx pwd
Pipeline Dev:/
```

Alternatively, you can run **`dx env`** to see your current environment:

```
$ dx env
Auth token used         XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
API server protocol     https
API server host         api.dnanexus.com
API server port         443
Current workspace       project-XXXXXXXXXXXXXXXXXXXXXXXX
Current workspace name  "Pipeline Dev"
Current folder          /
Current user            test_user
```

If I wanted to share some data with a collaborator, I would use **`dx new project`** to create a new project to hold select data and apps. Following is the usage:

```
$ dx new project -h
usage: dx new project [-h] [--brief | --verbose] [--env-help]
                      [--region REGION] [-s] [--bill-to BILL_TO] [--phi]
                      [--database-ui-view-only]
                      [name]

Create a new project

positional arguments:
  name                  Name of the new project

options:
  -h, --help            show this help message and exit
  --brief               Display a brief version of the return value; for most
                        commands, prints a DNAnexus ID per line
  --verbose             If available, displays extra verbose output
  --env-help            Display help message for overriding environment
                        variables
  --region REGION       Region affinity of the new project
  -s, --select          Select the new project as current after creating
  --bill-to BILL_TO     ID of the user or org to which the project will be
                        billed. The default value is the billTo of the
                        requesting user.
  --phi                 Add PHI protection to project
  --database-ui-view-only
                        Viewers on the project cannot access database data
                        directly
```

I will use this command to create a new project in the AWS US-East-1 region. See the documentation for a list of [all available regions](https://documentation.dnanexus.com/developer/api/regions). The command displays the new project ID and prompts to switch into the new project:

```
$ dx new project --region aws:us-east-1 demo_project
Created new project called "demo_project" (project-GXZ90x00fF6F4fy1K20x4gv9)
Switch to new project now? [y/N]: y
```

Next, I would use **`dx invite <user-id>`** to invite users to the project. Start with the usage to see how to call the command:

```
$ dx invite -h
usage: dx invite [-h] [--env-help] [--no-email]
                 invitee [project] [{VIEW,UPLOAD,CONTRIBUTE,ADMINISTER}]

Invite a DNAnexus entity to a project. If the invitee is not recognized as a
DNAnexus ID, it will be treated as a username, i.e. "dx invite alice : VIEW"
is equivalent to inviting the user with user ID "user-alice" to view your
current default project.

positional arguments:
  invitee               Entity to invite
  project               Project to invite the invitee to
  {VIEW,UPLOAD,CONTRIBUTE,ADMINISTER}
                        Permissions level the new member should have

options:
  -h, --help            show this help message and exit
  --env-help            Display help message for overriding environment
                        variables
  --no-email            Disable email notifications to invitee
```

The usage shows that this command takes three *positional arguments*, the first of which (*invitee*) is required, while the other two (*project*, *permissions*) are optional. Your currently selected project is the default project, and "VIEW" is the default permission. If you wish to indicate a permission other than "VIEW", you must specify the project first.
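
To see why the project must come before the permission, consider this small `argparse` sketch. The parser is hypothetical and only shaped like the `dx invite` usage; I use `:` as a stand-in for the current project, following the example in the usage text.

```python
import argparse

LEVELS = ["VIEW", "UPLOAD", "CONTRIBUTE", "ADMINISTER"]

# A hypothetical parser shaped like the `dx invite` usage.
parser = argparse.ArgumentParser(prog="invite_demo")
parser.add_argument("invitee")                            # required
parser.add_argument("project", nargs="?", default=":")   # optional, second slot
parser.add_argument("level", nargs="?", choices=LEVELS, default="VIEW")

print(parser.parse_args(["alice"]))
print(parser.parse_args(["alice", ":", "CONTRIBUTE"]))
# Without the project, the second value fills the project slot instead.
print(parser.parse_args(["alice", "CONTRIBUTE"]))
```

Positional arguments are filled strictly in order, so in the last call the word `CONTRIBUTE` is taken as the project and the permission silently stays at its default.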

Use **`dx uninvite <user-id>`** to revoke a user's access to a project:

```
$ dx uninvite -h
usage: dx uninvite [-h] [--env-help] entity [project]

Revoke others' permissions on a project you administer. If the entity is not
recognized as a DNAnexus ID, it will be treated as a username, i.e. "dx
uninvite alice :" is equivalent to revoking the permissions of the user with
user ID "user-alice" to your current default project.

positional arguments:
  entity      Entity to uninvite
  project     Project to revoke permissions from

options:
  -h, --help  show this help message and exit
  --env-help  Display help message for overriding environment variables
```

## Data Exploration

Earlier, I used **`dx pwd`** (*print working directory*) to find my currently selected project. Here is its usage:

```
$ dx pwd -h
usage: dx pwd [-h] [--env-help]

Print current working directory

options:
  -h, --help  show this help message and exit
  --env-help  Display help message for overriding environment variables
```

Notice that the output shows the project name and the directory `/`, which is the root directory of the project:

```
$ dx pwd
demo_project:/
```

The command **`dx ls`** will *list* the contents of a directory. Notice in the usage that the directory name is optional. If you omit it, the command lists the current working directory:

```
$ dx ls -h
usage: dx ls [-h] [--color {off,on,auto}] [--delimiter [DELIMITER]]
             [--env-help] [--brief | --verbose] [-a] [-l] [--obj] [--folders]
             [--full]
             [path]

List folders and/or objects in a folder

positional arguments:
  path                  Folder (possibly in another project) to list the
                        contents of, default is the current directory in the
                        current project. Syntax: projectID:/folder/path
```

There is nothing to list because I just created this project, so I'll add some data next.

## Copying and Moving Files

I will use the command **`dx cp`** to *copy* a small file from one of the public projects into my project. I'll start with the usage:

```
$ dx cp -h
usage: dx cp [-h] [--env-help] [-a] source [source ...] destination

Copy objects and/or folders between different projects.  Folders will
automatically be copied recursively.  To specify which project to use as a
source or destination, prepend the path or ID of the object/folder with the
project ID or name and a colon.

EXAMPLES

  The first example copies a file in a project called "FirstProj" to the
  current directory of the current project.  The second example copies the
  object named "reads.fq.gz" in the current directory to the folder
  /folder/path in the project with ID "project-B0VK6F6gpqG6z7JGkbqQ000Q",
  and finally renaming it to "newname.fq.gz".

  $ dx cp FirstProj:file-B0XBQFygpqGK8ZPjbk0Q000q .
  $ dx cp reads.fq.gz project-B0VK6F6gpqG6z7JGkbqQ000Q:/folder/path/newname.fq.>

positional arguments:
  source       Objects and/or folder names to copy
  destination  Folder into which to copy the sources or new pathname (if only
               one source is provided).  Must be in a different
               project/container than all source paths.

options:
  -h, --help   show this help message and exit
  --env-help   Display help message for overriding environment
               variables
  -a, --all    Apply to all results with the same name without
               prompting
```

The usage shows `source [source ...]`, which is another Unix convention to indicate that the argument may be repeated. This means you can indicate several source files or directories to be copied to the final `destination`.

I'll copy the file *hs38DH.dict* from the project "Reference Genome Files: AWS US (East)" into the root directory of my new project. The command will only produce output on error:

```
$ dx cp project-BQpp3Y804Y0xbyG4GJPQ01xv:file-GFz5xf00Bqx2j79G4q4F5jXV /
```

I must specify the source file using the project and file ID. When you refer to files inside your current project, it's only necessary to use the file ID.
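
The `project:path` notation can be illustrated with a tiny Python helper. This is a simplified sketch of the convention only: the function name `split_dx_path` is hypothetical, and the real client does more (for example, resolving names to IDs).

```python
def split_dx_path(path):
    """Split 'project:rest' at the first colon; project is None if no colon."""
    project, sep, rest = path.partition(":")
    if not sep:
        return None, path  # no project given, so the current project applies
    return project, rest


print(split_dx_path("project-BQpp3Y804Y0xbyG4GJPQ01xv:file-GFz5xf00Bqx2j79G4q4F5jXV"))
print(split_dx_path("FirstProj:/folder/path"))
print(split_dx_path("hello.txt"))
```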

Now I can list the one file:

```
$ dx ls
hs38DH.dict
```

Often you'll want to use the file ID, which you can view using the `-l|--long` flag to see the *long* listing that includes more metadata:

```
$ dx ls -l
Project: demo_project (project-GXZ90x00fF6F4fy1K20x4gv9)
Folder : /
State   Last modified       Size      Name (ID)
closed  2023-07-07 16:11:56 334.68 KB hs38DH.dict (file-GFz5xf00Bqx2j79G4q4F5jXV)
```

I've decided I want to create a *data* directory to hold files such as this, so I will use **`dx mkdir data`**. The command will produce no output on success. A new listing shows `data/` where the trailing slash indicates this is a directory:

```
$ dx ls
data/
hs38DH.dict
```

To *move* *hs38DH.dict* into the *data* directory, I can use either the file ID or the file name. Either one of the following commands will do:

```
dx mv file-GFz5xf00Bqx2j79G4q4F5jXV data
dx mv hs38DH.dict data
```

A new listing shows that the file is no longer in the root directory:

```
$ dx ls
data/
```

I can specify the *data* directory to view the contents:

```
$ dx ls data
hs38DH.dict
```

Alternatively, I can use **`dx cd data`** to *change directories*. The command **`dx pwd`** will verify that I'm in the new folder:

```
$ dx pwd
demo_project:/data
```

If I execute **`dx ls`** now, I'll see the contents of the *data* directory:

```
$ dx ls -l
Project: demo_project (project-GXZ90x00fF6F4fy1K20x4gv9)
Folder : /data
State   Last modified       Size      Name (ID)
closed  2023-07-07 16:11:56 334.68 KB hs38DH.dict (file-GFz5xf00Bqx2j79G4q4F5jXV)
```

Return to the root directory of the project by running **`dx cd`** or **`dx cd /`**.

Another way to inspect the structure of a project is using **`dx tree`**:

```
$ dx tree -h
usage: dx tree [-h] [--color {off,on,auto}] [--env-help] [-a] [-l] [path]

List folders and objects in a tree

positional arguments:
  path                  Folder (possibly in another project) to list the
                        contents of, default is the current directory in the
                        current project. Syntax: projectID:/folder/path

options:
  -h, --help            show this help message and exit
  --color {off,on,auto}
                        Set when color is used (color=auto is used when stdout
                        is a TTY)
  --env-help            Display help message for overriding environment
                        variables
  -a, --all             show hidden files
  -l, --long            use a long listing format
```

With no options, you will see a tree structure of the project:

```
$ dx tree
.
└─ data
    └─ hs38DH.dict
```

This command will also show the *long* listing with `-l|--long`:

```
$ dx tree -l
.
└─ data
    └─ closed  2023-07-07 16:11:56 334.68 KB hs38DH.dict
               (file-GFz5xf00Bqx2j79G4q4F5jXV)
```

## Uploading Data

I want to create a local file on my computer and add it to the project. I'll use the `echo` command to redirect some text into a file:

```
$ echo hello > hello.txt
```

I'll use the `dx upload` command. The usage shows that filename is required and may be repeated.

```
$ dx upload -h
usage: dx upload [-h] [--visibility {hidden,visible}] [--property KEY=VALUE]
                 [--type TYPE] [--tag TAG] [--details DETAILS] [-p]
                 [--brief | --verbose] [--env-help] [--path [PATH]] [-r]
                 [--wait] [--no-progress] [--buffer-size WRITE_BUFFER_SIZE]
                 [--singlethread]
                 filename [filename ...]

Upload local file(s) or directory. If "-" is provided, stdin will be used
instead. By default, the filename will be used as its new name. If
--path/--destination is provided with a path ending in a slash, the filename
will be used, and the folder path will be used as a destination. If it does not
end in a slash, then it will be used as the final name.

positional arguments:
  filename              Local file or directory to upload ("-" indicates stdin
                        input); provide multiple times to upload multiple files
                        or directories
```

There are many options to the command, and here are a few to highlight:

* `--brief`: Display a brief version of the return value; for most commands, prints a DNAnexus ID per line
* `-r, --recursive`: Upload directories recursively
* `--path [PATH], --destination [PATH]`: DNAnexus path to upload file(s) to (default uses current project and folder if not provided)

Run **`dx upload hello.txt`** and see that the new file exists in the root directory of your current project:

```
$ dx ls
data/
hello.txt
```

You can also upload data using the UI. Under the "Add" menu, you will find the following:

* **Upload Data**: Use your browser to add files to the project. This is the same as using `dx upload`.
* **Copy Data From Project**: Add data from existing projects on the platform. This is the same as `dx cp`.
* **Add Data From Server**: Add data from any publicly accessible URL such as an HTTP or FTP site. This is the same as running the [URL Fetcher](https://platform.dnanexus.com/app/url_fetcher) app.
* **Import From AWS S3**: Add data from an S3 bucket. This is the same as running the [AWS S3 Importer](https://platform.dnanexus.com/app/aws_s3_to_platform_files) app.

In addition, we offer an [SRA FASTQ Importer](https://platform.dnanexus.com/app/sra_fastq_importer) app.

I would like to check the new file on the platform. The `dx cat` command will, like the Unix `cat` *concatenate* command, print the entire contents of a file to the console:

```
$ dx cat -h
usage: dx cat [-h] [--env-help] [--unicode] path [path ...]

positional arguments:
  path        File ID or name(s) to print to stdout

options:
  -h, --help  show this help message and exit
  --env-help  Display help message for overriding environment variables
  --unicode   Display the characters as text/unicode when writing to stdout
```

I can use this to verify that the file was correctly uploaded:

```
$ dx cat hello.txt
hello
```

You might expect the following command to upload *hello.txt* into the *data* directory:

```
$ dx upload hello.txt --path data
```

Unfortunately, this will create a **file** called *data* alongside a **directory** called *data*:

```
$ dx ls
data/
data
hello.txt
```

I can verify that the *data* file contains "hello":

```
$ dx cat data
hello
```

Note this important part of upload's usage:

```
If --path/--destination is provided with a path ending in a slash, the
filename will be used, and the folder path will be used as a destination.
If it does not end in a slash, then it will be used as the final name.
```
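
The rule quoted above can be modeled in a few lines of Python. This is my own simplified sketch of the behavior described in the usage, not the client's actual code; the function name `upload_destination` and the `current_folder` parameter are hypothetical.

```python
import posixpath


def upload_destination(filename, path, current_folder="/"):
    """Return (folder, name) for an upload, following the rule quoted above."""
    local_name = posixpath.basename(filename)
    target = posixpath.join(current_folder, path)
    if path.endswith("/"):
        # Trailing slash: the path is a folder and the local name is kept.
        return posixpath.normpath(target), local_name
    # No trailing slash: the last component becomes the object's name.
    return posixpath.split(target)


print(upload_destination("hello.txt", "data"))   # ('/', 'data')
print(upload_destination("hello.txt", "data/"))  # ('/data', 'hello.txt')
```

The first call reproduces the surprise above: without the trailing slash, *data* is treated as the new file name in the root folder.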

This brings up an interesting point that file names are not unique on the DNAnexus platform. The only unique identifier is the file ID, and so this is always the best way to refer to a file. To rectify the duplication, I will get the file ID:

```
$ dx ls -l
Project: demo_project (project-GXZ90x00fF6F4fy1K20x4gv9)
Folder : /
data/
State   Last modified       Size      Name (ID)
closed  2023-07-07 16:34:31 6 bytes   data (file-GXZB2180fF65j2G1197pP7By)
closed  2023-07-07 16:34:10 6 bytes   hello.txt (file-GXZB1v80fF6BXJ8p7PvZPy1v)
```

I can *remove* the file using **`dx rm file-GXZB2180fF65j2G1197pP7By`**.

If I run **`dx upload hello.txt`** again, I will not overwrite the existing file. Rather, another copy of the file will be created with a new file ID:

```
$ dx ls -l
Project: demo_project (project-GXZ90x00fF6F4fy1K20x4gv9)
Folder : /
data/
State   Last modified       Size      Name (ID)
closed  2023-07-07 17:01:20 6 bytes   hello.txt (file-GXZBKYQ0fF6Pf2ZKPBF7G7j9)
closed  2023-07-07 16:34:10 6 bytes   hello.txt (file-GXZB1v80fF6BXJ8p7PvZPy1v)
```

The concept of immutability was covered in "Course 101 Overview of the DNAnexus Platform User Interface". Remember the crucially important fact that data objects on the DNAnexus platform are *immutable*. They can only be created (e.g., by uploading them) or removed, but they can never be overwritten. A given object ID always points to the same collection of bits, which leads to downstream benefits like reusing the outputs of jobs that share the same executable and input IDs ([smart reuse](https://documentation.dnanexus.com/user/running-apps-and-workflows/job-reuse)).

I cannot remove the file by filename as it's not unique, so I'm prompted to select which file I want:

```
$ dx rm hello.txt
The given path "hello.txt" resolves to the following data objects:
0) closed  2023-07-07 17:01:20 6 bytes   hello.txt (file-GXZBKYQ0fF6Pf2ZKPBF7G7j9)
1) closed  2023-07-07 16:34:10 6 bytes   hello.txt (file-GXZB1v80fF6BXJ8p7PvZPy1v)

Pick a numbered choice or "*" for all: 0
```
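
If you script against listings, it can help to group IDs by name so that ambiguous names stand out before you act on them. The sketch below uses name and ID pairs copied from the listings in this section; the variable names are my own.

```python
from collections import defaultdict

# (name, ID) pairs copied from listings in this section.
listing = [
    ("hello.txt", "file-GXZBKYQ0fF6Pf2ZKPBF7G7j9"),
    ("hello.txt", "file-GXZB1v80fF6BXJ8p7PvZPy1v"),
    ("hs38DH.dict", "file-GFz5xf00Bqx2j79G4q4F5jXV"),
]

ids_by_name = defaultdict(list)
for name, file_id in listing:
    ids_by_name[name].append(file_id)

# A name is ambiguous when it maps to more than one ID.
ambiguous = {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}
print(ambiguous)
```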

I used **`dx cat hello.txt`** to read the contents of the entire file because I knew the file had only one line. It's far safer to use `dx head` to look at just the first few lines (the default is 10):

```
$ dx head -h
usage: dx head [-h] [--color {off,on,auto}] [--env-help] [-n N] path

Print the first part of a file. By default, prints the first 10 lines.

positional arguments:
  path                  File ID or name to access

options:
  -h, --help            show this help message and exit
  --color {off,on,auto}
                        Set when color is used (color=auto is used when stdout
                        is a TTY)
  --env-help            Display help message for overriding environment
                        variables
  -n N, --lines N       Print the first N lines (default 10)
```

For instance, I can peek at the *data/hs38DH.dict* file:

```
$ dx head data/hs38DH.dict
@HD VN:1.6
@SQ SN:chr1 LN:248956422    M5:6aef897c3d6ff0c78aff06ac189178dd UR:file:/home/hs38DH.fa.gz
@SQ SN:chr2 LN:242193529    M5:f98db672eb0993dcfdabafe2a882905c UR:file:/home/hs38DH.fa.gz
@SQ SN:chr3 LN:198295559    M5:76635a41ea913a405ded820447d067b0 UR:file:/home/hs38DH.fa.gz
@SQ SN:chr4 LN:190214555    M5:3210fecf1eb92d5489da4346b3fddc6e UR:file:/home/hs38DH.fa.gz
@SQ SN:chr5 LN:181538259    M5:a811b3dc9fe66af729dc0dddf7fa4f13 UR:file:/home/hs38DH.fa.gz
@SQ SN:chr6 LN:170805979    M5:5691468a67c7e7a7b5f2a3a683792c29 UR:file:/home/hs38DH.fa.gz
@SQ SN:chr7 LN:159345973    M5:cc044cc2256a1141212660fb07b6171e UR:file:/home/hs38DH.fa.gz
@SQ SN:chr8 LN:145138636    M5:c67955b5f7815a9a1edfaa15893d3616 UR:file:/home/hs38DH.fa.gz
@SQ SN:chr9 LN:138394717    M5:6c198acf68b5af7b9d676dfdd531b5de UR:file:/home/hs38DH.fa.gz
```

Another option to check the file is to download it:

```
$ dx download file-GFz5xf00Bqx2j79G4q4F5jXV
[===========================================================>]
Downloaded 342,714
[===========================================================>]
Completed 342,714 of 342,714 bytes (100%) /Users/kyclark@dnanexus.com/work/academy/hs38DH.dict
```

## Inspecting Object Metadata

Every data object on the platform has a unique identifier prefixed with the type of object, such as `file-`, `record-`, or `applet-`. Earlier, I saw that *hello.txt* has the ID `file-GXZB1v80fF6BXJ8p7PvZPy1v`. I can use the `dx describe` command to view the metadata:

```
$ dx describe -h
usage: dx describe [-h] [--json] [--color {off,on,auto}]
                   [--delimiter [DELIMITER]] [--env-help] [--details]
                   [--verbose] [--name] [--multi]
                   path

Describe a DNAnexus entity.  Use this command to describe data objects by name
or ID, jobs, apps, users, organizations, etc.  If using the "--json" flag, it
will thrown an error if more than one match is found (but if you would like a
JSON array of the describe hashes of all matches, then provide the "--multi"
flag).  Otherwise, it will always display all results it finds.

NOTES:

- The project found in the path is used as a HINT when you are using an object ID;
you may still get a result if you have access to a copy of the object in some
other project, but if it exists in the specified project, its description will
be returned.

- When describing apps or applets, options marked as advanced inputs will be
hidden unless --verbose is provided

positional arguments:
  path                  Object ID or path to an object (possibly in another
                        project) to describe.

options:
  -h, --help            show this help message and exit
  --json                Display return value in JSON
  --color {off,on,auto}
                        Set when color is used (color=auto is used when stdout
                        is a TTY)
  --delimiter [DELIMITER], --delim [DELIMITER]
                        Always use exactly one of DELIMITER to separate fields
                        to be printed; if no delimiter is provided with this
                        flag, TAB will be used
  --env-help            Display help message for overriding environment
                        variables
  --details             Include details of data objects
  --verbose             Include additional metadata
  --name                Only print the matching names, one per line
  --multi               If the flag --json is also provided, then returns a JSON
                        array of describe hashes of all matching results
```

I could use the filename, if it's unique, but it's always best practice to use the file ID:

```
$ dx describe file-GXZB1v80fF6BXJ8p7PvZPy1v
Result 1:
ID                          file-GXZB1v80fF6BXJ8p7PvZPy1v
Class                       file
Project                     project-GXZ90x00fF6F4fy1K20x4gv9
Folder                      /
Name                        hello.txt
State                       closed
Visibility                  visible
Types                       -
Properties                  -
Tags                        -
Outgoing links              -
Created                     Fri Jul  7 16:34:09 2023
Created by                  kyclark
Last modified               Fri Jul  7 16:34:10 2023
Media type                  text/plain
archivalState               "live"
Size                        6 bytes
cloudAccount                "cloudaccount-dnanexus"
```

As shown in the usage, the `--delim` option causes the output table to use whatever delimiter you want between the columns. This could be useful if you wish to parse the output programmatically. If the flag is given without a value, a tab is used, but I can use a comma like so:

```
$ dx describe file-GXZB1v80fF6BXJ8p7PvZPy1v --delim ,
Result 1:
ID,file-GXZB1v80fF6BXJ8p7PvZPy1v
Class,file
Project,project-GXZ90x00fF6F4fy1K20x4gv9
Folder,/
Name,hello.txt
State,closed
Visibility,visible
Types,-
Properties,-
Tags,-
Outgoing links,-
Created,Fri Jul  7 16:34:09 2023
Created by,kyclark
Last modified,Fri Jul  7 16:34:10 2023
Media type,text/plain
archivalState,"live"
Size,6 bytes
cloudAccount,"cloudaccount-dnanexus"
```
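
As a rough illustration of parsing this output programmatically, the Python sketch below turns a few of the comma-delimited lines into a dictionary. The sample text is copied from the output above; the function name `parse_describe` is hypothetical.

```python
# A few lines copied from the comma-delimited output above.
output = """Result 1:
ID,file-GXZB1v80fF6BXJ8p7PvZPy1v
Class,file
Name,hello.txt
Created,Fri Jul  7 16:34:09 2023
Size,6 bytes
"""


def parse_describe(text, delimiter=","):
    """Map each field name to its value, splitting at the first delimiter."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(delimiter)
        if sep:  # skip lines such as "Result 1:" that have no delimiter
            fields[key] = value
    return fields


fields = parse_describe(output)
print(fields["Name"], "|", fields["Size"])
```

Splitting only at the first delimiter keeps values intact even if they contain the delimiter themselves. For anything beyond quick inspection, the JSON output is likely the sturdier choice.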

The `--json` flag returns the same data in JavaScript Object Notation (JSON), which we'll discuss in a later chapter:

```
$ dx describe file-GXZB1v80fF6BXJ8p7PvZPy1v --json
{
    "id": "file-GXZB1v80fF6BXJ8p7PvZPy1v",
    "project": "project-GXZ90x00fF6F4fy1K20x4gv9",
    "class": "file",
    "sponsored": false,
    "name": "hello.txt",
    "types": [],
    "state": "closed",
    "hidden": false,
    "links": [],
    "folder": "/",
    "tags": [],
    "created": 1688772849000,
    "modified": 1688772850572,
    "createdBy": {
        "user": "user-kyclark"
    },
    "properties": {},
    "details": {},
    "media": "text/plain",
    "archivalState": "live",
    "size": 6,
    "cloudAccount": "cloudaccount-dnanexus"
}
```
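
As a small preview of working with this output, the following Python sketch loads a trimmed copy of the JSON above and converts the `created` value to a date. It assumes the timestamps are milliseconds since the Unix epoch, which agrees with the "Created" row in the table output once my local time zone offset is taken into account:

```python
import json
from datetime import datetime, timezone

# A trimmed copy of the `dx describe --json` output shown above.
describe_json = """
{
    "id": "file-GXZB1v80fF6BXJ8p7PvZPy1v",
    "name": "hello.txt",
    "created": 1688772849000,
    "size": 6
}
"""

record = json.loads(describe_json)

# Assumption: timestamps are in milliseconds, so divide by 1000 first.
created = datetime.fromtimestamp(record["created"] / 1000, tz=timezone.utc)
print(record["name"], record["size"], created.isoformat())
```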

I can use `dx describe` to view the metadata associated with any object identifier on the platform. For instance, I'll use `head` to view the first few lines of the project's metadata:

```
$ dx describe project-GXZ90x00fF6F4fy1K20x4gv9 | head
Result 1:
ID                          project-GXZ90x00fF6F4fy1K20x4gv9
Class                       project
Name                        demo_project
Summary
Billed to                   org-sos
Access level                ADMINISTER
Region                      aws:us-east-1
Protected                   false
Restricted                  false
```

Find another entity ID, such as your billing org, to use with the command.

## Copying and Moving Files

I can use `dx mv` to *move* a file or directory within a project:

```
$ dx mv -h
usage: dx mv [-h] [--env-help] [-a] source [source ...] destination

Move or rename data objects and/or folders inside a single project.  To copy
data between different projects, use 'dx cp' instead.

positional arguments:
  source       Objects and/or folder names to move
  destination  Folder into which to move the sources or new pathname (if only
               one source is provided).  Must be in the same project/container
               as all source paths.

options:
  -h, --help   show this help message and exit
  --env-help   Display help message for overriding environment
               variables
  -a, --all    Apply to all results with the same name without
               prompting
```

For instance, I can rename *hello.txt* to *goodbye.txt* with the command **`dx mv hello.txt goodbye.txt`**. The file ID remains the same:

```
$ dx ls -l
Project: demo_project (project-GXZ90x00fF6F4fy1K20x4gv9)
Folder : /
data/
State   Last modified       Size      Name (ID)
closed  2023-07-10 10:11:31 6 bytes   goodbye.txt (file-GXZB1v80fF6BXJ8p7PvZPy1v)
```

I can also move *goodbye.txt* to the *data* directory and rename it back to *hello.txt*. Again, the file ID remains the same because I have only changed some of the file's metadata:

```
$ dx mv file-GXZB1v80fF6BXJ8p7PvZPy1v data/hello.txt
$ dx tree -l
.
└── data
    ├── closed  2023-07-10 10:13:31 6 bytes   hello.txt (file-GXZB1v80fF6BXJ8p7PvZPy1v)
    └── closed  2023-07-07 16:11:56 334.68 KB hs38DH.dict (file-GFz5xf00Bqx2j79G4q4F5jXV)
```
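
One way to picture this is a lookup table keyed by ID. The Python sketch below is only a toy model of that idea and is not how the platform is implemented; the `move` function and the dictionary layout are made up for illustration:

```python
# Toy model: the ID is the stable key, while the name and folder are
# editable metadata stored under that key.
objects = {
    "file-GXZB1v80fF6BXJ8p7PvZPy1v": {"name": "goodbye.txt", "folder": "/"},
}


def move(objects, object_id, folder, name):
    """Update the metadata in place; the key (the ID) never changes."""
    objects[object_id].update(folder=folder, name=name)


move(objects, "file-GXZB1v80fF6BXJ8p7PvZPy1v", "/data", "hello.txt")
print(objects)
```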

As noted in the preceding usage, I should use `dx cp` to *copy* data from one project to another. If I attempt to copy a file within a project, I will get an error:

```
$ dx cp hello.txt data/hello_copy.txt
dxpy.exceptions.DXCLIError: A source path and the destination path resolved
to the same project or container. Please specify different source and
destination containers, e.g.
dx cp source-project:source-id-or-path dest-project:dest-path
```

The only way to make an actual copy of a file within a project is to upload it again, as I did earlier when I added the *hello.txt* file a second time.

Data objects on the platform exist as bits in AWS or Azure storage, and the associated metadata is stored in a DNAnexus database. If two projects are in the same region such as AWS US-East-1, then `dx cp` doesn't actually copy the bits but rather creates a new database entry pointing to the object. This means you don't pay for additional storage. Copying between regions, however, does make a physical copy of the bits and will cost money for data egress and storage. When in doubt, use `dx describe <project-id>` to see a project's "Region" attribute or check the "Settings" in the project view UI.
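
The rule above comes down to comparing two region strings. Here is a minimal Python sketch of that check, assuming each project description is available as a dictionary with a `region` key (for example, from `dx describe <project-id> --json`). The function name and the project records are hypothetical:

```python
def copy_is_physical(source_project, dest_project):
    """Return True when the projects are in different regions, which is
    the case where the bits are physically copied."""
    return source_project["region"] != dest_project["region"]


# Hypothetical project descriptions; only the "region" key matters here.
source = {"id": "project-A", "region": "aws:us-east-1"}
same_region = {"id": "project-B", "region": "aws:us-east-1"}
other_region = {"id": "project-C", "region": "azure:westus"}

print(copy_is_physical(source, same_region))
print(copy_is_physical(source, other_region))
```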

## Finding Data

The `dx find` command will help you search for entities including:

* apps
* globalworkflows
* jobs
* data
* projects
* orgs
* org members
* org projects
* org apps

I can use the **`dx find data`** command to search data objects such as files and applets. I'll display the first part of the usage as it's rather long:

```
usage: dx find data [-h] [--brief | --verbose] [--json]
                    [--color {off,on,auto}] [--delimiter [DELIMITER]]
                    [--env-help] [--property KEY[=VALUE]] [--tag TAG]
                    [--class {record,file,applet,workflow,database}]
                    [--state {open,closing,closed,any}]
                    [--visibility {hidden,visible,either}] [--name NAME]
                    [--type TYPE] [--link LINK] [--all-projects]
                    [--path PROJECT:FOLDER] [--norecurse]
                    [--created-after CREATED_AFTER]
                    [--created-before CREATED_BEFORE] [--mod-after MOD_AFTER]
                    [--mod-before MOD_BEFORE] [--region REGION]

Finds data objects subject to the given search parameters. By default,
restricts the search to the current project if set. To search over all
projects (excluding public projects), use --all-projects (overrides --path and
--norecurse).
```

Run the command in the current project to see the two files:

```
$ dx find data
closed  2023-07-10 10:13:31 6 bytes   /data/hello.txt (file-GXZB1v80fF6BXJ8p7PvZPy1v)
closed  2023-07-07 16:11:56 334.68 KB /data/hs38DH.dict (file-GFz5xf00Bqx2j79G4q4F5jXV)
```

I can use the `--name` option to look for a file by name:

```
$ dx find data --name hs38DH.dict
closed  2023-07-07 16:11:56 334.68 KB /data/hs38DH.dict (file-GFz5xf00Bqx2j79G4q4F5jXV)
```

I can also specify a Unix file glob pattern, such as all files that begin with *h*:

```
$ dx find data --name "h*"
closed  2023-07-10 10:13:31 6 bytes   /data/hello.txt (file-GXZB1v80fF6BXJ8p7PvZPy1v)
closed  2023-07-07 16:11:56 334.68 KB /data/hs38DH.dict (file-GFz5xf00Bqx2j79G4q4F5jXV)
```

Or all files that end with *.dict*. Note in this example that the asterisk is escaped with a backslash to prevent my shell from expanding it locally, as I want the literal star to be given as the argument:

```
$ dx find data --name \*.dict
closed  2023-07-07 16:11:56 334.68 KB /data/hs38DH.dict (file-GFz5xf00Bqx2j79G4q4F5jXV)
```
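
The two ideas in these examples, glob matching and shell quoting, can be tried out locally in Python. This is only a sketch using the standard library; the platform's own matching may differ in its details:

```python
import shlex
from fnmatch import fnmatchcase

names = ["hello.txt", "hs38DH.dict"]

# Glob matching: "*" stands for any run of characters.
starts_with_h = [name for name in names if fnmatchcase(name, "h*")]
ends_with_dict = [name for name in names if fnmatchcase(name, "*.dict")]
print(starts_with_h)
print(ends_with_dict)

# Quoting: shlex.quote wraps the pattern in single quotes so that the
# shell hands the literal star to the command instead of expanding it.
command = "dx find data --name " + shlex.quote("*.dict")
print(command)
```

Quoting the pattern has the same effect as escaping the asterisk with a backslash.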

The `--brief` flag will return only the file ID:

```
$ dx find data --name \*.dict --brief
project-GXZ90x00fF6F4fy1K20x4gv9:file-GFz5xf00Bqx2j79G4q4F5jXV
```
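
Each line of the brief output has the form `project-ID:object-ID`. If I wanted to separate the two parts in a script, a minimal Python sketch might look like this, where `split_brief` is a name I made up:

```python
def split_brief(output):
    """Split each `project:object` line of brief output into a pair."""
    pairs = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        project_id, _, object_id = line.partition(":")
        pairs.append((project_id, object_id))
    return pairs


brief = "project-GXZ90x00fF6F4fy1K20x4gv9:file-GFz5xf00Bqx2j79G4q4F5jXV\n"
pairs = split_brief(brief)
print(pairs)
```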

This is useful, for instance, for downloading a file:

```
$ dx download $(dx find data --name \*.dict --brief)
[=======================>] Completed 342,714 of 342,714 bytes (100%)
                           /Users/kyclark@dnanexus.com/work/academy/hs38DH.dict
```

The `--json` flag will return the results in JSON format. In the JSON chapter, you will learn how to parse these results for more advanced querying and data manipulation:

```
$ dx find data --name \*.dict --json
[
    {
        "project": "project-GXZ90x00fF6F4fy1K20x4gv9",
        "id": "file-GFz5xf00Bqx2j79G4q4F5jXV",
        "describe": {
            "id": "file-GFz5xf00Bqx2j79G4q4F5jXV",
            "project": "project-GXZ90x00fF6F4fy1K20x4gv9",
            "class": "file",
            "name": "hs38DH.dict",
            "state": "closed",
            "folder": "/data",
            "modified": 1688771516882,
            "size": 342714
        }
    }
]
```
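
As a small taste of what is possible, this Python sketch reads a trimmed copy of the JSON above and prints each file's path and size. The `human_size` helper is my own approximation of how sizes are displayed by `dx ls -l`, not the tool's actual code:

```python
import json

# A trimmed copy of the `dx find data --json` output shown above.
find_json = """
[
    {
        "project": "project-GXZ90x00fF6F4fy1K20x4gv9",
        "id": "file-GFz5xf00Bqx2j79G4q4F5jXV",
        "describe": {
            "name": "hs38DH.dict",
            "folder": "/data",
            "size": 342714
        }
    }
]
"""


def human_size(num_bytes):
    """Format a byte count using 1024-based units (an approximation)."""
    units = ["bytes", "KB", "MB", "GB", "TB"]
    size = float(num_bytes)
    for unit in units:
        if size < 1024 or unit == units[-1]:
            if unit == "bytes":
                return f"{int(size)} bytes"
            return f"{size:.2f} {unit}"
        size /= 1024


rows = []
for result in json.loads(find_json):
    desc = result["describe"]
    rows.append((f"{desc['folder']}/{desc['name']}", human_size(desc["size"])))

for path, size in rows:
    print(path, size)
```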

The `--class` option accepts the following values:

* `applet`
* `database`
* `file`
* `record`
* `workflow`

The `--state` option accepts the following values:

* `open`: A file that is currently being uploaded
* `closing`: A file that is done uploading but is still being validated
* `closed`: A file that is uploaded and validated
* `any`: any of the above

There are many more options for finding data and other entities on the platform that will be covered in later chapters.

## Running Jobs

It's time to run an app, but which one? I'd like to have a FASTQ file to work with, so I'll start by using the SRA FASTQ Importer. I can never quite remember the name of the app, so I'll search for it using a wildcard:

```
$ dx find apps --name "sra*"
x SRA FASTQ Importer (sra_fastq_importer), v4.0.0
```

The "x" in the first column indicates this is an app supported by DNAnexus.

I can find information about the inputs and outputs to the app using either of these commands:

* `dx describe sra_fastq_importer`
* `dx run sra_fastq_importer -h`

I prefer the output from the second command:

```
$ dx run sra_fastq_importer -h
usage: dx run sra_fastq_importer [-iINPUT_NAME=VALUE ...]

App: SRA FASTQ Importer

Version: 4.0.0 (published)

Download SE or PE reads in FASTQ or FASTA format from SRA using SRR accessions

See the app page for more information:
  https://platform.dnanexus.com/app/sra_fastq_importer

Inputs:
  dbGaP Repository key: [-ingc_key=(file)]
        (Optional) Security token required for configuring NCBI SRA toolkit and decryption tools.

  SRR Accession: -iaccession=(string)
        Single SRR accession to fetch.
```

Looking at the usage for the app, I see that only the `-iaccession` argument is required, as all the others are shown enclosed in square brackets, e.g., `[-ingc_key=(file)]`. I can run the app with the SRA accession [SRR070372](https://www.ncbi.nlm.nih.gov/sra/?term=SRR070372) (*C. elegans*), answering "yes" to both launching and watching the app:

```
$ dx run sra_fastq_importer -iaccession=SRR070372

Using input JSON:
{
    "accession": "SRR070372"
}

Confirm running the executable with this input [Y/n]: y
Calling app-G49BFZ093qKvjFYgF8fyv6Z7 with output destination project-GXY0PK0071xJpG156BFyXpJF:/

Job ID: job-GXf8Qg8071xBJJg417YVYJX3
Watch launched job now? [Y/n] y
```

The equal sign in `-iaccession=SRR070372` is required.
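
If I were assembling such a command in a script, I might build the argument list from a dictionary of inputs. This Python sketch only constructs the list and does not run anything; `build_run_command` is a hypothetical helper:

```python
def build_run_command(app, inputs):
    """Build the argument list for `dx run`, one -iNAME=VALUE per input."""
    command = ["dx", "run", app]
    for name, value in inputs.items():
        # The equal sign joins the input name and its value.
        command.append(f"-i{name}={value}")
    return command


command = build_run_command("sra_fastq_importer", {"accession": "SRR070372"})
print(" ".join(command))
```

A list like this could be handed to Python's `subprocess.run` to launch the job.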

The output of watching is the same as you would see from the UI if you click the "MONITOR" tab in the project view and then "View Log" while the app is running. The end of the watch shows the app ran successfully and that a new file was created in my project:

```
* SRA FASTQ Importer (sra_fastq_importer:main) (done)
  job-GXf8Qg8071xBJJg417YVYJX3
  kyclark 2023-07-10 15:38:21 (runtime 0:02:36)
  Output: single_reads_fastq = [ file-GXf8VgQ09bzK5q1XV5z1gx7j ]
```

I can find the size of the file with `dx ls`:

```
$ dx ls -l file-GXf8VgQ09bzK5q1XV5z1gx7j
closed  2023-07-10 15:41:38 206.59 MB SRR070372.fastq.gz (file-GXf8VgQ09bzK5q1XV5z1gx7j)
```

Now I'd like to run this file through FastQC. I'll search for the app by name just to be sure, and, yes, it's called "fastqc":

```
$ dx find apps --name fastqc
x FastQC Reads Quality Control (fastqc), v3.0.3
```

Again, I use either `dx describe` or `dx run` to see that the app requires a reads file, given with `-ireads`:

```
usage: dx run fastqc [-iINPUT_NAME=VALUE ...]

App: FastQC Reads Quality Control

Version: 3.0.3 (published)

Generates a QC report on reads data

See the app page for more information:
  https://platform.dnanexus.com/app/fastqc

Inputs:
  Reads: -ireads=(file)
        A file containing the reads to be checked. Accepted formats are
        gzipped-FASTQ and BAM.
```

I will use the new file's ID as the input to FastQC, and I'll run it using the additional flags `-y` to confirm launching and `--watch` to immediately start watching the job:

```
$ dx run fastqc -ireads=file-GXf8P880FjgZGJQqx8Bf30YK -y --watch

Using input JSON:
{
    "reads": {
        "$dnanexus_link": "file-GXf8P880FjgZGJQqx8Bf30YK"
    }
}

Calling app-G81jg5j9jP7qxb310vg2xQkX with output destination project-GXY0PK0071xJpG156BFyXpJF:/

Job ID: job-GXf8fJQ071x00P5bQzQ62gjY
```

Notice that the confirmation shows "Using input JSON". If you like, you can save that to a file called, for example, *input.json*:

```
$ cat input.json
{
    "reads": {
        "$dnanexus_link": "file-GXf8P880FjgZGJQqx8Bf30YK"
    }
}
```
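
A file like this can also be generated rather than typed by hand. The following Python sketch writes the same structure to a temporary directory. The `file_link` helper is a name of my choosing, and the `$dnanexus_link` wrapper is copied from the confirmation output above:

```python
import json
import os
import tempfile


def file_link(file_id):
    """Wrap a file ID the way it appears in the input JSON shown above."""
    return {"$dnanexus_link": file_id}


inputs = {"reads": file_link("file-GXf8P880FjgZGJQqx8Bf30YK")}
text = json.dumps(inputs, indent=4)

# Written to a temporary directory here; use any path you like in practice.
path = os.path.join(tempfile.mkdtemp(), "input.json")
with open(path, "w") as handle:
    handle.write(text + "\n")

print(text)
```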

I can then launch the job using the `-f|--input-json-file` argument along with the `--brief` flag to show only the resulting job ID:

```
$ dx run fastqc -f input.json -y --brief
job-GXf930j071xJfYqfJ2kkvk8v
```

Since the output will be the same, I can kill the job using **`dx terminate job-GXf930j071xJfYqfJ2kkvk8v`**.

The end of the watch for the first FastQC job shows that it finished successfully:

```
* FastQC Reads Quality Control (fastqc:main) (done) job-GXf8fgj071x3KV4qyyKGZQVY
  kyclark 2023-07-10 15:51:11 (runtime 0:02:01)
  Output: report_html = file-GXf8gbQ06GxZ38zFXB46XYYj
          stats_txt = file-GXf8gbj06Gxy9F8P66pJG7J3
```

I would like to get a feel for the output, so I'll use `dx head` on the *stats\_txt* output file ID:

```
$ dx head file-GXf8gbj06Gxy9F8P66pJG7J3
##FastQC    0.11.9
>>Basic Statistics    pass
#Measure    Value
Filename    SRR070372.fastq.gz
File type   Conventional base calls
Encoding    Sanger / Illumina 1.9
Total Sequences 498843
Sequences flagged as poor quality   0
Sequence length 48-2044
%GC 39
```
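
To pull individual values out of this report, I could parse it into a dictionary. The Python sketch below assumes the fields are separated by tabs, as they are in FastQC's own data files, and uses a shortened sample with the tabs written explicitly. The function name is my own:

```python
def parse_basic_statistics(text):
    """Collect the measure/value pairs, skipping header lines."""
    stats = {}
    for line in text.splitlines():
        # Skip the version line, the module header, and the column header.
        if line.startswith(("#", ">>")) or "\t" not in line:
            continue
        key, value = line.split("\t", 1)
        stats[key] = value
    return stats


# Shortened sample of the report, assuming tab-separated fields.
sample = (
    "##FastQC\t0.11.9\n"
    ">>Basic Statistics\tpass\n"
    "#Measure\tValue\n"
    "Filename\tSRR070372.fastq.gz\n"
    "Total Sequences\t498843\n"
    "%GC\t39\n"
)

stats = parse_basic_statistics(sample)
print(int(stats["Total Sequences"]), stats["%GC"])
```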

## Review

You are now able to:

* List the advantages of interacting with the platform via the command line interface
* List the functions of the SDK and the API
* Describe the purpose of the dx-toolkit
* Apply frequently used dx-toolkit commands to execute common use cases, applicable to a broad audience of users

## Resources

[Full Documentation](https://documentation.dnanexus.com/)

To create a support ticket if there are technical issues:

1. Go to the Help header (same section where Projects and Tools are) inside the platform
2. Select "Contact Support"
3. Fill in the Subject and Message to submit a support ticket.


