Users interact with the platform in a variety of ways (shown below), but this section is for those who want to learn how to interact with it using the command-line interface (CLI).
Terms
The CLI interacts with the platform in the following way:
The CLI (command line interface) is run locally on your own machine.
On your local machine, you will download the SDK (software development kit), which we also call dx-toolkit. Information on downloading it and other requirements can be found in the Getting Started Guide. Once set up, the SDK allows you to log into the platform, explore your data and projects, create apps and workflows, and launch analyses.
Installation
Please ensure that you are running Python 3 before starting this install.
To install:
pip3 install dxpy
To upgrade dxpy:
pip3 install --upgrade dxpy
Introducing dx-toolkit
The dx command will be your most used utility for interacting with the DNAnexus platform. You can run the command with no arguments or with the -h or --help flags to see the usage:
usage: dx [-h] [--version] command ...
DNAnexus Command-Line Client, API v1.0.0, client v0.346.0
dx is a command-line client for interacting with the DNAnexus platform. You
can log in, navigate, upload, organize and share your data, launch analyses,
and more. For a quick tour of what the tool can do, see
https://documentation.dnanexus.com/getting-started/tutorials/cli-quickstart#q>
For a breakdown of dx commands by category, run "dx help".
dx exits with exit code 3 if invalid input is provided or an invalid operation
is requested, and exit code 1 if an internal error is encountered. The latter
usually indicate bugs in dx; please report them at
https://github.com/dnanexus/dx-toolkit/issues
options:
-h, --help show this help message and exit
--env-help Display help message for overriding environment
variables
--version show program's version number and exit
Sometimes the usage may occupy your entire terminal, in which case you may see (END) to show that you are at the end of the documentation. Press q to quit the usage, or use the universal Ctrl-C to send an interrupt signal to the process to kill it.
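The exit codes described in the usage above can be checked in a script. The following is a minimal sketch, assuming a hypothetical helper named explain_dx_status (it is not part of dx-toolkit) and the usual shell convention that 0 means success:

```shell
# Hypothetical helper (not part of dx-toolkit): translate an exit status
# into the meanings given in the dx usage text.
explain_dx_status() {
  case "$1" in
    0) echo "success" ;;  # conventional shell meaning, not stated in the usage
    3) echo "invalid input or invalid operation" ;;
    1) echo "internal error (possibly a bug in dx)" ;;
    *) echo "unexpected exit code $1" ;;
  esac
}

# Usage sketch: run any dx command, then pass "$?" to the helper, e.g.
#   dx ls; explain_dx_status "$?"
explain_dx_status 3
```

Calling the helper immediately after a dx command matters because "$?" is overwritten by every subsequent command.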
Run dx help to read about the categories of commands you can run:
$ dx help
usage: dx help [-h] [command_or_category] [subcommand]
Displays the help message for the given command (and subcommand if given), or
displays the list of all commands in the given category.
CATEGORIES
all All commands
session Manage your login session
fs Navigate and organize your projects and files
data View, download, and upload data
metadata View and modify metadata for projects, data, and executions
workflow View and modify workflows
exec Manage and run apps, applets, and workflows
org Administer and operate on orgs
other Miscellaneous advanced utilities
Logging Into the Platform
Let's start by using dx login to gain access to the DNAnexus platform from the command line. All dx commands will respond to -h|--help, so run the command with one of these flags to read the usage:
$ dx login -h
usage: dx login [-h] [--env-help] [--token TOKEN] [--noprojects] [--save]
[--timeout TIMEOUT]
Log in interactively and acquire credentials. Use "--token" to log in with an
existing API token.
options:
-h, --help show this help message and exit
--env-help Display help message for overriding environment variables
--token TOKEN Authentication token to use
--noprojects Do not print available projects
--save Save token and other environment variables for future
sessions
--timeout TIMEOUT Timeout for this login token (in seconds, or use suffix
s, m, h, d, w, M, y)
The help documentation is often called the usage because that is often the first word of the output. In the previous output, notice that all the arguments are enclosed in square brackets, e.g., [--token TOKEN]. This is a common convention in Unix documentation to indicate that the argument is optional. The lack of such square brackets means the argument is required.
Some of the arguments require a value to follow. For example, --token TOKEN means the argument --token must be followed by the string value for the token. Arguments like --save are known as flags. They are either present or not and often represent a Boolean value, usually "True" when present and "False" when absent.
The most basic usage for login is to enter your username and password when prompted:
You may also generate a token in the web UI for use on the command line:
$ dx login --token xxxxxxxxxxx
Use dx logout to log out of the platform. This invalidates a token.
If you are ever in doubt about your username, use dx whoami to see your identity.
When you ssh into a cloud workstation, you will be your normal DNAnexus user.
When running the ttyd app to access a cloud workstation through the UI, you will be the privileged Unix user root.
When you ssh into a running job, you will be the user dnanexus.
Working with Projects and Users
A project is the smallest unit of sharing in DNAnexus, and you must always work in the context of a project. Upon login, you will be prompted to select a project. To change projects, use dx select. Use -h|--help to view the usage:
$ dx select -h
usage: dx select [-h] [--env-help] [--name NAME]
[--level {VIEW,UPLOAD,CONTRIBUTE,ADMINISTER}] [--public]
[project]
Interactively list and select a project to switch to. By default, only lists
projects for which you have at least CONTRIBUTE permissions. Use --public to
see the list of public projects.
positional arguments:
project Name or ID of a project to switch to; if not provided
a list will be provided for you
options:
-h, --help show this help message and exit
--env-help Display help message for overriding environment
variables
--name NAME Name of the project (wildcard patterns supported)
--level {VIEW,UPLOAD,CONTRIBUTE,ADMINISTER}
Minimum level of permissions expected
--public Include ONLY public projects (will automatically set
--level to VIEW)
When run with no options, you will be presented with a list of your projects and your permission level for each:
$ dx select
Note: Use dx select --level VIEW or dx select --public to
select from projects for which you only have VIEW permissions.
Available projects (CONTRIBUTE or higher):
0) App Dev (ADMINISTER)
1) Methylation (ADMINISTER)
2) Genomes (ADMINISTER)
3) WTS (ADMINISTER)
4) WGS (ADMINISTER)
5) Exome (ADMINISTER)
6) QC (ADMINISTER)
7) Collaborators (ADMINISTER)
8) Pipeline Dev (ADMINISTER)
9) WDL Test (ADMINISTER)
m) More options not shown...
Pick a numbered choice or "m" for more options [0]:
Press Enter to choose the first project, or select a number 0-9 to choose a project or m for "more" options. You can also provide a project name or ID as the first argument.
Use the --level option to specify only projects where you have a particular permission. For instance, dx select --level ADMINISTER will show only projects where you are an administrator.
Normally, projects are private to your organization, but the --public option will display the public projects that DNAnexus uses to share common resources like sequence files or indexes for reference genomes:
$ dx select --public
Available public projects:
0) Reference Genome Files: Azure US (West) (VIEW)
1) App_Assets_Europe(London)_Internal (VIEW)
2) Reference Genome Files: Azure Amsterdam (VIEW)
3) Reference Genome Files: AWS Germany (VIEW)
4) Reference Genome Files: AWS US (East) (VIEW)
5) Reference Genome Files: AWS Europe (London) (VIEW)
6) App and Applet Assets Azure (VIEW)
7) dxCompiler_Europe_London (VIEW)
8) dxCompiler_Sydney (VIEW)
9) dxCompiler_Berlin (VIEW)
m) More options not shown...
Pick a numbered choice or "m" for more options:
Press Ctrl-C to exit the program without making a selection.
If you are ever in doubt as to your current project, run dx pwd (print working directory):
$ dx pwd
Pipeline Dev:/
Alternatively, you can run dx env to see your current environment:
$ dx env
Auth token used XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
API server protocol https
API server host api.dnanexus.com
API server port 443
Current workspace project-XXXXXXXXXXXXXXXXXXXXXXXX
Current workspace name "Pipeline Dev"
Current folder /
Current user test_user
If I wanted to share some data with a collaborator, I would use dx new project to create a new project to hold select data and apps. Following is the usage:
$ dx new project -h
usage: dx new project [-h] [--brief | --verbose] [--env-help]
[--region REGION] [-s] [--bill-to BILL_TO] [--phi]
[--database-ui-view-only]
[name]
Create a new project
positional arguments:
name Name of the new project
options:
-h, --help show this help message and exit
--brief Display a brief version of the return value; for most
commands, prints a DNAnexus ID per line
--verbose If available, displays extra verbose output
--env-help Display help message for overriding environment
variables
--region REGION Region affinity of the new project
-s, --select Select the new project as current after creating
--bill-to BILL_TO ID of the user or org to which the project will be
billed. The default value is the billTo of the
requesting user.
--phi Add PHI protection to project
--database-ui-view-only
Viewers on the project cannot access database data
directly
$ dx new project --region aws:us-east-1 demo_project
Created new project called "demo_project" (project-GXZ90x00fF6F4fy1K20x4gv9)
Switch to new project now? [y/N]: y
Next, I would use dx invite <user-id> to invite users to the project. Start with the usage to see how to call the command:
$ dx invite -h
usage: dx invite [-h] [--env-help] [--no-email]
invitee [project] [{VIEW,UPLOAD,CONTRIBUTE,ADMINISTER}]
Invite a DNAnexus entity to a project. If the invitee is not recognized as a
DNAnexus ID, it will be treated as a username, i.e. "dx invite alice : VIEW"
is equivalent to inviting the user with user ID "user-alice" to view your
current default project.
positional arguments:
invitee Entity to invite
project Project to invite the invitee to
{VIEW,UPLOAD,CONTRIBUTE,ADMINISTER}
Permissions level the new member should have
options:
-h, --help show this help message and exit
--env-help Display help message for overriding environment
variables
--no-email Disable email notifications to invitee
The usage shows that this command takes three positional arguments, the first of which (invitee) is required and the other two (project, permissions) are optional. Your currently selected project is the default project, and "VIEW" is the default permission. If you wish to indicate some permission other than "VIEW," you must specify the project first.
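The ordering rule for the positional arguments can be sketched with a small dry-run helper. This is only an illustration: invite_command is a hypothetical function that prints the command line instead of contacting the platform, and it relies on the usage text's example, where ":" stands for the current project.

```shell
# Hypothetical dry-run helper (not part of dx-toolkit): print the "dx invite"
# command for an invitee, an optional project, and an optional permission.
invite_command() {
  invitee=$1
  project=${2:-}
  level=${3:-}
  # A permission level can only follow a project, so fall back to ":"
  # (the current project, as in the usage text) when none is given.
  if [ -n "$level" ] && [ -z "$project" ]; then
    project=":"
  fi
  set -- dx invite "$invitee"
  if [ -n "$project" ]; then set -- "$@" "$project"; fi
  if [ -n "$level" ]; then set -- "$@" "$level"; fi
  echo "$*"
}

invite_command alice                 # dx invite alice
invite_command alice "" CONTRIBUTE   # dx invite alice : CONTRIBUTE
```

Printing the command first is a cheap way to confirm the argument order before actually sending an invitation.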
Use dx uninvite <user-id> to revoke a user's access to a project:
$ dx uninvite -h
usage: dx uninvite [-h] [--env-help] entity [project]
Revoke others' permissions on a project you administer. If the entity is not
recognized as a DNAnexus ID, it will be treated as a username, i.e. "dx
uninvite alice :" is equivalent to revoking the permissions of the user with
user ID "user-alice" to your current default project.
positional arguments:
entity Entity to uninvite
project Project to revoke permissions from
options:
-h, --help show this help message and exit
--env-help Display help message for overriding environment variables
Data Exploration
Earlier, I introduced dx pwd (print working directory) to find my currently selected project.
$ dx pwd -h
usage: dx pwd [-h] [--env-help]
Print current working directory
options:
-h, --help show this help message and exit
--env-help Display help message for overriding environment variables
Notice that the output shows the project name and the directory /, which is the root directory of the project:
$ dx pwd
demo_project:/
The command dx ls will list the contents of a directory. Notice in the usage that the directory name is optional, in which case it will use the current working directory:
$ dx ls -h
usage: dx ls [-h] [--color {off,on,auto}] [--delimiter [DELIMITER]]
[--env-help] [--brief | --verbose] [-a] [-l] [--obj] [--folders]
[--full]
[path]
List folders and/or objects in a folder
positional arguments:
path Folder (possibly in another project) to list the
contents of, default is the current directory in the
current project. Syntax: projectID:/folder/path
There is nothing to list because I just created this project, so I'll add some data next.
Copying and Moving Files
I will use the command dx cp to copy a small file from one of the public projects into my project. I'll start with the usage:
usage: dx cp [-h] [--env-help] [-a] source [source ...] destination
Copy objects and/or folders between different projects. Folders will
automatically be copied recursively. To specify which project to use as a
source or destination, prepend the path or ID of the object/folder with the
project ID or name and a colon.
EXAMPLES
The first example copies a file in a project called "FirstProj" to the
current directory of the current project. The second example copies the
object named "reads.fq.gz" in the current directory to the folder
/folder/path in the project with ID "project-B0VK6F6gpqG6z7JGkbqQ000Q",
and finally renaming it to "newname.fq.gz".
$ dx cp FirstProj:file-B0XBQFygpqGK8ZPjbk0Q000q .
$ dx cp reads.fq.gz project-B0VK6F6gpqG6z7JGkbqQ000Q:/folder/path/newname.fq.>
positional arguments:
source Objects and/or folder names to copy
destination Folder into which to copy the sources or new pathname (if only
one source is provided). Must be in a different
project/container than all source paths.
options:
-h, --help show this help message and exit
--env-help Display help message for overriding environment
variables
-a, --all Apply to all results with the same name without
prompting
The usage shows source [source …], which is another Unix convention to indicate that the argument may be repeated. This means you can indicate several source files or directories to be copied to the final destination.
I'll copy the file hs38DH.dict from the project "Reference Genome Files: AWS US (East)" into the root directory of my new project. The command will only produce output on error:
I must specify the source file using the project and file ID. When you refer to files inside your current project, it's only necessary to use the file ID.
Now I can list the one file:
$ dx ls
hs38DH.dict
Often you'll want to use the file ID, which you can view using the -l|--long flag to see the long listing that includes more metadata:
$ dx ls -l
Project: demo_project (project-GXZ90x00fF6F4fy1K20x4gv9)
Folder : /
State Last modified Size Name (ID)
closed 2023-07-07 16:11:56 334.68 KB hs38DH.dict (file-GFz5xf00Bqx2j79G4q4F5jXV)
I've decided I want to create a data directory to hold files such as this, so I will use dx mkdir data. The command will produce no output on success. A new listing shows data/ where the trailing slash indicates this is a directory:
$ dx ls
data/
hs38DH.dict
To move hs38DH.dict into the data directory, I can use either the file ID or the file name:
dx mv file-GFz5xf00Bqx2j79G4q4F5jXV data
dx mv hs38DH.dict data
A new listing shows that the file is no longer in the root directory:
$ dx ls
data/
I can specify the data directory to view the contents:
$ dx ls data
hs38DH.dict
Alternatively, I can use dx cd data to change directories. The command dx pwd will verify that I'm in the new folder:
$ dx pwd
demo_project:/data
If I execute dx ls now, I'll see the contents of the data directory:
$ dx ls -l
Project: demo_project (project-GXZ90x00fF6F4fy1K20x4gv9)
Folder : /data
State Last modified Size Name (ID)
closed 2023-07-07 16:11:56 334.68 KB hs38DH.dict (file-GFz5xf00Bqx2j79G4q4F5jXV)
Return to the root directory of the project by running dx cd or dx cd /.
Another way to inspect the structure of a project is using dx tree:
$ dx tree -h
usage: dx tree [-h] [--color {off,on,auto}] [--env-help] [-a] [-l] [path]
List folders and objects in a tree
positional arguments:
path Folder (possibly in another project) to list the
contents of, default is the current directory in the
current project. Syntax: projectID:/folder/path
options:
-h, --help show this help message and exit
--color {off,on,auto}
Set when color is used (color=auto is used when stdout
is a TTY)
--env-help Display help message for overriding environment
variables
-a, --all show hidden files
-l, --long use a long listing format
With no options, you will see a tree structure of the project:
$ dx tree
.
└─ data
└─ hs38DH.dict
This command will also show the long listing with -l|--long:
$ dx tree -l
.
└─ data
└─ closed 2023-07-07 16:11:56 334.68 KB hs38DH.dict
(file-GFz5xf00Bqx2j79G4q4F5jXV)
Uploading Data
I want to create a local file on my computer and add it to the project. I'll use the echo command to redirect some text into a file:
$ echo hello > hello.txt
I'll use the dx upload command. The usage shows that filename is required and may be repeated.
$ dx upload -h
usage: dx upload [-h] [--visibility {hidden,visible}] [--property KEY=VALUE]
[--type TYPE] [--tag TAG] [--details DETAILS] [-p]
[--brief | --verbose] [--env-help] [--path [PATH]] [-r]
[--wait] [--no-progress] [--buffer-size WRITE_BUFFER_SIZE]
[--singlethread]
filename [filename ...]
Upload local file(s) or directory. If "-" is provided, stdin will be used
instead. By default, the filename will be used as its new name. If
--path/--destination is provided with a path ending in a slash, the filename
will be used, and the folder path will be used as a destination. If it does not
end in a slash, then it will be used as the final name.
positional arguments:
filename Local file or directory to upload ("-" indicates stdin
input); provide multiple times to upload multiple files
or directories
There are many options to the command, and here are a few to highlight:
--brief: Display a brief version of the return value; for most commands, prints a DNAnexus ID per line
-r, --recursive: Upload directories recursively
--path [PATH], --destination [PATH]: DNAnexus path to upload file(s) to (default uses current project and folder if not provided)
Run dx upload hello.txt and see that the new file exists in the root directory of your current project:
$ dx ls
data/
hello.txt
You can also upload data using the UI. Under the "Add" menu, you will find the following:
Upload Data: Use your browser to add files to the project. This is the same as using dx upload.
Copy Data From Project: Add data from existing projects on the platform. This is the same as dx cp.
I would like to check the new file on the platform. The dx cat command will, like the Unix cat (concatenate) command, print the entire contents of a file to the console:
$ dx cat -h
usage: dx cat [-h] [--env-help] [--unicode] path [path ...]
positional arguments:
path File ID or name(s) to print to stdout
options:
-h, --help show this help message and exit
--env-help Display help message for overriding environment variables
--unicode Display the characters as text/unicode when writing to stdout
I can use this to verify that the file was correctly uploaded:
$ dx cat hello.txt
hello
You might expect the following command to upload hello.txt into the data directory:
$ dx upload hello.txt --path data
Unfortunately, this will create a file called data alongside a directory called data:
$ dx ls
data/
data
hello.txt
I can verify that the data file contains "hello":
$ dx cat data
hello
Note this important part of upload's usage:
If --path/--destination is provided with a path ending in a slash, the
filename will be used, and the folder path will be used as a destination.
If it does not end in a slash, then it will be used as the final name.
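The trailing-slash rule quoted above can be modeled locally. The sketch below is not how dx is implemented; upload_destination is a hypothetical function that only reproduces the documented behavior so the two cases can be compared side by side.

```shell
# Local model of the documented --path rule; it does not call dx.
upload_destination() {
  filename=$1
  path=$2
  case "$path" in
    */) echo "${path}$(basename "$filename")" ;;  # folder: keep the filename
    *)  echo "$path" ;;                           # no slash: path is the new name
  esac
}

upload_destination hello.txt data/   # data/hello.txt
upload_destination hello.txt data    # data
```

In other words, dx upload hello.txt --path data/ is the form that places the file inside the data folder.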
This example brings up an important point: file names are not unique on the DNAnexus platform. The only unique identifier is the file ID, so this is always the most reliable way to refer to a file. To remove the unwanted data file, I will first get its file ID:
$ dx ls -l
Project: demo_project (project-GXZ90x00fF6F4fy1K20x4gv9)
Folder : /
data/
State Last modified Size Name (ID)
closed 2023-07-07 16:34:31 6 bytes data (file-GXZB2180fF65j2G1197pP7By)
closed 2023-07-07 16:34:10 6 bytes hello.txt (file-GXZB1v80fF6BXJ8p7PvZPy1v)
I can remove the file using dx rm file-GXZB2180fF65j2G1197pP7By.
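When a listing is long, the ID can be pulled out of the dx ls -l output with standard text tools. The following is a rough sketch that works on the sample listing above; ids_for_name is a hypothetical helper, and it assumes object names contain no spaces.

```shell
# Sample long listing copied from the text above; in practice, pipe in "dx ls -l".
listing='closed 2023-07-07 16:34:31 6 bytes data (file-GXZB2180fF65j2G1197pP7By)
closed 2023-07-07 16:34:10 6 bytes hello.txt (file-GXZB1v80fF6BXJ8p7PvZPy1v)'

# Print the ID of every file whose name matches $1 exactly.
ids_for_name() {
  awk -v name="$1" '
    # The ID is the last field, wrapped in parentheses; the name precedes it.
    $NF ~ /^\(file-.*\)$/ && $(NF-1) == name {
      id = $NF
      gsub(/[()]/, "", id)
      print id
    }'
}

printf '%s\n' "$listing" | ids_for_name data   # file-GXZB2180fF65j2G1197pP7By
```

If several files share the name, the helper prints one ID per line, which makes the duplication visible before anything is removed.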
If I run dx upload hello.txt again, I will not overwrite the existing file. Rather, another copy of the file will be created with a new file ID:
$ dx ls -l
Project: demo_project (project-GXZ90x00fF6F4fy1K20x4gv9)
Folder : /
data/
State Last modified Size Name (ID)
closed 2023-07-07 17:01:20 6 bytes hello.txt (file-GXZBKYQ0fF6Pf2ZKPBF7G7j9)
closed 2023-07-07 16:34:10 6 bytes hello.txt (file-GXZB1v80fF6BXJ8p7PvZPy1v)
I cannot remove the file by filename as it's not unique, so I'm prompted to select which file I want:
$ dx rm hello.txt
The given path "hello.txt" resolves to the following data objects:
0) closed 2023-07-07 17:01:20 6 bytes hello.txt (file-GXZBKYQ0fF6Pf2ZKPBF7G7j9)
1) closed 2023-07-07 16:34:10 6 bytes hello.txt (file-GXZB1v80fF6BXJ8p7PvZPy1v)
Pick a numbered choice or "*" for all: 0
I used dx cat hello.txt to read the contents of the entire file because I knew the file had only one line. It's far safer to use dx head to look at just the first few lines (the default is 10):
$ dx head -h
usage: dx head [-h] [--color {off,on,auto}] [--env-help] [-n N] path
Print the first part of a file. By default, prints the first 10 lines.
positional arguments:
path File ID or name to access
options:
-h, --help show this help message and exit
--color {off,on,auto}
Set when color is used (color=auto is used when stdout
is a TTY)
--env-help Display help message for overriding environment
variables
-n N, --lines N Print the first N lines (default 10)
For instance, I can peek at the data/hs38DH.dict file:
Every data object on the platform has a unique identifier prefixed with the type of object, such as "file-", "record-", or "applet-". Earlier, I saw that hello.txt has the ID file-GXZB1v80fF6BXJ8p7PvZPy1v. I can use the dx describe command to view the metadata:
$ dx describe -h
usage: dx describe [-h] [--json] [--color {off,on,auto}]
[--delimiter [DELIMITER]] [--env-help] [--details]
[--verbose] [--name] [--multi]
path
Describe a DNAnexus entity. Use this command to describe data objects by name
or ID, jobs, apps, users, organizations, etc. If using the "--json" flag, it
will thrown an error if more than one match is found (but if you would like a
JSON array of the describe hashes of all matches, then provide the "--multi"
flag). Otherwise, it will always display all results it finds.
NOTES:
- The project found in the path is used as a HINT when you are using an object ID;
you may still get a result if you have access to a copy of the object in some
other project, but if it exists in the specified project, its description will
be returned.
- When describing apps or applets, options marked as advanced inputs will be
hidden unless --verbose is provided
positional arguments:
path Object ID or path to an object (possibly in another
project) to describe.
options:
-h, --help show this help message and exit
--json Display return value in JSON
--color {off,on,auto}
Set when color is used (color=auto is used when stdout
is a TTY)
--delimiter [DELIMITER], --delim [DELIMITER]
Always use exactly one of DELIMITER to separate fields
to be printed; if no delimiter is provided with this
flag, TAB will be used
--env-help Display help message for overriding environment
variables
--details Include details of data objects
--verbose Include additional metadata
--name Only print the matching names, one per line
--multi If the flag --json is also provided, then returns a JSON
array of describe hashes of all matching results
I could use the filename, if it's unique, but it's always best practice to use the file ID:
$ dx describe file-GXZB1v80fF6BXJ8p7PvZPy1v
Result 1:
ID file-GXZB1v80fF6BXJ8p7PvZPy1v
Class file
Project project-GXZ90x00fF6F4fy1K20x4gv9
Folder /
Name hello.txt
State closed
Visibility visible
Types -
Properties -
Tags -
Outgoing links -
Created Fri Jul 7 16:34:09 2023
Created by kyclark
Last modified Fri Jul 7 16:34:10 2023
Media type text/plain
archivalState "live"
Size 6 bytes
cloudAccount "cloudaccount-dnanexus"
As shown in the usage, the --delim option causes the output table to use whatever delimiter you want between the columns. This could be useful if you wish to parse the output programmatically. The tab character is the default delimiter, but I can use a comma like so:
$ dx describe file-GXZB1v80fF6BXJ8p7PvZPy1v --delim ,
Result 1:
ID,file-GXZB1v80fF6BXJ8p7PvZPy1v
Class,file
Project,project-GXZ90x00fF6F4fy1K20x4gv9
Folder,/
Name,hello.txt
State,closed
Visibility,visible
Types,-
Properties,-
Tags,-
Outgoing links,-
Created,Fri Jul 7 16:34:09 2023
Created by,kyclark
Last modified,Fri Jul 7 16:34:10 2023
Media type,text/plain
archivalState,"live"
Size,6 bytes
cloudAccount,"cloudaccount-dnanexus"
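As an illustration of parsing this output programmatically, the sketch below looks up a single field in the comma-delimited form. The sample text is copied from the output above, describe_field is a hypothetical helper, and values that themselves contain commas would be cut short by this simple approach.

```shell
# Sample output copied from the text above; in practice, pipe in
# "dx describe <id> --delim ,".
described='ID,file-GXZB1v80fF6BXJ8p7PvZPy1v
Class,file
Name,hello.txt
State,closed
Size,6 bytes'

# Print the value for one field name (first column) of comma-delimited output.
describe_field() {
  awk -F, -v key="$1" '$1 == key { print $2 }'
}

printf '%s\n' "$described" | describe_field Name   # hello.txt
```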
The --json flag returns the same data in JavaScript Object Notation (JSON), which we'll discuss in a later chapter:
I can use dx describe to view the metadata associated with any object identifier on the platform. For instance, I'll use head to view the first few lines of the project's metadata:
$ dx describe project-GXZ90x00fF6F4fy1K20x4gv9 | head
Result 1:
ID project-GXZ90x00fF6F4fy1K20x4gv9
Class project
Name demo_project
Summary
Billed to org-sos
Access level ADMINISTER
Region aws:us-east-1
Protected false
Restricted false
Find another entity ID, such as your billing org, to use with the command.
Copying and Moving Files
I can use dx mv to move a file or directory within a project:
$ dx mv -h
usage: dx mv [-h] [--env-help] [-a] source [source ...] destination
Move or rename data objects and/or folders inside a single project. To copy
data between different projects, use 'dx cp' instead.
positional arguments:
source Objects and/or folder names to move
destination Folder into which to move the sources or new pathname (if only
one source is provided). Must be in the same project/container
as all source paths.
options:
-h, --help show this help message and exit
--env-help Display help message for overriding environment
variables
-a, --all Apply to all results with the same name without
prompting
For instance, I can rename hello.txt to goodbye.txt with the command dx mv hello.txt goodbye.txt. The file ID remains the same:
$ dx ls -l
Project: demo_project (project-GXZ90x00fF6F4fy1K20x4gv9)
Folder : /
data/
State Last modified Size Name (ID)
closed 2023-07-10 10:11:31 6 bytes goodbye.txt (file-GXZB1v80fF6BXJ8p7PvZPy1v)
I can also move goodbye.txt to the data directory and rename it back to hello.txt. Again, the file ID remains the same because I have only changed some of the file's metadata:
As noted in the preceding usage, I should use dx cp to copy data from one project to another. If I attempt to copy a file within a project, I will get an error:
$ dx cp hello.txt data/hello_copy.txt
dxpy.exceptions.DXCLIError: A source path and the destination path resolved
to the same project or container. Please specify different source and
destination containers, e.g.
dx cp source-project:source-id-or-path dest-project:dest-path
The only way to make an actual copy of a file is to upload it again as I did earlier when I added the hello.txt file a second time.
Data objects on the platform exist as bits in AWS or Azure storage, and the associated metadata is stored in a DNAnexus database. If two projects are in the same region such as AWS US-East-1, then dx cp doesn't actually copy the bits but rather creates a new database entry pointing to the object. This means you don't pay for additional storage. Copying between regions, however, does make a physical copy of the bits and will cost money for data egress and storage. When in doubt, use dx describe <project-id> to see a project's "Region" attribute or check the "Settings" in the project view UI.
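The region check can be scripted before a large copy. This is a sketch under stated assumptions: project_region and same_region are hypothetical helpers, and the input mimics the "Region" line of the dx describe output shown earlier rather than calling the platform.

```shell
# Extract the "Region" value from "dx describe <project-id>" output on stdin.
project_region() {
  awk '$1 == "Region" { print $2 }'
}

# Hypothetical helper: report whether a copy would stay within one region.
same_region() {
  if [ "$1" = "$2" ]; then
    echo "same region: no additional copy of the bits"
  else
    echo "different regions: a physical copy, with egress and storage costs"
  fi
}

# In practice: src=$(dx describe <project-id> | project_region)
src=$(printf 'Name demo_project\nRegion aws:us-east-1\n' | project_region)
same_region "$src" "aws:us-east-1"
```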
Finding Data
The dx find command will help you search for entities including:
apps
globalworkflows
jobs
data
projects
orgs
org members
org projects
org apps
I can use the dx find data command to search data objects such as files and applets. I'll display the first part of the usage as it's rather long:
usage: dx find data [-h] [--brief | --verbose] [--json]
[--color {off,on,auto}] [--delimiter [DELIMITER]]
[--env-help] [--property KEY[=VALUE]] [--tag TAG]
[--class {record,file,applet,workflow,database}]
[--state {open,closing,closed,any}]
[--visibility {hidden,visible,either}] [--name NAME]
[--type TYPE] [--link LINK] [--all-projects]
[--path PROJECT:FOLDER] [--norecurse]
[--created-after CREATED_AFTER]
[--created-before CREATED_BEFORE] [--mod-after MOD_AFTER]
[--mod-before MOD_BEFORE] [--region REGION]
Finds data objects subject to the given search parameters. By default,
restricts the search to the current project if set. To search over all
projects (excluding public projects), use --all-projects (overrides --path and
--norecurse).
Run the command in the current project to see the two files:
Or find all files whose names end with .dict. Note in this example that the asterisk is escaped with a backslash to prevent my shell from expanding it locally, as I want the literal star to be given as the argument:
$ dx find data --name \*.dict --brief
project-GXZ90x00fF6F4fy1K20x4gv9:file-GFz5xf00Bqx2j79G4q4F5jXV
This is useful, for instance, for downloading a file:
$ dx download $(dx find data --name \*.dict --brief)
[=======================>] Completed 342,714 of 342,714 bytes (100%)
/Users/kyclark@dnanexus.com/work/academy/hs38DH.dict
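The effect of escaping the asterisk can be checked locally without contacting the platform. In this sketch, show_arg is a hypothetical stand-in for a command such as dx find data --name: it prints the argument it actually receives. Note that after the download above, a local hs38DH.dict exists, which is exactly the situation where an unescaped pattern would be expanded by the shell.

```shell
# Work in a scratch directory that contains a file matching the pattern.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch hs38DH.dict

# Stand-in for a command that takes a name pattern: print the argument received.
show_arg() { printf '%s\n' "$1"; }

show_arg *.dict      # the shell expands the glob first: hs38DH.dict
show_arg \*.dict     # escaped: the literal pattern *.dict is passed through
show_arg '*.dict'    # single quotes have the same effect as the backslash
```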
The --json flag will return the results in JSON format. In the JSON chapter, you will learn how to parse these results for more advanced querying and data manipulation:
closing: A file that is done uploading but is still being validated
closed: A file that is uploaded and validated
any: any of the above
There are many more options for finding data and other entities on the platform that will be covered in later chapters.
Running Jobs
It's time to run an app, but which one? I'd like to have a FASTQ file to work with, so I'll start by using the SRA FASTQ Importer. I can never quite remember the name of the app, so I'll search for it using a wildcard:
The "x" in the first column indicates this is an app supported by DNAnexus.
I can find information about the inputs and outputs to the app using either of these commands:
dx describe sra_fastq_importer
dx run sra_fastq_importer -h
I prefer the output from the second command:
$ dx run sra_fastq_importer -h
usage: dx run sra_fastq_importer [-iINPUT_NAME=VALUE ...]
App: SRA FASTQ Importer
Version: 4.0.0 (published)
Download SE or PE reads in FASTQ or FASTA format from SRA using SRR accessions
See the app page for more information:
https://platform.dnanexus.com/app/sra_fastq_importer
Inputs:
dbGaP Repository key: [-ingc_key=(file)]
(Optional) Security token required for configuring NCBI SRA toolkit and decryption tools.
SRR Accession: -iaccession=(string)
Single SRR accession to fetch.
$ dx run sra_fastq_importer -iaccession=SRR070372
Using input JSON:
{
"accession": "SRR070372"
}
Confirm running the executable with this input [Y/n]: y
Calling app-G49BFZ093qKvjFYgF8fyv6Z7 with output destination project-GXY0PK0071xJpG156BFyXpJF:/
Job ID: job-GXf8Qg8071xBJJg417YVYJX3
Watch launched job now? [Y/n] y
The equal sign in -iaccession=SRR070372 is required.
The output of watching is the same as you would see from the UI if you click the "MONITOR" tab in the project view and then "View Log" while the app is running. The end of the watch shows the app ran successfully and that a new file was created in my project:
Now I'd like to run this file through FastQC. I'll search for the app by name just to be sure, and, yes, it's called "fastqc":
$ dx find apps --name fastqc
x FastQC Reads Quality Control (fastqc), v3.0.3
Again, I can use either dx describe or dx run -h to see that the app requires a reads file as input:
usage: dx run fastqc [-iINPUT_NAME=VALUE ...]
App: FastQC Reads Quality Control
Version: 3.0.3 (published)
Generates a QC report on reads data
See the app page for more information:
https://platform.dnanexus.com/app/fastqc
Inputs:
Reads: -ireads=(file)
A file containing the reads to be checked. Accepted formats are
gzipped-FASTQ and BAM.
I will use the new file's ID as the input to FastQC, and I'll run it using the additional flags -y to confirm launching and --watch to immediately start watching the job:
$ dx run fastqc -ireads=file-GXf8P880FjgZGJQqx8Bf30YK -y --watch
Using input JSON:
{
"reads": {
"$dnanexus_link": "file-GXf8P880FjgZGJQqx8Bf30YK"
}
}
Calling app-G81jg5j9jP7qxb310vg2xQkX with output destination project-GXY0PK0071xJpG156BFyXpJF:/
Job ID: job-GXf8fJQ071x00P5bQzQ62gjY
Notice that the confirmation shows "Using input JSON". If you like, you can save that to a file called, for example, input.json:
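One way to save the input JSON is with a quoted here-document, which keeps the shell from treating $dnanexus_link as a variable. This is a sketch: the file ID is the one from the run above, and the commented dx run line assumes the -f/--input-json-file option listed in dx run -h, so confirm it against your installed version.

```shell
# Write the input JSON shown in the confirmation to a file. Quoting 'EOF'
# prevents the shell from expanding "$dnanexus_link".
cat > input.json <<'EOF'
{
  "reads": {
    "$dnanexus_link": "file-GXf8P880FjgZGJQqx8Bf30YK"
  }
}
EOF

# Launch with the saved inputs once the dx client is available and logged in:
#   dx run fastqc -f input.json -y
```

Keeping inputs in a file makes a run easier to repeat and to place under version control.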
I would like to get a feel for the output, so I'll use dx head on the stats_txt output file ID:
$ dx head file-GXf8gbj06Gxy9F8P66pJG7J3
##FastQC 0.11.9
>>Basic Statistics pass
#Measure Value
Filename SRR070372.fastq.gz
File type Conventional base calls
Encoding Sanger / Illumina 1.9
Total Sequences 498843
Sequences flagged as poor quality 0
Sequence length 48-2044
%GC 39
Review
You are now able to:
List the advantages of interacting with the platform via the command-line interface
List the functions of the SDK and the API
Describe the purpose of the dx-toolkit
Apply frequently used dx-toolkit commands to execute common use cases, applicable to a broad audience of users
Resources
To create a support ticket if there are technical issues:
Go to the Help header (same section where Projects and Tools are) inside the platform
Select "Contact Support"
Fill in the Subject and Message to submit a support ticket.
API (application programming interface) servers allow us to interact with the platform using HTTP requests. The arguments for these requests are fields in a JSON file. More details on this structure are available in the documentation.
Further details can be found in our documentation if you need them.
Information on setting up tokens can be found in our documentation.
I will use this command to create a new project in the AWS US-East-1 region. See the documentation for a list of available regions. The command displays the new project ID and prompts to switch into the new project:
Add Data From Server: Add data from any publicly accessible URL such as an HTTP or FTP site. This is the same as running the app.
Import From AWS S3: Add data from an S3 bucket. This is the same as running the app.
In addition, we offer an app.
The concept of immutability was covered in "Course 101 Overview of the DNAnexus Platform User Interface": Remember the crucially important fact that data objects on the DNAnexus platform are immutable. They can only be created (e.g., by uploading them) or removed, but they can never be overwritten. A given object ID always points to the same collection of bits, which leads to downstream benefits like reusing the outputs of jobs that share the same executable and input IDs.
Looking at the usage for the app, I see that only the -iaccession argument is required, as all the others are shown enclosed in square brackets, e.g., [-ingc_key=(file)]. I can run the app with the SRA accession SRR070372 (C. elegans), answering "yes" to both launching and watching the app: