Example 1: Word Count (wc)

To get started, you will build a native bash applet that will execute the venerable wc (word count) Unix command-line program on a file. In this example, you will:

Use the dx-app-wizard to create the skeleton of a native bash applet
Define the inputs and outputs of an applet
Use dx build to build the applet
Import data from a URL
Use dx run to run the applet

Understanding wc

The wc command takes one or more files as input. So that we work with the same input file, execute the following command to fetch the text from Project Gutenberg and write it to the local file scarlet.txt:

$ wget -O scarlet.txt https://www.gutenberg.org/cache/epub/33/pg33.txt

Or use curl:

$ curl -o scarlet.txt https://www.gutenberg.org/cache/epub/33/pg33.txt

By default, wc prints three columns showing the number of lines, words, and characters (strictly speaking, bytes) of text, in that order, followed by the name of the file:

$ wc scarlet.txt
    8590   86055  513523 scarlet.txt

The output from your version of wc may differ slightly as there are several implementations of the program. For instance, the preceding output is from macOS, which uses the BSD version, but the applet will run on Ubuntu Linux using the GNU version. Both programs work essentially the same.
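
Because implementations pad their columns differently, a script that needs a single count usually asks for it explicitly. The following is a minimal sketch that uses a small throwaway file rather than scarlet.txt. The -l, -w, and -c flags exist in both the BSD and GNU versions; note that -c counts bytes, which equals the number of characters only for single-byte text such as plain ASCII.

```shell
# Create a small sample file: 3 lines, 6 words, 28 bytes.
sample=$(mktemp)
printf 'one two\nthree four\nfive six\n' > "$sample"

# Reading from standard input keeps the file name out of the output,
# and tr strips the leading padding that some versions of wc add.
lines=$(wc -l < "$sample" | tr -d ' ')
words=$(wc -w < "$sample" | tr -d ' ')
bytes=$(wc -c < "$sample" | tr -d ' ')

echo "lines=$lines words=$words bytes=$bytes"   # prints: lines=3 words=6 bytes=28
rm -f "$sample"
```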

The goal of this applet will be to accept a single file as input and capture the standard out (aka STDOUT) of wc to report the number of lines, words, and characters in the file.

Using dx-app-wizard

Next, you will create an applet that will accept this file as input, transfer it to a virtual machine, run wc on the file, and return the preceding output as a new file. Run the dx-app-wizard to interactively answer questions about the inputs, outputs, and runtime requirements. Start by executing the program with the -h|--help flag to read the documentation:

$ dx-app-wizard -h
usage: dx-app-wizard [-h] [--json-file JSON_FILE] [--language LANGUAGE]
                     [--template {basic,parallelized,scatter-process-gather}]
                     [name]

Create a source code directory for a DNAnexus app. You will be prompted for
various metadata for the app as well as for its input and output
specifications.

positional arguments:
  name                  Name of your app

optional arguments:
  -h, --help            show this help message and exit
  --json-file JSON_FILE
                        Use the metadata and IO spec found in the given file
  --language LANGUAGE   Programming language of your app
  --template {basic,parallelized,scatter-process-gather}
                        Execution pattern of your app

As shown in the preceding usage, the name of the applet may be provided as an argument. For instance, you can run dx-app-wizard wc to answer the first question, which is the name of the applet. Note the naming conventions for the applet name, which you should also follow for naming the input and output variables:

$ dx-app-wizard wc
DNAnexus App Wizard, API v1.0.0

Basic Metadata

Please enter basic metadata fields that will be used to describe your app.
Optional fields are denoted by options with square brackets.  At the end of
this wizard, the files necessary for building your app will be generated from
the answers you provide.

The name of your app must be unique on the DNAnexus platform.  After
creating your app for the first time, you will be able to publish new versions
using the same app name.  App names are restricted to alphanumeric characters
(a-z, A-Z, 0-9), and the characters ".", "_", and "-".
App Name [wc]:

Because the name was provided as an argument, the prompt shows [wc]. All the prompts will show a default value that will be used if you press the Enter key. If you wish to override this value, type a new name; otherwise, press Enter.
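
The character restriction quoted by the wizard can be checked before you start it. This is only a sketch: is_valid_app_name is a hypothetical helper, not part of the dx toolkit, and it tests the character set only, not whether the name is already taken on the platform.

```shell
# Hypothetical helper: succeed if the name uses only the characters
# the wizard allows for app names (a-z, A-Z, 0-9, ".", "_", "-").
is_valid_app_name() {
    case "$1" in
        "") return 1 ;;                    # an empty name is rejected
        *[!A-Za-z0-9._-]*) return 1 ;;     # contains a disallowed character
        *) return 0 ;;
    esac
}

for name in wc word-count "word count" fastq_trimmer.v2; do
    if is_valid_app_name "$name"; then
        echo "ok:  $name"
    else
        echo "bad: $name"
    fi
done
```

Only "word count" is reported as bad, because of the space.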

Next, you will be prompted for a title. The empty brackets ([]) indicate this is optional, but I will provide "Word Count":

The title, if provided, is what is shown as the name of your app on
the website.  It can be any valid UTF-8 string.
Title []: Word Count

Likewise, the summary is optional, but I will provide one:

The summary of your app is a short phrase or one-line description of
what your app does.  It can be any UTF-8 human-readable string.
Summary []: Find the number of lines, words, and characters in a file

Indicate the version with major, minor, and patch release:

You can publish multiple versions of your app, and the version of your
app is a string with which to tag a particular version.  We encourage the use
of Semantic Versioning for labeling your apps (see http://semver.org/ for more
details).
Version [0.0.1]: 0.1.0
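
To make the major, minor, and patch parts concrete, here is a small sketch that splits a version string and increments one part. bump_version is a hypothetical helper written for this example, and it handles only the plain MAJOR.MINOR.PATCH form, not the pre-release or build suffixes that Semantic Versioning also permits.

```shell
# Hypothetical helper: split a MAJOR.MINOR.PATCH string and bump one part.
bump_version() {
    version=$1
    part=$2
    # Reject anything that is not three dot-separated integers.
    if ! printf '%s\n' "$version" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'; then
        echo "not a MAJOR.MINOR.PATCH version: $version" >&2
        return 1
    fi
    major=${version%%.*}   # text before the first dot
    rest=${version#*.}     # text after the first dot
    minor=${rest%%.*}
    patch=${rest#*.}
    case "$part" in
        major) major=$((major + 1)); minor=0; patch=0 ;;
        minor) minor=$((minor + 1)); patch=0 ;;
        patch) patch=$((patch + 1)) ;;
        *) echo "unknown part: $part" >&2; return 1 ;;
    esac
    echo "$major.$minor.$patch"
}

bump_version 0.1.0 patch   # prints: 0.1.1
bump_version 0.1.0 minor   # prints: 0.2.0
```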

The input specification follows. Use the name input_file for the first input name and whatever label you like. For the class, choose file to indicate that the user must supply a valid file, and specify that this input is not optional:

Input Specification

You will now be prompted for each input parameter to your app.  Each parameter
should have a unique name that uses only the underscore "_" and alphanumeric
characters, and does not start with a number.

1st input name (<ENTER> to finish): input_file
Label (optional human-readable name) []: Input file
Your input parameter must be of one of the following classes:
applet         array:file     array:record   file           int
array:applet   array:float    array:string   float          record
array:boolean  array:int      boolean        hash           string

Choose a class (<TAB> twice for choices): file
This is an optional parameter [y/n]: n

As this is the only input, press Enter when prompted for a second input and move to the output specification. To start, call the output output and use the class of file:

Output Specification

You will now be prompted for each output parameter of your app.  Each
parameter should have a unique name that uses only the underscore "_" and
alphanumeric characters, and does not start with a number.

1st output name (<ENTER> to finish): output
Label (optional human-readable name) []: Output file
Choose a class (<TAB> twice for choices): file

There is no other output for now, so press Enter to move on to the Timeout Policy. You may choose any amount of time you like such as "1h" to indicate 1 hour:

Timeout Policy

Set a timeout policy for your app. Any single entry point of the app
that runs longer than the specified timeout will fail with a TimeoutExceeded
error. Enter an int greater than 0 with a single-letter suffix (m=minutes,
h=hours, d=days) (e.g. "48h").
Timeout policy [48h]: 1h
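
The format the wizard asks for (a positive integer plus a one-letter unit) is easy to check in the shell. The sketch below uses a hypothetical helper, parse_timeout, to map an answer such as 1h to the unit name and number it stands for, which is the same pair that later appears under timeoutPolicy in dxapp.json.

```shell
# Hypothetical helper: turn a wizard-style timeout such as "1h" into the
# unit name and number it stands for.
parse_timeout() {
    value=${1%?}           # everything except the last character
    suffix=${1#"$value"}   # the last character
    case "$value" in
        ""|*[!0-9]*) echo "invalid timeout: $1" >&2; return 1 ;;
    esac
    if [ "$value" -le 0 ]; then
        echo "timeout must be greater than 0: $1" >&2
        return 1
    fi
    case "$suffix" in
        m) unit=minutes ;;
        h) unit=hours ;;
        d) unit=days ;;
        *) echo "invalid suffix in: $1" >&2; return 1 ;;
    esac
    echo "$unit $value"
}

parse_timeout 1h    # prints: hours 1
parse_timeout 48h   # prints: hours 48
```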

Next, you will choose whether to use bash or Python as the primary language of the applet. Choose bash:

Template Options

You can write your app in any programming language, but we provide
templates for the following supported languages: Python, bash
Programming language: bash

Choosing bash means that your app will execute a bash script that uses commands from the dxpy module to download and upload files, and that can run any command on the runtime instance, such as custom programs you write in Python, R, C, etc. Choosing Python means that a Python script will be executed instead, and it can use the same Python module to do everything the bash script does. This tutorial will only demonstrate bash apps. Neither language has an advantage over the other, so choose whichever suits your tastes.

During runtime, some apps may need to fetch resources from the internet or from the parent project. Neither of these will apply to this applet, so answer "no" for the next two questions:

Access to the Internet (other than accessing the DNAnexus API).
Will this app need access to the Internet? [y/N]: n

Direct access to the parent project. This is not needed if your app
specifies outputs,     which will be copied into the project after it's done
running.
Will this app need access to the parent project? [y/N]: n

Lastly, you will choose a default instance type on which the applet will run. I usually start with the default value, which is a fairly modest machine. If an applet proves it needs more resources, refer to the list of instance types to choose something else:

Default instance type: The instance type you select here will apply to
all entry points in your app unless you override it. See https://documenta
tion.dnanexus.com/developer/api/running-analyses/instance-types for more
information.
Choose an instance type for your app [mem1_ssd1_v2_x4]:

The wizard will finish with a listing of the files it has created:

*** Generating DNAnexus App Template... ***

Your app specification has been written to the dxapp.json file. You can
specify more app options by editing this file directly (see
https://documentation.dnanexus.com/developer for complete documentation).

Created files:
     wc/Readme.developer.md
     wc/Readme.md
     wc/dxapp.json
     wc/resources/
     wc/src/
     wc/src/wc.sh
     wc/test/

App directory created!  See https://documentation.dnanexus.com/developer for
tutorials on how to modify these files, or run "dx build wc" or "dx build
--create-app wc" while logged in with dx.

Running the DNAnexus build utility will create an executable on the DNAnexus
platform.  Any files found in the resources directory will be uploaded
so that they will be present in the root directory when the executable is run.

As noted, you will find the following structure in the directory wc:

$ find wc
wc
wc/test # 1
wc/resources # 2
wc/dxapp.json # 3
wc/Readme.md # 4
wc/Readme.developer.md # 5
wc/src # 6
wc/src/wc.sh # 7

1. A directory for tests, mostly used internally by DNAnexus.
2. A directory for assets like files or binaries you would like copied to the runtime instance.
3. A JSON file describing the metadata for the applet.
4. A documentation stub you may wish to update.
5. Another documentation stub.
6. A directory to place source code for the applet.
7. The bash script template to execute the applet.
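
If you later move or copy the applet directory, a quick check that the generated pieces are still present can save a failed build. The following is a sketch only: check_applet_layout is a hypothetical helper, and it tests for the files the wizard listed above, not for what the platform strictly requires. The demonstration runs against a scratch directory so it does not touch your real wc directory.

```shell
# Hypothetical helper: report any piece of the wizard-generated layout
# that is missing from an applet directory.
check_applet_layout() {
    dir=$1
    missing=0
    for path in dxapp.json Readme.md Readme.developer.md src resources test; do
        if [ ! -e "$dir/$path" ]; then
            echo "missing: $dir/$path" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Demonstration on a scratch copy of the layout.
scratch=$(mktemp -d)
mkdir -p "$scratch/wc/src" "$scratch/wc/resources" "$scratch/wc/test"
touch "$scratch/wc/dxapp.json" "$scratch/wc/Readme.md" \
      "$scratch/wc/Readme.developer.md" "$scratch/wc/src/wc.sh"

if check_applet_layout "$scratch/wc"; then
    echo "layout looks complete"
fi
rm -rf "$scratch"
```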

Inspecting dxapp.json

In the preceding step, the applet's inputs, outputs, and system requirements were written to the file dxapp.json, which is in JSON (JavaScript Object Notation) format. Open this file to inspect the contents, which begins with the basic metadata about the app:

{
  "name": "wc",
  "title": "Word Count",
  "summary": "Find the number of lines, words, and characters in a file",
  "dxapi": "1.0.0",
  "version": "0.1.0",

The inputSpec section shows that this applet takes a single argument of the type file. Update the patterns to include .txt:

  "inputSpec": [
    {
      "name": "input_file",
      "label": "Input file",
      "class": "file",
      "optional": false,
      "patterns": [
        "*.txt"
      ],
      "help": ""
    }
  ],
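
The entries in patterns read like shell-style globs. Assuming they behave that way, the sketch below shows which file names a pattern such as "*.txt" would accept; matches_pattern is a hypothetical helper that relies on the shell's own glob matching, not on anything from the dx toolkit.

```shell
# Hypothetical helper: succeed if a file name matches a glob pattern
# such as the "*.txt" entry in inputSpec.
matches_pattern() {
    # The unquoted $2 in the case pattern is treated as a glob.
    case "$1" in
        $2) return 0 ;;
        *) return 1 ;;
    esac
}

for name in scarlet.txt scarlet.txt.gz reads.fastq; do
    if matches_pattern "$name" '*.txt'; then
        echo "match:    $name"
    else
        echo "no match: $name"
    fi
done
```

Only scarlet.txt matches; the pattern must cover the whole name, so scarlet.txt.gz does not.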

The outputSpec shows that the applet will return a file:

  "outputSpec": [
    {
      "name": "output",
      "label": "Output",
      "class": "file",
      "patterns": [
        "*"
      ],
      "help": ""
    }
  ],

The runSpec describes the runtime for the applet:

  "runSpec": {
    "timeoutPolicy": {
      "*": {
        "hours": 1
      }
    },
    "interpreter": "bash",
    "file": "src/wc.sh",
    "distribution": "Ubuntu",
    "release": "20.04", 
    "version": "0" 
  },

The release field selects the Ubuntu release. The default VM is Ubuntu 20.04, which includes Python v3 and R v3. You may also indicate Ubuntu 16.04, which has Python v2.
The version field applies within that release. If you need Ubuntu 16.04 with Python v3, set version to 1; otherwise, leave it at 0.

The author has had more success installing Python v2 on Ubuntu 20.04 than running an older Linux distro.

Finally, the regionalOptions describe the system requirements:

  "regionalOptions": {
    "aws:us-east-1": {
      "systemRequirements": {
        "*": {
          "instanceType": "mem1_ssd1_v2_x4"
        }
      }
    }
  }
}

You may use a text editor to alter this file at any time, after which you will need to rebuild the applet.

Editing the Runtime Code

As indicated in runSpec, the applet will execute the bash script src/wc.sh at runtime. The app wizard created a template that shows one method for downloading the input file and uploading the output file. Here is a modified version that removes most of the comments for the sake of brevity and adds the applet's business logic in the middle:

#!/bin/bash

set -exo pipefail # 1

main() {
    echo "Value of input_file: '$input_file'"

    dx download "$input_file" -o input_file # 2

    wc input_file > output # 3

    output=$(dx upload output --brief) # 4

    dx-jobutil-add-output output "$output" --class=file # 5
}

1. I've added this pragma to show each command as it's executed (-x) and to halt when a command fails (-e), including a failure inside a pipeline (-o pipefail). It does not halt on undefined variables; that would require adding -u.
2. This will download the input file to a local file called input_file on the running instance.
3. Execute wc on input_file and redirect standard out to the file output.
4. This will upload the result file called output from the instance back to the project and capture the new file's ID.
5. This command will link the uploaded file as the applet's output named output.

The local variables $input_file and $output match the names used in the inputSpec and outputSpec. They will only exist at runtime.
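
The effect of the pipefail option on the set line in the script is easiest to see in isolation. The sketch below runs the same failing pipeline twice in a child bash process, once with the default behavior and once with pipefail enabled, and records the exit status of each.

```shell
# Without pipefail, a pipeline's exit status is that of its last command,
# so the failure of "false" is hidden by the success of "true".
default_status=0
bash -c 'false | true' || default_status=$?

# With pipefail, any failing command makes the whole pipeline fail.
pipefail_status=0
bash -c 'set -o pipefail; false | true' || pipefail_status=$?

echo "default:  $default_status"    # prints: default:  0
echo "pipefail: $pipefail_status"   # prints: pipefail: 1
```

Combined with -e, a non-zero pipeline status stops the script instead of letting it continue and upload an incomplete result.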

Creating a Project for the Applet and Data

Applets and data must live inside a project, so create a new one either using the web interface or the command line by executing dx new project:

$ dx new project wc
Created new project called "wc" (project-GGyG8K80K9ZKzkX812yY893V)
Switch to new project now? [y/N]: y

Next, you will add the scarlet.txt file to the project. There are several ways you can do this. From the web interface, you can click the "Add" button, which will show you two relevant options:

"Upload Data": This will allow you to upload a file from your local computer. You can drag and drop the file into the dialog box or use the file browser to select the file.
"Add Data From Server": This will launch an app that can import files accessible by a URL such as from a web address or FTP server. You should use the Project Gutenberg URL from earlier.

You can also use the dx upload command. If you created the project using the web interface, you will first need to run dx select to select your project:

$ dx select project-GGyG8K80K9ZKzkX812yY893V
Selected project project-GGyG8K80K9ZKzkX812yY893V
$ dx upload scarlet.txt
[===========================================================>]
Uploaded 513,523 of 513,523 bytes (100%) scarlet.txt
ID                    file-GGyG8z00K9Z9GQ9jG4qB4gpX
Class                 file
Project               project-GGyG8K80K9ZKzkX812yY893V
Folder                /
Name                  scarlet.txt
State                 closing
Visibility            visible
Types                 -
Properties            -
Tags                  -
Outgoing links        -
Created               Tue Oct  4 16:40:44 2022
Created by            kyclark
Last modified         Tue Oct  4 16:40:47 2022
Media type
archivalState         "live"
cloudAccount          "cloudaccount-dnanexus"

Note the file's ID, which we will use later for the applet's input. If you use the web interface to upload, you can click the information "I" in the circle to see the file's metadata.

From the command line, you can use dx ls with the -l|--long option to see the file ID:

$ dx ls -l
Project: wc (project-GGyG8K80K9ZKzkX812yY893V)
Folder : /
State   Last modified       Size      Name (ID)
closed  2022-10-04 16:40:48 501.49 KB scarlet.txt (file-GGyG8z00K9Z9GQ9jG4qB4gpX)
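
The 501.49 KB in this listing is the same 513,523 bytes reported earlier by wc and dx upload, assuming the listing divides by 1024. A short sketch of the arithmetic, using a hypothetical helper named bytes_to_kb:

```shell
# Convert a byte count to kilobytes (taking 1 KB = 1024 bytes),
# rounded to two decimal places.
bytes_to_kb() {
    awk -v bytes="$1" 'BEGIN { printf "%.2f KB\n", bytes / 1024 }'
}

bytes_to_kb 513523   # prints: 501.49 KB
```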

Building and Running the Applet

It's impossible to debug this program locally, so next you will build the applet and run it. If you are in the wc directory, run dx build to build the applet; if you are in the directory above, run dx build wc to indicate the directory that contains the applet. Subsequent builds will require the use of the -f|--overwrite or -a|--archive flag to indicate what to do with the previous version. For consistency's sake, I always run with the -f flag:

$ dx build -f
{"id": "applet-GGyGVP00K9Z4Z6VgBgkk0b06"}
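
If you script the build, you will probably want the new applet ID in a variable. The following is a rough sketch that pulls the ID out of the one-line JSON with sed; extract_id is a hypothetical helper, and a JSON-aware tool such as jq would be more robust if you have one installed. The sample text is copied from the build output above so that the example runs without the dx toolkit.

```shell
# Hypothetical helper: pull the value of "id" out of a one-line JSON
# object read from standard input.
extract_id() {
    sed -n 's/.*"id": *"\([^"]*\)".*/\1/p'
}

# In practice you would pipe the output of dx build into extract_id.
applet_id=$(printf '%s\n' '{"id": "applet-GGyGVP00K9Z4Z6VgBgkk0b06"}' | extract_id)
echo "$applet_id"   # prints: applet-GGyGVP00K9Z4Z6VgBgkk0b06
```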

From the web interface, you can now view a web form that will allow you to execute the applet.

Follow the same process described in the Overview of the Platform section.

Running the Applet from the Command Line

You can also run the applet from the command line using the applet's ID. To begin, use dx run with the -h|--help flag to see the inputs and outputs of the applet:

$ dx run applet-GGyGVP00K9Z4Z6VgBgkk0b06 -h
usage: dx run applet-GGyGVP00K9Z4Z6VgBgkk0b06 [-iINPUT_NAME=VALUE ...]

Applet: Word Count

Find the number of lines, words, and characters in a file

Inputs:
  Input file: -iinput_file=(file)

Outputs:
  Output: output (file)

Run the same command without the help flag to enter an interactive session where you can indicate the input file using the file's ID noted earlier:

$ dx run applet-GGyGVP00K9Z4Z6VgBgkk0b06
Entering interactive mode for input selection.

Input:   Input file (input_file)
Class:   file

Enter file ID or path (<TAB> twice for compatible files in current directory,
'?' for more options)
input_file: file-GGyG8z00K9Z9GQ9jG4qB4gpX

Using input JSON:
{
    "input_file": {
        "$dnanexus_link": "file-GGyG8z00K9Z9GQ9jG4qB4gpX"
    }
}

Confirm running the executable with this input [Y/n]: n

You may also specify the file on the command line:

$ dx run applet-GGyGVP00K9Z4Z6VgBgkk0b06 -iinput_file=file-GGyG8z00K9Z9GQ9jG4qB4gpX

Using input JSON:
{
    "input_file": {
        "$dnanexus_link": "file-GGyG8z00K9Z9GQ9jG4qB4gpX"
    }
}

Confirm running the executable with this input [Y/n]: n

Notice that in both instances the input is formatted as a JSON document for submission. Copy that JSON into a file named inputs.json so that it has the following contents:

$ cat inputs.json
{
    "input_file": {
        "$dnanexus_link": "file-GGyG8z00K9Z9GQ9jG4qB4gpX"
    }
}
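
Rather than copying the JSON by hand, you can generate it from a file ID. This is a minimal sketch: make_inputs_json is a hypothetical helper, and it assumes the applet's only input is named input_file, as in this example.

```shell
# Hypothetical helper: print the input JSON for a given file ID.
make_inputs_json() {
    # Single quotes keep the shell from expanding $dnanexus_link.
    printf '{\n    "input_file": {\n        "$dnanexus_link": "%s"\n    }\n}\n' "$1"
}

make_inputs_json file-GGyG8z00K9Z9GQ9jG4qB4gpX > inputs.json
cat inputs.json
```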

Use this file as the -f|--input-json-file input for the applet along with the -y flag to proceed without further confirmation and the --watch flag to follow the job's log as it runs:

$ dx run applet-GGyGVP00K9Z4Z6VgBgkk0b06 -f inputs.json -y --watch

Using input JSON:
{
    "input_file": {
        "$dnanexus_link": "file-GGyG8z00K9Z9GQ9jG4qB4gpX"
    }
}

Calling applet-GGyGVP00K9Z4Z6VgBgkk0b06 with output destination
  project-GGyG8K80K9ZKzkX812yY893V:/

Job ID: job-GGyGZPQ0K9Z7PXybBp52P3xF

Job Log
-------
Watching job job-GGyGZPQ0K9Z7PXybBp52P3xF. Press Ctrl+C to stop watching.

The end of the job's output should look like the following:

2022-10-04 17:08:36 Word Count STDERR + wc input_file
2022-10-04 17:08:36 Word Count STDERR ++ dx upload output --brief
2022-10-04 17:08:37 Word Count STDERR + output=file-GGyGf100qZbvFjb3GqfG6kzj
2022-10-04 17:08:37 Word Count STDERR + dx-jobutil-add-output output file-GGyGf100qZbvFjb3GqfG6kzj --class=file

Run dx describe on the indicated output file ID to see the metadata about the file. Then execute dx cat to see the contents of the file, which should show the same counts as when the program ran locally. Only the file name differs, because the applet downloaded the input as input_file:

$ dx cat file-GGyGf100qZbvFjb3GqfG6kzj
  8590  86055 513523 input_file

Review

In this chapter, you did the following:

Learned the structure of a native bash applet and how to use the wizard to create a new app
Built an app and ran it from the command line and the web interface
Inspected the output of an applet

Resources

Full Documentation

To create a support ticket if there are technical issues:

Go to the Help header (same section where Projects and Tools are) inside the platform
Select "Contact Support"
Fill in the Subject and Message to submit a support ticket.
