Example 1: Word Count (wc)
To get started, you will build a native bash applet that will execute the venerable wc (word count) Unix command-line program on a file. In this example, you will:
Use the dx-app-wizard to create the skeleton of a native bash applet
Define the inputs and outputs of an applet
Use dx build to build the applet
Import data from a URL
Use dx run to run the applet
Understanding wc
The wc command takes one or more files as input. So that we have the same input file, please execute the following command to fetch the text from Project Gutenberg and write the contents to the local file scarlet.txt:
Or use curl:
By default, wc prints three columns showing the number of lines, words, and characters of text (strictly speaking bytes, which is the same thing for plain ASCII text), in that order, followed by the name of the file:
The output from your version of wc may differ slightly, as there are several implementations of the program. For instance, the preceding output is from macOS, which uses the BSD version, but the applet will run on Ubuntu Linux using the GNU version. Both programs work essentially the same.
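As a small local illustration of the three-column output, here is a minimal sketch that runs wc on a tiny sample file instead of scarlet.txt. The sample text and the variable names are made up for this example; the tr call only strips the padding that some implementations add.

```shell
# Create a tiny sample file so the counts are easy to verify by hand.
sample="$(mktemp)"
printf 'one two\nthree\n' > "$sample"

# Default output: lines, words, bytes, then the file name.
wc "$sample"

# Reading from STDIN drops the file name from the output.
lines=$(wc -l < "$sample" | tr -d ' ')
words=$(wc -w < "$sample" | tr -d ' ')
bytes=$(wc -c < "$sample" | tr -d ' ')
echo "lines=$lines words=$words bytes=$bytes"
```

The column widths of the first wc call vary between the BSD and GNU versions, but the counts themselves should agree.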
The goal of this applet will be to accept a single file as input and capture the standard out (aka STDOUT) of wc to report the number of lines, words, and characters in the file.
Using dx-app-wizard
Next, you will create an applet that will accept this file as input, transfer it to a virtual machine, run wc on the file, and return the preceding output as a new file. Run the dx-app-wizard to interactively answer questions about the inputs, outputs, and runtime requirements. Start by executing the program with the -h|--help flag to read the documentation:
As shown in the preceding usage, the name of the applet may be provided as an argument. For instance, you can run dx-app-wizard wc to answer the first question, which is the name of the applet. Note the naming conventions for the applet name, which you should also follow for naming the input and output variables:
Because the name was provided as an argument, the prompt shows [wc]. All the prompts will show a default value that will be used if you press the Enter key. If you wish to override this value, type a new name; otherwise, press Enter.
Next, you will be prompted for a title. The empty brackets ([]) indicate this is optional, but I will provide "Word Count":
Likewise, the summary is optional, but I will provide one:
Indicate the version with major, minor, and patch release:
The input specification follows. Use the name input_file for the first input name and whatever label you like. For the class, choose file to indicate that the user must supply a valid file, and specify that this input is not optional:
As this is the only input, press Enter when prompted for a second input and move to the output specification. To start, call the output outfile and use the class of file:
There is no other output for now, so press Enter to move on to the Timeout Policy. You may choose any amount of time you like such as "1h" to indicate 1 hour:
Next, you will choose whether to use bash or Python as the primary language of the applet. Choose bash:
Choosing bash means that your app will execute a bash script that will use commands from the dxpy module to do things like download and upload files as well as execute any command on the runtime instance, such as custom programs you write in Python, R, C, etc. Choosing Python here means that a Python script will be executed, and it can use the same Python module to do everything the bash script does. This tutorial will only demonstrate bash apps. Neither language has an advantage over the other, so choose whichever suits your tastes.
During runtime, some apps may need to fetch resources from the internet or from the parent project. Neither of these will apply to this applet, so answer "no" for the next two questions:
The wizard will finish with a listing of the files it has created:
As noted, you will find the following structure in the directory wc:
A directory for tests, mostly used internally by DNAnexus.
A directory for assets like files or binaries you would like copied to the runtime instance.
A JSON file describing the metadata for the applet.
A documentation stub you may wish to update.
Another documentation stub.
A directory to place source code for the applet.
The bash script template to execute the applet.
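To make the layout described above easier to picture, the following sketch recreates an empty skeleton with the same shape in a scratch directory. The file and directory names (test, resources, Readme.md, Readme.developer.md) are my assumption of what the wizard typically produces; check them against the wizard's own listing.

```shell
# Scratch directory so nothing in the current directory is touched.
scratch="$(mktemp -d)"

# Directories for tests, runtime assets, and source code.
mkdir -p "$scratch/wc/test" "$scratch/wc/resources" "$scratch/wc/src"

# Metadata, the two documentation stubs, and the bash entry point (all empty here).
touch "$scratch/wc/dxapp.json" \
      "$scratch/wc/Readme.md" \
      "$scratch/wc/Readme.developer.md" \
      "$scratch/wc/src/wc.sh"

# List the result, sorted for a stable order.
(cd "$scratch" && find wc | sort)
```

This is only a stand-in for inspection; the real skeleton comes from running the wizard, which also fills in dxapp.json and src/wc.sh.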
Inspecting dxapp.json
In the preceding step, the applet's inputs, outputs, and system requirements were written to the file dxapp.json, which is in JSON (JavaScript Object Notation) format. Open this file to inspect the contents, which begins with the basic metadata about the app:
The inputSpec section shows that this applet takes a single argument of the type file. Update the patterns to include .txt:
The outputSpec shows that the applet will return a file:
The runSpec describes the runtime for the applet:
The default VM is Ubuntu 20.04, which includes Python v3 and R v3. You may also indicate Ubuntu 16.04, which has Python v2.
If you need Ubuntu 16.04 with Python v3, indicate version 1 here; otherwise, leave this as 0.
The author has had more success installing Python v2 on Ubuntu 20.04 than running an older Linux distro.
Finally, the regionalOptions describe the system requirements:
You may use a text editor to alter this file at any time, after which you will need to rebuild the applet.
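As a rough guide to what the sections discussed above might contain, here is a sketch that writes a trimmed-down fragment to a scratch file and checks that it is valid JSON. The labels are placeholders, the output name outfile follows the wizard step earlier, and the real dxapp.json contains more fields (name, title, regionalOptions, and so on) than shown here.

```shell
spec="$(mktemp)"
cat > "$spec" <<'EOF'
{
  "inputSpec": [
    {
      "name": "input_file",
      "label": "Input file",
      "class": "file",
      "optional": false,
      "patterns": ["*.txt"]
    }
  ],
  "outputSpec": [
    {
      "name": "outfile",
      "label": "Output file",
      "class": "file"
    }
  ],
  "runSpec": {
    "timeoutPolicy": { "*": { "hours": 1 } },
    "interpreter": "bash",
    "file": "src/wc.sh",
    "distribution": "Ubuntu",
    "release": "20.04",
    "version": "0"
  }
}
EOF

# Optional syntax check when python3 is available.
if command -v python3 >/dev/null 2>&1; then
    python3 -m json.tool "$spec" >/dev/null && echo "valid JSON"
fi
```

A quick syntax check like this is worthwhile after hand-editing, since a stray comma is an easy mistake to make.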
Editing the Runtime Code
As indicated in runSpec, the applet will execute the bash script src/wc.sh at runtime. The app wizard created a template that shows one method for downloading the input file and uploading the output file. Here is a modified version that removes most of the comments for the sake of brevity and adds the applet's business logic in the middle:
I've added this pragma to show each command as it's executed and to halt on undefined variables or failed system calls.
This will download the input file to a local file called input_file on the running instance.
Execute wc on input_file and redirect standard out to the file output.
This will upload the result file called output from the instance back to the project.
This command will link the output file as an output of the applet.
The local variables $input_file and $output match the names used in the inputSpec and outputSpec. They will only exist at runtime.
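The steps described above can be sketched as follows. This is my own minimal reconstruction, not the wizard's exact template: run_wc and file_id are hypothetical names, and the output field is assumed to be outfile as entered in the wizard, so adjust it if your outputSpec differs. The platform calls main at runtime, so the script only defines it.

```shell
#!/bin/bash
# Show each command, halt on undefined variables and failed commands.
set -exuo pipefail

# Business logic only: run wc on a local file and save STDOUT to another file.
run_wc() {
    wc "$1" > "$2"
}

# Entry point invoked by the platform; $input_file exists only at runtime.
main() {
    # Download the input to a local file called input_file.
    dx download "$input_file" -o input_file

    # Run wc and capture its output in the local file called output.
    run_wc input_file output

    # Upload the result to the project and keep the new file's ID.
    file_id=$(dx upload output --brief)

    # Link the uploaded file to the applet's output field.
    dx-jobutil-add-output outfile "$file_id" --class=file
}
```

Keeping the wc call in its own function means that part can be exercised on a local file even though the dx commands only work on the platform.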
Creating a Project for the Applet and Data
Applets and data must live inside a project, so create a new one either using the web interface or the command line by executing dx new project:
Next, you will add the scarlet.txt file to the project. There are several ways you can do this. From the web interface, you can click the "Add" button, which will show you two relevant options:
"Upload Data": This will allow you to upload a file from your local computer. You can drag and drop the file into the dialog box or use the file browser to select the file.
"Add Data From Server": This will launch an app that can import files accessible by a URL such as from a web address or FTP server. You should use the Project Gutenberg URL from earlier.
You can also use the dx upload command. If you created the project using the web interface, you will first need to run dx select to select your project:
Note the file's ID, which we will use later for the applet's input. If you use the web interface to upload, you can click the information "I" in the circle to see the file's metadata.
From the command line, you can use dx ls with the -l|--long option to see the file ID:
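Put together, the command-line route might look like the sketch below. The project name wc_demo is hypothetical, and the commands are guarded so that nothing happens when the dx toolkit is not installed or scarlet.txt is missing. Note that dx new project may ask whether to switch to the new project.

```shell
project_name="wc_demo"     # hypothetical project name
local_file="scarlet.txt"   # the file fetched earlier

if command -v dx >/dev/null 2>&1 && [ -f "$local_file" ]; then
    dx new project "$project_name"   # create the project
    dx select "$project_name"        # make it the current project
    dx upload "$local_file"          # prints the new file's metadata, including its ID
    dx ls -l                         # long listing shows the file ID again
else
    echo "Skipping: dx is not installed or $local_file is missing" >&2
fi
```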
Building and Running The Applet
It's impossible to debug this program locally, so next you will build the applet and run it. If you are in the wc directory, run dx build to build the applet; if you are in the directory above, run dx build wc to indicate the directory that contains the applet. Subsequent builds will require the use of the -f|--overwrite or -a|--archive flag to indicate what to do with the previous version. For consistency's sake, I always run with the -f flag:
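The choice between the two invocations can be sketched as a small helper. build_target is a hypothetical function of mine, not part of the dx toolkit: it picks the current directory when a dxapp.json is present and the wc subdirectory otherwise. The build itself only runs when dx is installed.

```shell
# Pick the directory to build: "." inside the applet directory, "wc" from above it.
build_target() {
    if [ -f dxapp.json ]; then
        echo "."
    else
        echo "wc"
    fi
}

if command -v dx >/dev/null 2>&1; then
    # -f overwrites any previous build of the applet in the project.
    dx build -f "$(build_target)"
else
    echo "Would run: dx build -f $(build_target)"
fi
```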
From the web interface, you can now view a web form that will allow you to execute the applet.
Follow the same process described in the Overview of the Platform section.
Running the Applet from the Command Line
You can also run the applet from the command line using the applet's ID. To begin, use dx run with the -h|--help flag to see the inputs and outputs of the applet:
Run the same command without the help flag to enter an interactive session where you can indicate the input file using the file's ID noted earlier:
You may also specify the file on the command line:
Notice in both instances, the input is formatted as a JSON document for submission. Copy that JSON into a file with the following contents:
Use this file as the -f|--input-json-file input for the applet, along with the -y flag to indicate you want to proceed without further confirmation and the --watch flag to follow the applet's progress:
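A sketch of this step follows, assuming the input field is named input_file as in the wizard. The IDs file-xxxx and applet-xxxx are placeholders to replace with your own, and the JSON is written to a scratch file rather than a name of your choosing. The job is only submitted when dx is installed.

```shell
file_id="file-xxxx"       # placeholder: use the file ID noted earlier
applet_id="applet-xxxx"   # placeholder: use the ID printed by dx build

# Write the input JSON; \$ keeps the literal key name $dnanexus_link.
input_json="$(mktemp)"
cat > "$input_json" <<EOF
{
  "input_file": {
    "\$dnanexus_link": "$file_id"
  }
}
EOF

if command -v dx >/dev/null 2>&1; then
    # -f reads the inputs from the file, -y skips confirmation, --watch follows the job log.
    dx run "$applet_id" -f "$input_json" -y --watch
else
    cat "$input_json"
fi
```

Keeping the inputs in a file makes a run easy to repeat after each rebuild.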
The end of the job's output should look like the following:
Run dx describe on the indicated output file ID to see the metadata about the file. Then execute dx cat to see the contents of the file, which should be the same results as when the program ran locally:
Review
In this chapter, you did the following:
Learned the structure of a native bash applet and how to use the wizard to create a new app
Built an app and ran it from the command line and the web interface
Inspected the output of an applet
Resources
To create a support ticket if there are technical issues:
Go to the Help header (same section where Projects and Tools are) inside the platform
Select "Contact Support"
Fill in the Subject and Message to submit a support ticket.