JSON on the Platform

Be sure to install jq.

Background

JavaScript Object Notation (JSON) is a data exchange format designed to be easy for both humans and machines to read. You will encounter JSON in several places on the DNAnexus platform, such as when you create and edit native applets and workflows. As shown in Figure 1, JSON is used to communicate with the DNAnexus Application Programming Interface (API). Understanding the responses from the API will help you debug applets, find failed jobs, and relaunch analyses.

JSON Examples

Here is an example of objects nested inside another object. It describes the output of the FastQC app, which creates two files: an HTML report and a text file containing statistics on the input FASTQ:

{
   "report_html": {
       "dnanexus_link": "file-G4x7GX80VBzQy64k4jzgjqgY"
   },
   "stats_txt": {
       "dnanexus_link": "file-G4x7GXQ0VBzZxFxz4fqV120B"
   }
}
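As a rough sketch of how a program sees this structure, the following Python snippet parses the same document with the standard json module and steps from the outer object into an inner one. The variable names are illustrative only and are not part of any DNAnexus tooling.

```python
import json

# The FastQC output shown above, as a JSON string.
text = """
{
   "report_html": {
       "dnanexus_link": "file-G4x7GX80VBzQy64k4jzgjqgY"
   },
   "stats_txt": {
       "dnanexus_link": "file-G4x7GXQ0VBzZxFxz4fqV120B"
   }
}
"""

output = json.loads(text)        # a JSON object becomes a Python dict
report = output["report_html"]   # the nested object is also a dict

print(sorted(output))            # keys of the outer object
print(report["dnanexus_link"])   # file ID of the HTML report
```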

In a later chapter, you will use a file called dxapp.json to build custom applets on DNAnexus. To see a full example from a working app, run dx get app-fastqc to download the source code for the FastQC app. This should create a fastqc directory that contains the file dxapp.json.

Following is a portion of this file showing a typical JSON document you'll encounter on DNAnexus:

{
    "name": "fastqc",
    "title": "FastQC Reads Quality Control",
    "summary": "Generates a QC report on reads data",
    "dxapi": "1.0.0",
    "openSource": true,
    "version": "3.0.3",
    "inputSpec": [
        {
            "name": "reads",
            "label": "Reads",
            "help": "A file containing the reads to be checked. Accepted formats are gzipped-FASTQ and BAM.",
            "class": "file",
            "patterns": [
                "*.fq.gz",
                "*.fastq.gz",
                "*.sam",
                "*.bam"
            ]
        },
    ...
}

The root element of this JSON document is an object, as denoted by the curly brackets.
The value of inputSpec is a list, as denoted by the square brackets.
Each value in the list is another object.
The first three values of this object are strings.
The patterns value is a list of strings representing file globs that match the input file extensions.
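The observations above can be checked in Python, where JSON objects map to dict and JSON lists map to list. This is only a sketch: the fragment below is a hand-trimmed stand-in that keeps a few of the fields shown above and closes the structure so it parses; the real dxapp.json contains more.

```python
import json

# Trimmed, hand-closed fragment using only fields shown above (not the full file).
fragment = """
{
    "name": "fastqc",
    "openSource": true,
    "inputSpec": [
        {
            "name": "reads",
            "class": "file",
            "patterns": ["*.fq.gz", "*.fastq.gz", "*.sam", "*.bam"]
        }
    ]
}
"""

doc = json.loads(fragment)
first_input = doc["inputSpec"][0]

print(type(doc).__name__)                # root element: JSON object
print(type(doc["inputSpec"]).__name__)   # inputSpec: JSON list
print(type(first_input).__name__)        # each list item: another object
print(type(doc["name"]).__name__)        # string value
print(first_input["patterns"])           # list of glob strings
```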

The following links explain the dxapp.json file in greater detail:

Validating JSON

JSON is a strict format that is easy to get wrong if you are manually editing a file. For this reason, we suggest you use text editors that understand JSON syntax, highlight data structures, and spot common mistakes. For instance, a JSON object looks very similar to a Python dictionary, but Python allows a trailing comma in a list. Open the python3 REPL (read-evaluate-print loop) and enter the following to verify:

>>> { 'patterns': [ '*.bam', '*.sam', ] }
{'patterns': ['*.bam', '*.sam']}

A similar trailing comma in JSON would make the document invalid. To see this, go to JSONlint.com, paste this into the input box, and press the "Validate JSON" button:

{ "patterns": [ "*.bam", "*.sam", ] }

The result should reformat the JSON onto three lines as follows:

{
    "patterns": ["*.bam", "*.sam", ]
}

The second line should be highlighted in red, and the "Results" below show that a JSON value is expected after the last comma and before the closing square bracket.

Error: Parse error on line 2:
... ["*.bam", "*.sam", ]}
-----------------------^
Expecting 'STRING', 'NUMBER', 'NULL', 'TRUE', 'FALSE', '{', '[', got ']'

Remove the offending comma and revalidate the document to see the "Results" change to "Valid JSON." You may also want to install a command-line tool like jsonlint that can show similar errors:

$ jsonlint dxapp.json
Error: Parse error on line 15:
...*.sam",            ],            "help
----------------------^
Expecting 'STRING', 'NUMBER', 'NULL', 'TRUE', 'FALSE', '{', '[', got ']'
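If neither tool is at hand, Python's standard json module can serve as a basic validator. The sketch below assumes a helper named is_valid_json, which is hypothetical and written only for this illustration. The exact error wording differs between Python versions, so none is shown here.

```python
import json

def is_valid_json(text):
    """Return True if text parses as JSON, otherwise report the error and return False."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno and colno locate the problem within the input text.
        print(f"Invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}")
        return False
    return True

print(is_valid_json('{ "patterns": [ "*.bam", "*.sam", ] }'))  # trailing comma
print(is_valid_json('{ "patterns": [ "*.bam", "*.sam" ] }'))   # comma removed
```

From the command line, python3 -m json.tool dxapp.json performs a similar check on a file.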

Viewing JSON

JSON is not dependent on whitespace, so the FastQC output example shown earlier could be compressed to the following:

$ cat minified.json
{"report_html":{"dnanexus_link":"file-G4x7GX80VBzQy64k4jzgjqgY"},"stats_txt":
{"dnanexus_link":"file-G4x7GXQ0VBzZxFxz4fqV120B"}}

The jq program will format JSON into an indented data structure that is easier to read. In the following example, we execute jq with the filter . to indicate we wish to see the entire document. The file to read is the last argument. Depending on your terminal, the keys may be shown in one color and the values in a different color:

$ jq . minified.json
{
  "report_html": {
    "dnanexus_link": "file-G4x7GX80VBzQy64k4jzgjqgY"
  },
  "stats_txt": {
    "dnanexus_link": "file-G4x7GXQ0VBzZxFxz4fqV120B"
  }
}
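For comparison, roughly the same reformatting can be done in Python by parsing the minified text and writing it back out with an indent. This is a sketch of the idea, not a description of how jq works internally.

```python
import json

# The minified document from above, split across two string literals for width.
minified = ('{"report_html":{"dnanexus_link":"file-G4x7GX80VBzQy64k4jzgjqgY"},'
            '"stats_txt":{"dnanexus_link":"file-G4x7GXQ0VBzZxFxz4fqV120B"}}')

# Parse, then serialize again with two-space indentation.
pretty = json.dumps(json.loads(minified), indent=2)
print(pretty)
```

Both strings describe the same data; only the whitespace differs.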

The power of jq lies in the filter argument, which allows you to extract and manipulate the contents of the document. Use the filter .report_html to extract the value for key report_html that lies at the root of the document:

$ jq .report_html example.json
{
  "dnanexus_link": "file-G4x7GX80VBzQy64k4jzgjqgY"
}

::: note If you request a key that does not exist, you will get the JSON value null, indicating no value is present: :::

$ jq .report_htm example.json
null

Filters may chain keys to search further into the document structure. In the following example, we can extract the file identifier by chaining .report_html.dnanexus_link:

$ jq .report_html.dnanexus_link example.json
"file-G4x7GX80VBzQy64k4jzgjqgY"
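The idea of chaining keys can be sketched in Python with a small helper that follows a dot-separated path through nested dictionaries. The function get_path is hypothetical and deliberately simplified: it returns None whenever a step cannot be followed, which mirrors the null result for a missing key but does not reproduce every jq behavior.

```python
import json

def get_path(doc, path):
    """Follow dot-separated keys through nested dicts; return None if a key is missing."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

doc = json.loads(
    '{"report_html": {"dnanexus_link": "file-G4x7GX80VBzQy64k4jzgjqgY"},'
    ' "stats_txt": {"dnanexus_link": "file-G4x7GXQ0VBzZxFxz4fqV120B"}}'
)

print(get_path(doc, "report_html.dnanexus_link"))  # nested value
print(get_path(doc, "report_htm"))                 # misspelled key
```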

Reading from Unix Pipes

Unix-type operating systems such as Linux and FreeBSD/macOS have three special filehandles:

STDIN (standard in)
STDOUT (standard out)
STDERR (standard error)

STDOUT and STDERR carry the output of programs. The first is the regular output, which usually goes to the console, and the second is a separate channel that keeps error messages apart from regular output. For instance, the STDOUT of jq can be redirected to a file using the > operator:

$ jq . minified.json > prettified.json
$ cat prettified.json
{
  "report_html": {
    "dnanexus_link": "file-G4x7GX80VBzQy64k4jzgjqgY"
  },
  "stats_txt": {
    "dnanexus_link": "file-G4x7GXQ0VBzZxFxz4fqV120B"
  }
}

STDIN is the input filehandle. In the following example, a pipe (|) connects the STDOUT of cat to the STDIN of jq:

$ cat minified.json | jq .
{
  "report_html": {
    "dnanexus_link": "file-G4x7GX80VBzQy64k4jzgjqgY"
  },
  "stats_txt": {
    "dnanexus_link": "file-G4x7GXQ0VBzZxFxz4fqV120B"
  }
}

Alternatively, you can read from an input redirect using <:

$ jq . < example.json
{
  "report_html": {
    "dnanexus_link": "file-G4x7GX80VBzQy64k4jzgjqgY"
  },
  "stats_txt": {
    "dnanexus_link": "file-G4x7GXQ0VBzZxFxz4fqV120B"
  }
}
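A program that reads from a filehandle does not need to know whether the data came from a pipe, a redirect, or a file. The Python sketch below illustrates this with a hypothetical prettify function that accepts any readable and writable streams. In-memory streams stand in for STDIN and STDOUT so the example runs on its own.

```python
import io
import json

def prettify(in_stream, out_stream):
    """Read one JSON document from in_stream and write it indented to out_stream."""
    json.dump(json.load(in_stream), out_stream, indent=2)
    out_stream.write("\n")

# In-memory stand-ins for STDIN and STDOUT.
fake_stdin = io.StringIO('{"stats_txt":{"dnanexus_link":"file-G4x7GXQ0VBzZxFxz4fqV120B"}}')
fake_stdout = io.StringIO()

prettify(fake_stdin, fake_stdout)
print(fake_stdout.getvalue(), end="")
```

In a real script, you would import sys and call prettify(sys.stdin, sys.stdout) so that the program works on either side of a pipe.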

Using jq For DNAnexus Responses

Many dx commands will return JSON if you append the --json flag. For instance, dx describe app-fastqc will return a table of metadata about the FastQC app. In the following example, I will request the same data as JSON and pipe it into the head program to see the first 10 lines:

$ dx describe app-fastqc --json | head
{
    "id": "app-G81jg5j9jP7qxb310vg2xQkX",
    "class": "app",
    "billTo": "org-dnanexus_apps",
    "created": 1644399511000,
    "modified": 1644401066806,
    "createdBy": "user-jkotrs",
    "name": "fastqc",
    "version": "3.0.3",
    "aliases": [

As with previous examples, the result is a JSON document with an object at the root level; therefore, I can pipe the output into jq .id to extract the app identifier:

$ dx describe app-fastqc --json | jq .id
"app-G81jg5j9jP7qxb310vg2xQkX"

I can use dx find projects --public to view a list of public projects. Adding --json and piping the output to head, I can see the root of the JSON is a list:

$ dx find projects --public --json | head
[
    {
        "id": "project-F0yyz6j9Jz8YpxQV8B8Kk7Zy",
        "level": "VIEW",
        "permissionSources": [
            "PUBLIC"
        ],
        "public": true,
        "describe": {
            "id": "project-F0yyz6j9Jz8YpxQV8B8Kk7Zy",

The jq filter .[] will iterate over the values of a list at the root, so I can use .[].id in the following command to extract the project identifier of each. As this returns over 100 results, I'll use head to show the first few lines:

$ dx find projects --public --json | jq ".[].id" | head -n 3
"project-F0yyz6j9Jz8YpxQV8B8Kk7Zy"
"project-G4FX3QXKzJxqXxGpK2pJ7Z3K"
"project-FGX8gVQB9X7K5f1pKfPvz9yG"

You can also use pipes inside of the jq filter to extract the same data:

$ dx find projects --public --json | jq ".[] | .id" | head -n 3
"project-F0yyz6j9Jz8YpxQV8B8Kk7Zy"
"project-G4FX3QXKzJxqXxGpK2pJ7Z3K"
"project-FGX8gVQB9X7K5f1pKfPvz9yG"
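The Python counterpart of iterating over a list at the root is a loop or list comprehension. The sketch below uses a heavily trimmed, partly hypothetical stand-in for the command output: the two project IDs come from the results above, while the remaining fields of the second entry are assumed for illustration.

```python
import json

# Trimmed stand-in for `dx find projects --public --json`; fields are partly assumed.
response = """
[
    {"id": "project-F0yyz6j9Jz8YpxQV8B8Kk7Zy", "level": "VIEW", "public": true},
    {"id": "project-G4FX3QXKzJxqXxGpK2pJ7Z3K", "level": "VIEW", "public": true}
]
"""

projects = json.loads(response)                          # root is a list
project_ids = [project["id"] for project in projects]    # like the filter .[].id

for project_id in project_ids:
    print(project_id)
```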

Recipes for Using jq

Editing Job Input and Rerunning

You may wish to re-run an analysis, possibly with slightly different inputs. For this example, I'll read the job description from the job.json file rather than from a pipe:

$ jq .input job.json
{
  "reads": {
    "$dnanexus_link": "file-BQbXKk80fPFj4Jbfpxb6Ffv2"
  },
  "format": "auto",
  "kmer_size": 7,
  "nogroup": true
}

Redirect this to a file:

$ jq .input job.json > input.json

::: note If you had access to the original job ID, you would run the following: :::

$ dx describe job-G4x7G5j0B3K2FKzgP654ZqpK --json | jq .input > input.json

Edit the input.json file, perhaps to indicate a different kmer_size, then re-run the app using the new input:

$ dx run app-G4YyQ9044b90F1vG8y9YkKk3 -f input.json
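The edit can also be made programmatically instead of in a text editor. The following Python sketch changes kmer_size in the input shown above and serializes the result. The new value of 5 is a hypothetical choice for illustration, and the sketch works on an in-memory string rather than touching input.json.

```python
import json

# The job input extracted above.
original = """
{
  "reads": {
    "$dnanexus_link": "file-BQbXKk80fPFj4Jbfpxb6Ffv2"
  },
  "format": "auto",
  "kmer_size": 7,
  "nogroup": true
}
"""

job_input = json.loads(original)
job_input["kmer_size"] = 5   # hypothetical new value; all other inputs stay the same

edited = json.dumps(job_input, indent=2)
print(edited)
```

Writing the edited text to input.json would give a file suitable for the dx run command above.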

Finding Failed Jobs

Sometimes I find that some jobs have failed when processing large batches of data. I can use dx find jobs --state failed to return a list of failed jobs, which I might see if the input files were corrupt or were especially large, causing the instances to run out of disk space or memory. First, I'll show you how to use more advanced filtering in jq. The file rap-jobs.json contains example output from dx find jobs --json that I'll use to extract the state of the jobs:

$ jq ".[].state" rap-jobs.json | sort | uniq -c | sort -rn
  15 "failed"
   3 "done"
   2 "terminated"

A select statement in jq can find the "failed" jobs, and further pipes pass the results to more filters that extract the job IDs and the app IDs:

$ jq '.[] | select (.state | contains("failed")) | .id, .executable' rap-jobs.json | head
"job-G6jj9k8JPXfG42094KG5JFX4"
"applet-G6jj9b0JPXf5Q6ZF4G85K156"
"job-G6jj1zQJPXf34z8v4KqjZKP1"
"applet-G6jg9p8JPXf4Q9Pb4GgPK8Vp"
"job-G6jg9vQJPXfGbJb54GFkJ33Y"
"applet-G6jg9p8JPXf4Q9Pb4GgPK8Vp"
"job-G6jg7Y0JPXfG6q53G12vQZK8"
"applet-G6jg6pQJPXf7ypXq33B75Qq1"
"job-G6jg57QJPXf90Jjv4K8pgkG7"
"applet-G6jfg90JPXfGZkVb7PPxjpPY"

To be useful in a bash loop, I need the job and app IDs on the same line, so I can use paste for this:

$ jq '.[] | select (.state | contains("failed")) | .id, .executable' rap-jobs.json | paste - -
"job-G6jj9k8JPXfG42094KG5JFX4"  "applet-G6jj9b0JPXf5Q6ZF4G85K156"
"job-G6jj1zQJPXf34z8v4KqjZKP1"  "applet-G6jg9p8JPXf4Q9Pb4GgPK8Vp"
"job-G6jg9vQJPXfGbJb54GFkJ33Y"  "applet-G6jg9p8JPXf4Q9Pb4GgPK8Vp"
"job-G6jg7Y0JPXfG6q53G12vQZK8"  "applet-G6jg6pQJPXf7ypXq33B75Qq1"
"job-G6jg57QJPXf90Jjv4K8pgkG7"  "applet-G6jfg90JPXfGZkVb7PPxjpPY"
"job-G6jZk6jJPXf1q1Py5VKX6gJK"  "applet-G6jZjG0JPXf7ZxZP4G5v0X1k"
"job-G6jYY28JPXfFvFXY4GXB6jG2"  "applet-G6jYXq0JPXf5Q6ZF4G85JVgG"
"job-G6jY9FQJPXf3pj894GFJ02jy"  "applet-G6jY7zQJPXfG42094KG5Gkyy"
"job-G6jY858JPXfBKX1X0j434BY5"  "applet-G6jY7zQJPXfG42094KG5Gkyy"
"job-G6jY740JPXf7V2vJ4G2Gkfj7"  "applet-G6jY6zQJPXf81J984K6kfB3V"
"job-G6jY5v8JPXfPGQq15k77zPJ9"  "applet-G6jY5jjJPXf6Ffqg4GqF4KPg"
"job-G6jY4k0JPXfPGQq15k77zP9Q"  "applet-G6jY39jJPXfG42094KG5GkV9"
"job-G6jXPJQJPXfBbf694G3Fg07K"  "applet-G6jXJJjJPXf7V2vJ4G2GkFbF"
"job-G6jX7yQJPXfFjzffKJzpqfj7"  "applet-G6jX7JQJPXf3V99x4Gx7K09X"
"job-G6jVzJ0JPXf5Q6ZF4G85JG09"  "applet-G6jVxQQJPXfGZ0BF33KZfX5Y"

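If I had access to the original executions and input files, I could use a bash loop to re-run these jobs. Since I don't, I'll echo the command that should be run. Note that jq prints strings with their quotation marks, so to execute the commands rather than echo them, add the -r flag to jq to emit the raw IDs: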

jq '.[] | select (.state | contains("failed")) | .id, .executable' \
rap-jobs.json | paste - - | \
while read JOB_ID APP_ID; do echo dx run $APP_ID --clone $JOB_ID; done

This produces the following output:

dx run "applet-G6jj9b0JPXf5Q6ZF4G85K156" --clone "job-G6jj9k8JPXfG42094KG5JFX4"
dx run "applet-G6jg9p8JPXf4Q9Pb4GgPK8Vp" --clone "job-G6jj1zQJPXf34z8v4KqjZKP1"
dx run "applet-G6jg9p8JPXf4Q9Pb4GgPK8Vp" --clone "job-G6jg9vQJPXfGbJb54GFkJ33Y"
dx run "applet-G6jg6pQJPXf7ypXq33B75Qq1" --clone "job-G6jg7Y0JPXfG6q53G12vQZK8"
dx run "applet-G6jfg90JPXfGZkVb7PPxjpPY" --clone "job-G6jg57QJPXf90Jjv4K8pgkG7"
dx run "applet-G6jZjG0JPXf7ZxZP4G5v0X1k" --clone "job-G6jZk6jJPXf1q1Py5VKX6gJK"
dx run "applet-G6jYXq0JPXf5Q6ZF4G85JVgG" --clone "job-G6jYY28JPXfFvFXY4GXB6jG2"
dx run "applet-G6jY7zQJPXfG42094KG5Gkyy" --clone "job-G6jY9FQJPXf3pj894GFJ02jy"
dx run "applet-G6jY7zQJPXfG42094KG5Gkyy" --clone "job-G6jY858JPXfBKX1X0j434BY5"
dx run "applet-G6jY6zQJPXf81J984K6kfB3V" --clone "job-G6jY740JPXf7V2vJ4G2Gkfj7"
dx run "applet-G6jY5jjJPXf6Ffqg4GqF4KPg" --clone "job-G6jY5v8JPXfPGQq15k77zPJ9"
dx run "applet-G6jY39jJPXfG42094KG5GkV9" --clone "job-G6jY4k0JPXfPGQq15k77zP9Q"
dx run "applet-G6jXJJjJPXf7V2vJ4G2GkFbF" --clone "job-G6jXPJQJPXfBbf694G3Fg07K"
dx run "applet-G6jX7JQJPXf3V99x4Gx7K09X" --clone "job-G6jX7yQJPXfFjzffKJzpqfj7"
dx run "applet-G6jVxQQJPXfGZ0BF33KZfX5Y" --clone "job-G6jVzJ0JPXf5Q6ZF4G85JG09"

If you were using dx find jobs, then the equivalent would be this:

dx find jobs --state failed --json | jq '.[] | .id, .executable' | paste - - | \
while read JOB_ID APP_ID; do echo dx run $APP_ID --clone $JOB_ID; done
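The select-and-pair logic can also be sketched in Python, which avoids the need for paste. The sample below reuses two failed job and applet IDs from the output above and adds one hypothetical "done" entry with placeholder IDs. As in the shell version, the commands are only built and printed, not executed. The sketch compares the state for exact equality, whereas the jq filter above uses contains, which matches substrings.

```python
import json
import shlex

# Two failed entries from the output above plus one hypothetical "done" entry.
sample = """
[
    {"id": "job-G6jj9k8JPXfG42094KG5JFX4",
     "executable": "applet-G6jj9b0JPXf5Q6ZF4G85K156", "state": "failed"},
    {"id": "job-placeholder",
     "executable": "applet-placeholder", "state": "done"},
    {"id": "job-G6jj1zQJPXf34z8v4KqjZKP1",
     "executable": "applet-G6jg9p8JPXf4Q9Pb4GgPK8Vp", "state": "failed"}
]
"""

jobs = json.loads(sample)

# Keep failed jobs only and build one command line per job.
commands = [
    shlex.join(["dx", "run", job["executable"], "--clone", job["id"]])
    for job in jobs
    if job["state"] == "failed"
]

for command in commands:
    print(command)
```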

Review

You should now be able to:

Describe how users interact with the DNAnexus Platform
Explain the purpose of using JSON on the DNAnexus platform
Articulate the basic elements of JSON
Describe and read basic JSON structures on the platform
Parse JSON responses from the platform using jq and pipes to other filters or Unix programs

Helpful Tips

Learn the dxapp.json specification
Use an editor like Visual Studio Code with the JSON Crack plugin
Use JSON checking tools to make sure your JSON is well formed
- https://jsonlint.com/
- Run through jq
Use dx get to get app code and dxapp.json for an existing app

Resources

Full Documentation

To create a support ticket if there are technical issues:

Go to the Help header (same section where Projects and Tools are) inside the platform
Select "Contact Support"
Fill in the Subject and Message to submit a support ticket.


