Example 1: Word Count (wc)

In this example, you will:

  • Learn to write a native DNAnexus applet that executes a Python program

  • Use the dxpy module to download and upload files

  • Use the Python subprocess module to execute an external process and check the return value

Getting Started

We'll use the same scarlet.txt file from the bash version of the wc applet. Start with dx-app-wizard and define the same inputs and outputs as before, but be sure to choose Python as the programming language:

Template Options

You can write your app in any programming language, but we provide
templates for the following supported languages: Python, bash
Programming language: Python

Python Template

The Python template looks like the following:

python_wc.py
#!/usr/bin/env python
# python_wc 0.1.0
# Generated by dx-app-wizard.
#
# Basic execution pattern: Your app will run on a single machine from
# beginning to end.
#
# See https://documentation.dnanexus.com/developer for documentation and
# tutorials on how to modify this file.
#
# DNAnexus Python Bindings (dxpy) documentation:
#   http://autodoc.dnanexus.com/bindings/python/current/

import os
import dxpy

@dxpy.entry_point('main') # 1
def main(input_file): # 2

    # The following line(s) initialize your data object inputs on the platform
    # into dxpy.DXDataObject instances that you can start using immediately.

    input_file = dxpy.DXFile(input_file) # 3

    # The following line(s) download your file inputs to the local file system
    # using variable names for the filenames.

    dxpy.download_dxfile(input_file.get_id(), "input_file") # 4

    # Fill in your application code here.

    # The following line(s) use the Python bindings to upload your file outputs
    # after you have created them on the local file system.  It assumes that you
    # have used the output field name for the filename for each output, but you
    # can change that behavior to suit your needs.

    outfile = dxpy.upload_local_file("outfile") # 5

    # The following line fills in some basic dummy output and assumes
    # that you have created variables to represent your output with
    # the same name as your output fields.

    output = {}
    output["outfile"] = dxpy.dxlink(outfile) # 6

    return output # 7

dxpy.run()
  1. entry_point: the DNAnexus execution environment entry point.

  2. The input_file listed in the inputSpec is passed to main.

  3. Create a DXFile object.

  4. Download the input file.

  5. Upload the local output file.

  6. Add the DX file ID to the output dictionary.

  7. Return the output.

Update src/python_wc.py to the following:

python_wc.py
#!/usr/bin/env python

import dxpy
import sys
from subprocess import getstatusoutput # 1


@dxpy.entry_point("main")
def main(input_file):
    local_file = "input_file.txt" # 2
    output_file = "output.txt" # 3

    input_file = dxpy.DXFile(input_file) # 4
    dxpy.download_dxfile(input_file.get_id(), local_file) # 5

    rv, out = getstatusoutput(f"wc {local_file} > {output_file}") # 6

    if rv != 0: # 7
        sys.exit(out)

    outfile = dxpy.upload_local_file(output_file) # 8
    return {"outfile": dxpy.dxlink(outfile)} # 9


dxpy.run()
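  1. Import the subprocess.getstatusoutput function.

  2. Use the local filename input_file.txt.

  3. The output file will be called output.txt.

  4. Shadow the input_file variable, overwriting it with the creation of a new DXFile object.

  5. Call dxpy.download_dxfile to download the input file identified by the file ID to the local_file name.

  6. Execute wc on the local input file and redirect (>) the output to the chosen output filename. This function returns a tuple containing the process's return value and output (STDOUT/STDERR). Because STDOUT is redirected to the file, the captured output holds only STDERR.

  7. If the return value is not zero, use sys.exit to abort the program with the output from the system call.

  8. If the program makes it to this point, the output file should have been created and is ready to upload.

  9. Return a Python dictionary with the DNAnexus link to the new outfile object.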
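
The getstatusoutput call above can be tried outside the platform. The following is a minimal sketch, assuming a POSIX shell with wc on the PATH; the helper name count_to_file and the temporary file names are made up for illustration and are not part of the applet.

```python
import os
import tempfile
from subprocess import getstatusoutput


def count_to_file(local_file, output_file):
    """Run wc on local_file and redirect STDOUT to output_file.

    Returns the (return value, captured output) tuple from getstatusoutput.
    Because STDOUT goes to the file, the captured output holds only STDERR.
    """
    return getstatusoutput(f"wc {local_file} > {output_file}")


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for the downloaded input file
    local_file = os.path.join(tmp, "input_file.txt")
    output_file = os.path.join(tmp, "output.txt")
    with open(local_file, "w") as fh:
        fh.write("one two\nthree\n")

    # Success: the return value is 0 and nothing is captured
    rv, out = count_to_file(local_file, output_file)
    print(rv, repr(out))

    # Failure: wc cannot open the file, so the return value is nonzero
    # and the captured output is the error message
    missing = os.path.join(tmp, "missing.txt")
    rv_bad, out_bad = count_to_file(missing, output_file)
    print(rv_bad, out_bad)
```

In the failing case the shell still creates an empty output file, which is one reason to check the return value before uploading anything.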

NOTE: Portable Operating System Interface (POSIX) standards dictate that processes return 0 on success (i.e., zero errors) and some positive integer value (usually in the range 1-127) to indicate an error condition.
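
To make the return-value convention concrete, here is a small sketch that starts a child Python process which exits with a chosen status and reads that status back. The helper name exit_status is hypothetical; subprocess.run and sys.executable are standard library.

```python
import subprocess
import sys


def exit_status(code):
    """Run a child Python process that exits with `code`; return its status."""
    result = subprocess.run(
        [sys.executable, "-c", f"import sys; sys.exit({code})"]
    )
    return result.returncode


print(exit_status(0))  # 0 signals success
print(exit_status(3))  # any nonzero value signals an error
```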

Run dx build to build the applet. Create a job_input.json file with the file ID of your input:

{
    "input_file": {
        "$dnanexus_link": "file-GgGX7Y8071x46B02JGb515pB"
    }
}
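
If you prefer to generate this file rather than type it, the structure can be sketched in Python. The helper name make_job_input is an assumption for illustration; substitute the file ID from your own project.

```python
import json


def make_job_input(file_id, field="input_file"):
    """Build the input dictionary that links an input field to a file ID."""
    return {field: {"$dnanexus_link": file_id}}


job_input = make_job_input("file-GgGX7Y8071x46B02JGb515pB")

# Print the JSON; redirect this to job_input.json or write it with open()
print(json.dumps(job_input, indent=4))
```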

Run your applet with the input file using --watch to see the output:

$ dx run applet-GgGX740071xJY20Gjkp0JYXB -f python_wc/job_input.json \
    -y --watch \
    --destination project-GXY0PK0071xJpG156BFyXpJF:/output/python_wc/
Using input JSON:
{
    "input_file": {
        "$dnanexus_link": "file-GgGX7Y8071x46B02JGb515pB"
    }
}

Calling applet-GgGX740071xJY20Gjkp0JYXB with output destination
  project-GXY0PK0071xJpG156BFyXpJF:/output/python_wc

Job ID: job-GgGX8P0071x1yfFPkJ8662gQ

Job Log
-------
Watching job job-GgGX8P0071x1yfFPkJ8662gQ. Press Ctrl+C to stop watching.
* Python implementation of wc (python_wc:main) (running) job-GgGX8P0071x1yfFPkJ8662gQ
  kyclark 2024-02-23 16:03:24 (running for 0:01:39)
2024-02-23 16:11:36 Python implementation of wc INFO Logging initialized (priority)
2024-02-23 16:11:36 Python implementation of wc INFO Logging initialized (bulk)
2024-02-23 16:11:40 Python implementation of wc INFO Setting SSH public key
2024-02-23 16:11:42 Python implementation of wc STDOUT dxpy/0.369.0 (Linux-5.15.0-1053-aws-x86_64-with-glibc2.29) Python/3.8.10
2024-02-23 16:11:43 Python implementation of wc STDOUT Invoking main with {'input_file': {'$dnanexus_link': 'file-GgGX7Y8071x46B02JGb515pB'}}
* Python implementation of wc (python_wc:main) (done) job-GgGX8P0071x1yfFPkJ8662gQ
  kyclark 2024-02-23 16:03:24 (runtime 0:01:36)
  Output: outfile = file-GgGXGFj0FbZxjvk1jZPJPkG2

You can inspect the contents of the output file:

$ dx cat file-GgGXGFj0FbZxjvk1jZPJPkG2
  8596  86049 513778 input_file.txt

You can verify this is correct by piping the input file to a local execution of wc:

$ dx cat file-GgGX7Y8071x46B02JGb515pB | wc
    8596   86049  513778
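
If wc is not available on your local machine, the same three numbers can be approximated in plain Python. This is a rough sketch, not how wc is implemented: it counts newline characters, whitespace-separated words, and bytes, which should agree with wc for ordinary ASCII text. The function name wc_counts and the sample data are made up for illustration.

```python
def wc_counts(data: bytes):
    """Return (lines, words, bytes) for the given data, in the order wc prints."""
    lines = data.count(b"\n")   # wc counts newline characters
    words = len(data.split())   # runs of non-whitespace bytes
    return lines, words, len(data)


sample = b"one two\nthree\n"
print(wc_counts(sample))  # (2, 3, 14)
```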

Debugging Locally

You can shorten the build/run development cycle by naming the JSON input job_input.json and executing the Python program locally:

$ python3 src/python_wc.py
Invoking main with {'input_file': {'$dnanexus_link': 'file-GgGX7Y8071x46B02JGb515pB'}}

This will download the input as input_file.txt and then create a new local file with the system call:

$ cat output.txt
    8596   86049  513778 input_file.txt

Review

  • You have now translated the bash applet for running wc into a native DNAnexus Python applet.

  • You were introduced to the dxpy module that provides functions for making API calls.

  • You used subprocess.getstatusoutput to call an external process and interpret the return value for success or failure.
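
As an alternative to subprocess.getstatusoutput, the standard library's subprocess.run can write STDOUT to a file without going through a shell, which avoids quoting problems with unusual filenames. This is a sketch of that swapped-in technique, not the applet's code; it assumes wc is on the PATH, and the helper name run_wc is hypothetical.

```python
import os
import subprocess
import tempfile


def run_wc(local_file, output_file):
    """Run wc without a shell, writing STDOUT to output_file.

    Returns (return value, STDERR text).
    """
    with open(output_file, "w") as out_fh:
        proc = subprocess.run(
            ["wc", local_file],
            stdout=out_fh,
            stderr=subprocess.PIPE,
            text=True,
        )
    return proc.returncode, proc.stderr


with tempfile.TemporaryDirectory() as tmp:
    # A filename with a space needs no quoting when no shell is involved
    local_file = os.path.join(tmp, "input file.txt")
    output_file = os.path.join(tmp, "output.txt")
    with open(local_file, "w") as fh:
        fh.write("one two\nthree\n")

    rv, err = run_wc(local_file, output_file)
    with open(output_file) as fh:
        counts = fh.read().split()[:3]  # lines, words, bytes
    print(rv, counts)
```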

In the next section, we'll continue translating bash to Python.

Resources

If you run into technical issues, create a support ticket:

  1. Go to the Help header (same section where Projects and Tools are) inside the platform

  2. Select "Contact Support"

  3. Fill in the Subject and Message to submit a support ticket.


Last updated 4 months ago
