# Example 1: Word Count (wc)

In this example, you will:

* Learn to write a native DNAnexus applet that executes a Python program
* Use the `dxpy` module to download and upload files
* Use the Python `subprocess` module to execute an external process and check the return value

## Getting Started

We'll use the same *scarlet.txt* file from the `bash` version of the `wc` applet. Start off using `dx-app-wizard` and define the same inputs and outputs as before, but be sure to choose *Python* for the *Programming language*:

```
Template Options

You can write your app in any programming language, but we provide
templates for the following supported languages: Python, bash
Programming language: Python
```

## Python Template

The Python template looks like the following:

{% code title="python\_wc.py" overflow="wrap" lineNumbers="true" %}

```python
#!/usr/bin/env python
# python_wc 0.1.0
# Generated by dx-app-wizard.
#
# Basic execution pattern: Your app will run on a single machine from
# beginning to end.
#
# See https://documentation.dnanexus.com/developer for documentation and
# tutorials on how to modify this file.
#
# DNAnexus Python Bindings (dxpy) documentation:
#   http://autodoc.dnanexus.com/bindings/python/current/

import os
import dxpy

@dxpy.entry_point('main') # 1
def main(input_file): # 2

    # The following line(s) initialize your data object inputs on the platform
    # into dxpy.DXDataObject instances that you can start using immediately.

    input_file = dxpy.DXFile(input_file) # 3

    # The following line(s) download your file inputs to the local file system
    # using variable names for the filenames.

    dxpy.download_dxfile(input_file.get_id(), "input_file") # 4

    # Fill in your application code here.

    # The following line(s) use the Python bindings to upload your file outputs
    # after you have created them on the local file system.  It assumes that you
    # have used the output field name for the filename for each output, but you
    # can change that behavior to suit your needs.

    outfile = dxpy.upload_local_file("outfile") # 5

    # The following line fills in some basic dummy output and assumes
    # that you have created variables to represent your output with
    # the same name as your output fields.

    output = {}
    output["outfile"] = dxpy.dxlink(outfile) # 6

    return output # 7

dxpy.run()
```

{% endcode %}

1. The [entry\_point](http://autodoc.dnanexus.com/bindings/python/current/dxpy_utils.html#dxpy.utils.exec_utils.entry_point) decorator marks `main` as an entry point for the DNAnexus execution environment.
2. The `input_file` listed in the `inputSpec` is passed to `main`.
3. Create a [DXFile](http://autodoc.dnanexus.com/bindings/python/current/dxpy_dxfile.html#dxpy.bindings.dxfile.DXFile) object.
4. Download the input file.
5. Upload the local output file.
6. Add a link to the uploaded file to the `output` dictionary.
7. Return the `output`.

Update *src/python\_wc.py* to the following:

{% code title="python\_wc.py" overflow="wrap" lineNumbers="true" %}

```python
#!/usr/bin/env python

import dxpy
import sys
from subprocess import getstatusoutput # 1


@dxpy.entry_point("main")
def main(input_file):
    local_file = "input_file.txt" # 2
    output_file = "output.txt" # 3

    input_file = dxpy.DXFile(input_file) # 4
    dxpy.download_dxfile(input_file.get_id(), local_file) # 5

    rv, out = getstatusoutput(f"wc {local_file} > {output_file}") # 6

    if rv != 0: # 7
        sys.exit(out)

    outfile = dxpy.upload_local_file(output_file) # 8
    return {"outfile": dxpy.dxlink(outfile)} # 9


dxpy.run()
```

{% endcode %}

1. Import the [`subprocess.getstatusoutput`](https://docs.python.org/3/library/subprocess.html#subprocess.getstatusoutput) function.
2. Use the local filename *input\_file.txt*.
3. The output file will be called *output.txt*.
4. Rebind the `input_file` variable, replacing the incoming link with a new [`DXFile`](http://autodoc.dnanexus.com/bindings/python/current/dxpy_dxfile.html#dxpy.bindings.dxfile.DXFile) object.
5. Call [`dxpy.download_dxfile`](http://autodoc.dnanexus.com/bindings/python/current/dxpy_dxfile.html#dxpy.bindings.dxfile_functions.download_dxfile) to download the input file identified by the file ID to the `local_file` name.
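6. Execute `wc` on the local input file and redirect (`>`) the output to the chosen output filename. This function returns a tuple containing the process's return value and its captured output (STDOUT and STDERR combined). Because STDOUT is redirected to the file here, the captured output holds only error messages.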
7. If the return value is not zero, use [`sys.exit`](https://docs.python.org/3/library/sys.html#sys.exit) to abort the program with the output from the system call.
8. If the program reaches this point, the output file has been created, so upload it.
9. Return a Python dictionary with the DNAnexus link to the new `outfile` object.
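
The behavior in step 6 can be tried without the platform. The following is a minimal sketch, assuming a POSIX shell with `wc` on the `PATH`. The helper name `run_wc` and the use of `shlex.quote` are illustrative additions, not part of the applet above.

```python
import shlex
from subprocess import getstatusoutput


def run_wc(local_file, output_file):
    """Run wc on local_file, writing the counts to output_file.

    Returns (return_code, captured_text), as getstatusoutput does.
    """
    # shlex.quote (an addition here) guards against spaces or shell
    # metacharacters in the filenames.
    cmd = f"wc {shlex.quote(local_file)} > {shlex.quote(output_file)}"
    return getstatusoutput(cmd)
```

On success, the return code is `0` and the captured text is empty because the counts went to the output file. If the input file does not exist, the return code is nonzero and the captured text carries the error message from `wc`.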

NOTE: Portable Operating System Interface (POSIX) standards dictate that processes return `0` on success (i.e., zero errors) and some positive integer value (usually in the range 1-127) to indicate an error condition.
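
To make the exit-status convention concrete, here is a small sketch that starts child Python processes with chosen exit codes and reads them back from the parent. The helper name `exit_code_of` is hypothetical; `subprocess.run` and `sys.executable` come from the standard library.

```python
import subprocess
import sys


def exit_code_of(code):
    """Start a child Python process that exits with `code`; return what the parent sees."""
    proc = subprocess.run([sys.executable, "-c", f"import sys; sys.exit({code})"])
    return proc.returncode


for code in (0, 1, 2):
    status = "success" if exit_code_of(code) == 0 else "failure"
    print(code, status)
```

Only the code `0` is reported as success, which is the same test the applet applies to the return value from `wc`.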

Run `dx build` to build the applet. Create a *job\_input.json* file with the file ID of your input:

```
{
    "input_file": {
        "$dnanexus_link": "file-GgGX7Y8071x46B02JGb515pB"
    }
}
```
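
If you prefer to generate this file rather than type it, the standard `json` module is enough. This is only a sketch: the function name `write_job_input` is hypothetical, and you would pass your own file ID.

```python
import json


def write_job_input(file_id, path="job_input.json"):
    """Write applet input that links input_file to the given DNAnexus file ID."""
    job_input = {"input_file": {"$dnanexus_link": file_id}}
    with open(path, "w") as fh:
        json.dump(job_input, fh, indent=4)
        fh.write("\n")
    return job_input
```

Calling `write_job_input("file-GgGX7Y8071x46B02JGb515pB")` in the applet directory should produce a file equivalent to the one shown above.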

Run your applet with the input file using `--watch` to see the output:

```
$ dx run applet-GgGX740071xJY20Gjkp0JYXB -f python_wc/job_input.json \
    -y --watch \
    --destination project-GXY0PK0071xJpG156BFyXpJF:/output/python_wc/
Using input JSON:
{
    "input_file": {
        "$dnanexus_link": "file-GgGX7Y8071x46B02JGb515pB"
    }
}

Calling applet-GgGX740071xJY20Gjkp0JYXB with output destination
  project-GXY0PK0071xJpG156BFyXpJF:/output/python_wc

Job ID: job-GgGX8P0071x1yfFPkJ8662gQ

Job Log
-------
Watching job job-GgGX8P0071x1yfFPkJ8662gQ. Press Ctrl+C to stop watching.
* Python implementation of wc (python_wc:main) (running) job-GgGX8P0071x1yfFPkJ8662gQ
  kyclark 2024-02-23 16:03:24 (running for 0:01:39)
2024-02-23 16:11:36 Python implementation of wc INFO Logging initialized (priority)
2024-02-23 16:11:36 Python implementation of wc INFO Logging initialized (bulk)
2024-02-23 16:11:40 Python implementation of wc INFO Setting SSH public key
2024-02-23 16:11:42 Python implementation of wc STDOUT dxpy/0.369.0 (Linux-5.15.0-1053-aws-x86_64-with-glibc2.29) Python/3.8.10
2024-02-23 16:11:43 Python implementation of wc STDOUT Invoking main with {'input_file': {'$dnanexus_link': 'file-GgGX7Y8071x46B02JGb515pB'}}
* Python implementation of wc (python_wc:main) (done) job-GgGX8P0071x1yfFPkJ8662gQ
  kyclark 2024-02-23 16:03:24 (runtime 0:01:36)
  Output: outfile = file-GgGXGFj0FbZxjvk1jZPJPkG2
```

You can inspect the contents of the output file:

```
$ dx cat file-GgGXGFj0FbZxjvk1jZPJPkG2
  8596  86049 513778 input_file.txt
```

You can verify this is correct by piping the input file to a local execution of `wc`:

```
$ dx cat file-GgGX7Y8071x46B02JGb515pB | wc
    8596   86049  513778
```
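
The same cross-check can be written in Python. The sketch below approximates the default counts from `wc` (newline characters, whitespace-separated words, bytes) and is meant only for verification. The function name `wc_counts` is hypothetical, and its word splitting may differ from `wc` on unusual whitespace or locale settings.

```python
def wc_counts(data: bytes):
    """Return (lines, words, bytes) for a bytes object, approximating `wc`."""
    lines = data.count(b"\n")   # wc counts newline characters
    words = len(data.split())   # runs of non-whitespace bytes
    return lines, words, len(data)


print(wc_counts(b"one two\nthree\n"))  # (2, 3, 14)
```

For a real file, you could pass in the bytes read from a local copy of the input.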

## Debugging Locally

You can shorten the `build`/`run` development cycle by naming the JSON input *job\_input.json* and executing the Python program locally:

```
$ python3 src/python_wc.py
Invoking main with {'input_file': {'$dnanexus_link': 'file-GgGX7Y8071x46B02JGb515pB'}}
```

This downloads the input as *input\_file.txt* and then creates a new local file, *output.txt*, with the system call:

```
$ cat output.txt
    8596   86049  513778 input_file.txt
```
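
When debugging locally, it can help to check the result in code rather than by eye. The following sketch parses one line of `wc` output into its fields. The function name `parse_wc_line` is hypothetical, and it assumes the default three counts followed by a filename.

```python
def parse_wc_line(line):
    """Split 'lines words bytes filename' into three ints and a name."""
    lines, words, nbytes, name = line.split(maxsplit=3)
    return int(lines), int(words), int(nbytes), name.strip()


print(parse_wc_line("    8596   86049  513778 input_file.txt"))
# (8596, 86049, 513778, 'input_file.txt')
```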

## Review

* You have now translated the `bash` applet for running `wc` into a native DNAnexus Python applet.
* You were introduced to the `dxpy` module that provides functions for making API calls.
* You used `subprocess.getstatusoutput` to call an external process and interpret the return value for success or failure.

In the next section, we'll continue translating `bash` to Python.

## Resources

[Full Documentation](https://documentation.dnanexus.com/)

If you run into technical issues, create a support ticket:

1. Go to the Help header (in the same section as Projects and Tools) inside the platform.
2. Select "Contact Support".
3. Fill in the Subject and Message to submit a support ticket.
