# Example 2: fastq\_quality\_trimmer

In this exercise, we'll demonstrate a native DNAnexus Python applet that runs the `fastq_quality_trimmer` binary.

You will learn:

* How to use a `DXFile` object to get file metadata
* How to use Python functions to choose an output filename using the input file's name
* How to add debugging output to your Python program

## Getting Started

The inputs and outputs are the same as in the `bash` version of this applet. You can start from scratch using `dx-app-wizard` with the following input specs:

| Input Name      | Type   | Optional | Default Value |
| --------------- | ------ | -------- | ------------- |
| `input_file`    | `file` | No       | NA            |
| `quality_score` | `int`  | Yes      | 30            |

The output specs are as follows:

| Output Name   | Type   |
| ------------- | ------ |
| `output_file` | `file` |

Or you can use the *dxapp.json* from the `bash` version and change the `runSpec` `file` to the name of your Python script and the `interpreter` to `python3` as follows:

```json
    "runSpec": {
        "timeoutPolicy": {
            "*": {
                "hours": 1
            }
        },
        "interpreter": "python3",
        "file": "src/python_fastq_trimmer.py",
        "distribution": "Ubuntu",
        "release": "24.04",
        "version": "0"
    },
```
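
If you would rather script that change than edit the JSON by hand, the following is a minimal sketch of the same update. It works on an in-memory dictionary; the helper name `use_python_entry_point` and the `bash` script path in the sample data are hypothetical and only illustrate the idea.

```python
import json


def use_python_entry_point(app_spec, script_path):
    """Point an applet spec's runSpec at a Python 3 script (hypothetical helper)."""
    run_spec = app_spec.setdefault("runSpec", {})
    run_spec["interpreter"] = "python3"
    run_spec["file"] = script_path
    return app_spec


# Stand-in for the bash applet's dxapp.json; the bash file name is an assumption.
bash_spec = {
    "runSpec": {
        "interpreter": "bash",
        "file": "src/fastq_trimmer.sh",
        "distribution": "Ubuntu",
        "release": "24.04",
        "version": "0",
    }
}

python_spec = use_python_entry_point(bash_spec, "src/python_fastq_trimmer.py")
print(json.dumps(python_spec["runSpec"], indent=4))
```

In a real applet you would load *dxapp.json* with `json.load`, apply the change, and write it back with `json.dump`.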

Inside your applet's source directory, create *resources/usr/local/bin* and copy the `fastq_quality_trimmer` binary to this location. At runtime, the binary will be available at */usr/local/bin/fastq\_quality\_trimmer*, which is in the standard `$PATH`.

## Python Code

Update the Python code to the following:

{% code title="python\_fastq\_trimmer.py" overflow="wrap" lineNumbers="true" %}

```python
#!/usr/bin/env python3

import dxpy
import os
import sys
from subprocess import getstatusoutput


@dxpy.entry_point("main")
def main(input_file, quality_score): # 1
    input_file = dxpy.DXFile(input_file)
    desc = input_file.describe() # 2
    local_file = desc.get("name", input_file.get_id()) # 3
    dxpy.download_dxfile(input_file.get_id(), local_file)  # 4

    basename, ext = os.path.splitext(local_file) # 5
    outfile = f"{basename}.filtered{ext}" # 6
    cmd = ( # 7
        f"fastq_quality_trimmer -Q 33 -t {quality_score} "
        f"-i {local_file} -o {outfile}"
    )
    print(cmd) # 8
    rv, out = getstatusoutput(cmd) # 9

    if rv != 0:
        sys.exit(out)

    dx_output_file = dxpy.upload_local_file(outfile) # 10
    return {"output_file": dxpy.dxlink(dx_output_file)}


dxpy.run()
```

{% endcode %}

1. The `input_file` will be a DNAnexus link to the file (e.g., `{"$dnanexus_link": "file-FvQGZb00bvyQXzG3250XGbgz"}`), and the `quality_score` will be an integer value.
2. Use [`DXFile.describe`](http://autodoc.dnanexus.com/bindings/python/current/dxpy_bindings.html#dxpy.bindings.DXDataObject.describe) to get a Python dictionary of metadata.
3. Choose a local filename by using either the file's `name` from the metadata or the file ID.
4. Download the input file to the chosen local filename.
5. Split the filename into a basename and extension.
6. Create an output filename using the input basename and a new extension to indicate that the data has been filtered.
7. Format a command string.
8. Print the command for debugging purposes.
9. Execute the command and check the return value.
10. If the code makes it to this point, upload the output file and return a link to the uploaded file, which is attached to the job's output.
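
Steps 5 through 7 can be tried locally without any DNAnexus dependencies. The sketch below is one possible variation, not the applet's code: it derives the output filename the same way, but builds the command as an argument list so that `shlex.join` can quote filenames containing spaces. The helper name `build_trim_command` is hypothetical.

```python
import os
import shlex


def build_trim_command(local_file, quality_score):
    """Return (outfile, argv) for a fastq_quality_trimmer run (hypothetical helper)."""
    # Split "reads.fastq" into ("reads", ".fastq") and insert ".filtered".
    basename, ext = os.path.splitext(local_file)
    outfile = f"{basename}.filtered{ext}"
    argv = [
        "fastq_quality_trimmer",
        "-Q", "33",
        "-t", str(quality_score),
        "-i", local_file,
        "-o", outfile,
    ]
    return outfile, argv


outfile, argv = build_trim_command("small-celegans-sample.fastq", 28)
print(outfile)
# shlex.join quotes any argument that the shell would otherwise split.
print(shlex.join(argv))
```

If the file has no extension (for example, when the file ID is used as the local name), `os.path.splitext` returns an empty extension and the output name simply ends in `.filtered`.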

## Build and Run

Run `dx build` in your source directory to create the new applet. Use the new applet ID to execute the applet with a small FASTQ file:

```bash
$ dx run applet-GgKQ5qQ071x5yX7fgbq3PkXB \
> -f python_fastq_trimmer/job_input.json -y --watch \
> --destination project-GXY0PK0071xJpG156BFyXpJF:/output/python_fastq_trimmer/

Using input JSON:
{
    "input_file": {
        "$dnanexus_link": "file-FvQGZb00bvyQXzG3250XGbgz"
    },
    "quality_score": 28
}

Calling applet-GgKQ5qQ071x5yX7fgbq3PkXB with output destination
  project-GXY0PK0071xJpG156BFyXpJF:/output/python_fastq_trimmer

Job ID: job-GgKQ6x0071x6kf34P5xy2q2b

Job Log
-------
Watching job job-GgKQ6x0071x6kf34P5xy2q2b. Press Ctrl+C to stop watching.
* Python version of fastq_trimmer (python_fastq_trimmer:main) (running)
* job-GgKQ6x0071x6kf34P5xy2q2b
  kyclark 2024-02-26 14:32:36 (running for 0:00:21)
2024-02-26 14:33:17 Python version of fastq_trimmer INFO Logging initialized
(priority)
2024-02-26 14:33:17 Python version of fastq_trimmer INFO Logging initialized
(bulk)
2024-02-26 14:33:21 Python version of fastq_trimmer INFO Downloading bundled
file resources.tar.gz
2024-02-26 14:33:22 Python version of fastq_trimmer STDOUT >>> Unpacking
resources.tar.gz to /
2024-02-26 14:33:22 Python version of fastq_trimmer STDERR tar: Removing
leading `/' from member names
2024-02-26 14:33:22 Python version of fastq_trimmer INFO Setting SSH public key
2024-02-26 14:33:23 Python version of fastq_trimmer STDOUT dxpy/0.369.0
(Linux-5.15.0-1053-aws-x86_64-with-glibc2.29) Python/3.8.10
2024-02-26 14:33:23 Python version of fastq_trimmer STDOUT Invoking main with
{'input_file': {'$dnanexus_link': 'file-FvQGZb00bvyQXzG3250XGbgz'},
'quality_score': 28}
2024-02-26 14:33:24 Python version of fastq_trimmer STDOUT
fastq_quality_trimmer -Q 33 -t 28 -i small-celegans-sample.fastq -o
small-celegans-sample.filtered.fastq
* Python version of fastq_trimmer (python_fastq_trimmer:main) (done)
* job-GgKQ6x0071x6kf34P5xy2q2b
  kyclark 2024-02-26 14:32:36 (runtime 0:00:20)
  Output: output_file = file-GgKQ79j0B2FQjGbk0qX6j64B
```

## Verify Output

Use `dx head` to verify the output looks like a FASTQ file:

```bash
$ dx head file-GgKQ79j0B2FQjGbk0qX6j64B
@SRR070372.1 FV5358E02GLGSF length=78
TTTTTTTTTTTTTTTTTTTTTTTTTTTNTTTNTTTNTTTNTTTATTTATTTATTTATTATTATATATATATA
+SRR070372.1 FV5358E02GLGSF length=78
...000//////999999<<<=<<666!602!777!922!688:669A9=<=122569AAA?>@BBBBAA?=
@SRR070372.2 FV5358E02FQJUJ length=177
TTTCTTGTAATTTGTTGGAATACGAGAACATCGTCAATAATATATCGTATGAATTGAACCACACGGCACATATTTGAACTTGTTCGTGAAATTTAGCGAACCTGGCAGGACTCGAACCTCCAATCTTCGGATCCGAAGTCCGACGCCCCCGCGTCGGATGCGTTGTTACCACTGCTT
+SRR070372.2 FV5358E02FQJUJ length=177
222@99912088>C<?7779@<GIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIC;6666IIIIIIIIIIII;;;HHIIE>944=>=;22499;CIIIIIIIIIIIIHHHIIIIIIIIIIIIIIIH?;;;?IIEEEEEEEEIIII77777I7EEIIEEHHHHHIIIIIIIIIIIIII
@SRR070372.3 FV5358E02GYL4S length=70
TTGGTATCATTGATATTCATTCTGGAGAACGATGGAACATACAAGAATTGTGTTAAGACCTGCAT
```

To confirm that the applet reduced the number of reads, pipe the output of `dx cat` to `wc -l` and check that the output file has fewer lines than the input file:

```bash
$ dx cat file-GgKQ79j0B2FQjGbk0qX6j64B | wc -l
   99952

$ dx cat file-FvQGZb00bvyQXzG3250XGbgz | wc -l
  100000
```
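
The line counts can be turned into read counts. This is a small sketch that assumes every FASTQ record spans exactly four lines, as in the `dx head` output above; the function name `count_fastq_records` is hypothetical.

```python
def count_fastq_records(line_count):
    """Convert a line count to a read count, assuming four lines per record."""
    if line_count % 4 != 0:
        raise ValueError(f"{line_count} is not a multiple of 4")
    return line_count // 4


input_reads = count_fastq_records(100000)
output_reads = count_fastq_records(99952)
print(f"input: {input_reads}, output: {output_reads}, "
      f"removed: {input_reads - output_reads}")
```

Under that assumption, the 48-line difference corresponds to 12 reads that were dropped from the output.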

## Review

* You used `DXFile` to get the input file's name.
* Your output filename is based on the input file's name rather than a static name like *output.txt*.
* You can call Python's `print` function to write your own messages to the job log (to STDOUT by default), which can be an aid in debugging your program.
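
The job log above labels each line as STDOUT or STDERR, so one option is to send debugging messages to STDERR to keep them apart from regular output. The following is a minimal sketch; the `debug` helper and its `[debug]` prefix are illustrative and not part of `dxpy`.

```python
import sys


def debug(message):
    """Write a debugging message to STDERR (hypothetical helper)."""
    print(f"[debug] {message}", file=sys.stderr)


debug("fastq_quality_trimmer -Q 33 -t 28 -i in.fastq -o in.filtered.fastq")
```

Plain `print(cmd)`, as used in the applet, remains the simplest choice when you do not need to separate the two streams.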

## Resources

[Full Documentation](https://documentation.dnanexus.com/)

To create a support ticket if there are technical issues:

1. Go to the Help header (same section where Projects and Tools are) inside the platform
2. Select "Contact Support"
3. Fill in the Subject and Message to submit a support ticket.

