Cloud Computing for HPC Users
Here is a quick comparison of an HPC cluster and the DNAnexus platform:

| | HPC Cluster | DNAnexus Platform |
| --- | --- | --- |
| Driver/Requestor | Head node of cluster | API server |
| Submission script language | Portable Batch System (PBS) or SLURM | dx-toolkit |
| Worker | Requested from pool of machines in private cluster | Requested from pool of machines in AWS/Azure |
| Shared storage | Shared file system for all nodes (Lustre, GPFS, etc.) | Project storage (Amazon S3/Azure storage) |
| Worker file I/O | Handled by shared file system | Must be transferred to and from project storage by commands on the worker |
An HPC system combines specialized hardware, including mainframe-class computers, with a distributed processing software framework, so that this very large computer system can handle massive amounts of data and processing at high speed.
The goal of an HPC system is to keep the files on its hardware and also run the analysis there. In this way, it is similar to a local computer, but with specialized hardware and software that provide far more storage and processing power.
The key components are:

- Your computer: communicates with the HPC cluster to request resources
- HPC cluster:
  - Shared Storage: a common area where files are stored. Directories may branch out by user or follow another layout.
  - Head Node: manages the workers and the shared storage
  - HPC Worker: where we do our computation; part of the HPC cluster
These components work together to increase processing power, with jobs and queues ensuring that each job runs once the number of workers it needs becomes available.
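For example, a minimal SLURM submission script (a sketch only; the resource values and file paths are placeholders) requests a worker from the queue and reads and writes directly on the shared file system:

```bash
#!/bin/bash
#SBATCH --job-name=count-lines   # name shown by squeue
#SBATCH --ntasks=1               # one task on one worker
#SBATCH --mem=4G                 # memory request (placeholder value)
#SBATCH --time=00:10:00          # wall-clock limit (placeholder value)

# The worker reads and writes directly on the shared file system
wc -l /shared/data/sample.txt > /shared/results/sample_counts.txt
```

The script is submitted to the head node with `sbatch` and waits in the queue until a worker is free.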
In comparison, cloud computing adds layers to the analysis in order to scale computational power and storage.
This relationship and the layers involved are shown in the figure below:
Let's contrast this with processing a file on the DNAnexus platform.
We'll start with our computer, the DNAnexus platform, and a file from project storage.
We first use the dx run command, requesting to run an app on a file in project storage. This request is then sent to the platform, and an appropriate worker from the pool of workers is made available.
When the worker is available, we can transfer a file from the project to the worker.
The platform also handles installing the app and its software environment on the worker.
Once our app is ready and our file is set, we can run the computation on the worker.
Any files that we generate must be transferred back into project storage.
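As a rough sketch of this flow (the app ID, input name, and project paths below are placeholders, not specific recommendations):

```bash
# Request a worker and run an app on a file that lives in project storage
dx run app-xxxx -iinput_file="project-yyyy:/data/sample.txt" --yes

# Monitor the job while the platform provisions the worker and runs the app
dx find jobs

# When the job finishes, its outputs are transferred back into project storage
dx ls "project-yyyy:/"
```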
HPC jobs are limited by how many workers are physically present in the cluster. Cloud computing, in contrast, requests workers on demand from a much larger pool, so jobs spend less time waiting for resources and can run more efficiently.
One common barrier is getting our files from project storage onto the worker and then running computations on them there. The last barrier we'll review is getting the output files we've generated back from the worker into project storage.
Cloud computing is nested by nature, and the file transfers it requires can make it difficult to learn.
A mental model of how cloud computing works can help us overcome these barriers.
Cloud computing is indirect, and you need to think two steps ahead.
Here is a visual for thinking through the steps of file management:
Creating apps and running them is covered later in the documentation.
Apps serve to (at minimum):
- Request an EC2/Azure worker
- Configure the worker's environment
- Establish data transfer
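As a minimal sketch of what this looks like in a bash-based app script (assuming a file input named `input_file` and a file output named `result` declared in the app's metadata; the command run here is just an example):

```bash
#!/bin/bash
main() {
    # Transfer the input from project storage onto the worker
    dx download "$input_file" -o input.dat

    # Run the computation in the worker's configured environment
    wc -l input.dat > result.txt

    # Transfer the output back and register it as the app's output
    result_id=$(dx upload result.txt --brief)
    dx-jobutil-add-output result "$result_id" --class=file
}
```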
- Highly secure platform with built-in compliance infrastructure
- Fully configurable platform
- Users can run anything from single scripts to fully automated, production-level workflows
- Data transfer designed to be fast and efficient
- Read and analyze massive files directly using dxfuse (see the sketch after this list)
- Instances are configured for you via apps
- Variety of ways to configure your own environments
- Largest Azure instances: ~4 TB RAM
- Largest AWS instances: ~2 TB RAM
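For read-only access, dxfuse mounts project storage so files can be read like local files. A minimal sketch, assuming default dxfuse behavior and placeholder names for the mount point and project:

```bash
# Mount a project (read-only) at a local mount point
mkdir -p "$HOME/project_mnt"
dxfuse "$HOME/project_mnt" "My_Project"

# Files in project storage can now be read as if they were local
wc -l "$HOME/project_mnt/My_Project/data/sample.txt"
```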
| Task | DNAnexus | PBS | SLURM |
| --- | --- | --- | --- |
| Run job | `dx run <app-id> <script>` | `qsub <script>` | `sbatch <script>` |
| Monitor job | `dx find jobs` | `qstat` | `squeue` |
| Kill job | `dx terminate <jobid>` | `qdel <jobid>` | `scancel <jobid>` |
Single Job:
- Use `dx run` on the CLI directly
- Use `dx run` in a shell script

Multiple Jobs:
- Use a shell script to call `dx run` on multiple files, or use `dx run --batch-tsv`
- Use dxFUSE to directly access files (read only)
The batch-processing steps compare as follows:

| Step | HPC Cluster | DNAnexus Platform |
| --- | --- | --- |
| 1 | List files | List files |
| 2 | Request 1 worker per file | Use a loop for each file: 1) use `dx run`, 2) transfer the file, and 3) run commands |
| 3 | Use array IDs to process 1 file per worker | |
| 4 | Submit job to head node | |
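A rough sketch of the DNAnexus column above, looping `dx run` over files listed from project storage (the app ID, input name, and folder are placeholders, and file names are assumed not to contain spaces):

```bash
# List the files to process, then request one worker per file with dx run
for f in $(dx ls "project-yyyy:/data/"); do
    dx run app-xxxx -iinput_file="project-yyyy:/data/${f}" --yes --brief
done

# Alternatively, describe the whole batch in a TSV file and submit it at once
# dx run app-xxxx --batch-tsv batch_inputs.tsv
```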
To create a support ticket if there are technical issues:
1. Go to the Help header (the same section where Projects and Tools are) inside the platform.
2. Select "Contact Support".
3. Fill in the Subject and Message to submit a support ticket.