Cloud Computing for HPC Users
Here is a quick comparison of an HPC cluster and the DNAnexus platform:

| | HPC Cluster | DNAnexus Platform |
| --- | --- | --- |
| Driver/Requestor | Head node of cluster | API server |
| Submission script language | Portable Batch System (PBS) or SLURM | dx-toolkit |
| Worker | Requested from pool of machines in private cluster | Requested from pool of machines in AWS/Azure |
| Shared storage | Shared file system for all nodes (Lustre, GPFS, etc.) | Project storage (Amazon S3/Azure storage) |
| Worker file I/O | Handled by shared file system | Must be transferred to and from project storage by commands on the worker |
An HPC system combines specialized hardware, including mainframe-class computers, with a distributed processing software framework, so that this very large computer system can handle massive amounts of data and processing at high speed.
The goal of an HPC system is to keep the files on its hardware and also run the analysis there. In this way, it is similar to a local computer, but with specialized hardware and software that provide far more storage and processing power.
The key components are:

- Your computer: communicates with the HPC cluster to request resources
- HPC cluster:
  - Shared Storage: a common area where files are stored. Directories may branch out by user or follow another layout.
  - Head Node: manages the workers and the shared storage
  - HPC Worker: where we do our computation; part of the HPC cluster
These components work together to increase processing power, with jobs and queues ensuring that each job runs once the number of workers it needs becomes available.
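For example, a minimal SLURM submission script (a sketch only; the resource values and file paths are placeholders) requests a worker from the queue and reads and writes directly on the shared file system:

```bash
#!/bin/bash
#SBATCH --job-name=count-lines   # name shown by squeue
#SBATCH --ntasks=1               # one task on one worker
#SBATCH --mem=4G                 # memory request (placeholder value)
#SBATCH --time=00:10:00          # wall-clock limit (placeholder value)

# The worker reads and writes directly on the shared file system
wc -l /shared/data/sample.txt > /shared/results/sample_counts.txt
```

The script is submitted to the head node with `sbatch` and waits in the queue until a worker is free.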
In comparison, cloud computing adds layers to the analysis in order to scale computational power and storage.
This relationship and the layers involved are shown in the figure below:
Let's contrast this with processing a file on the DNAnexus platform.
We'll start with our computer, the DNAnexus platform, and a file from project storage.
We first use the dx run command, requesting to run an app on a file in project storage. This request is then sent to the platform, and an appropriate worker from the pool of workers is made available.
When the worker is available, we can transfer a file from the project to the worker.
The platform also handles installing the app and its software environment on the worker.
Once our app is ready and our file is set, we can run the computation on the worker.
Any files that we generate must be transferred back into project storage.
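As a rough sketch of this flow (the app ID, input name, and project paths below are placeholders, not specific recommendations):

```bash
# Request a worker and run an app on a file that lives in project storage
dx run app-xxxx -iinput_file="project-yyyy:/data/sample.txt" --yes

# Monitor the job while the platform provisions the worker and runs the app
dx find jobs

# When the job finishes, its outputs are transferred back into project storage
dx ls "project-yyyy:/"
```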
HPC jobs are limited by how many workers are physically present in the cluster. Cloud computing, in contrast, requests workers on demand from a much larger pool, so jobs spend less time waiting for resources and can run more efficiently.
One common barrier is getting our files from project storage onto the worker and then running computations on them there. The last barrier we'll review is getting the output files we've generated back from the worker into project storage.
Cloud computing is nested by nature, and the file transfers it requires can make it difficult to learn.
A mental model of how cloud computing works can help us overcome these barriers.
Cloud computing is indirect, and you need to think two steps ahead.
Here is a visual for thinking through the steps of file management:
Creating apps and running them is covered later in the documentation.
Apps serve to (at minimum):
- Request an EC2/Azure worker
- Configure the worker's environment
- Establish data transfer
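As a minimal sketch of what this looks like in a bash-based app script (assuming a file input named `input_file` and a file output named `result` declared in the app's metadata; the command run here is just an example):

```bash
#!/bin/bash
main() {
    # Transfer the input from project storage onto the worker
    dx download "$input_file" -o input.dat

    # Run the computation in the worker's configured environment
    wc -l input.dat > result.txt

    # Transfer the output back and register it as the app's output
    result_id=$(dx upload result.txt --brief)
    dx-jobutil-add-output result "$result_id" --class=file
}
```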
- Highly secure platform with built-in compliance infrastructure
- Fully configurable platform
- Users can run anything from single scripts to fully automated, production-level workflows
- Data transfer designed to be fast and efficient
- Read and analyze massive files directly using dxfuse (see the sketch after this list)
- Instances are configured for you via apps
- Variety of ways to configure your own environments
- Largest Azure instances: ~4 TB RAM
- Largest AWS instances: ~2 TB RAM
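For read-only access, dxfuse mounts project storage so files can be read like local files. A minimal sketch, assuming default dxfuse behavior and placeholder names for the mount point and project:

```bash
# Mount a project (read-only) at a local mount point
mkdir -p "$HOME/project_mnt"
dxfuse "$HOME/project_mnt" "My_Project"

# Files in project storage can now be read as if they were local
wc -l "$HOME/project_mnt/My_Project/data/sample.txt"
```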
| Task | DNAnexus | PBS | SLURM |
| --- | --- | --- | --- |
| Run job | `dx run <app-id> <script>` | `qsub <script>` | `sbatch <script>` |
| Monitor job | `dx find jobs` | `qstat` | `squeue` |
| Kill job | `dx terminate <jobid>` | `qdel <jobid>` | `scancel <jobid>` |
Single Job:
- Use `dx run` on the CLI directly
- Use `dx run` in a shell script

Multiple Jobs:
- Use a shell script to call `dx run` on multiple files, or use `dx run --batch-tsv`
- Use dxFUSE to directly access files (read only)
The batch-processing steps compare as follows:

| Step | HPC Cluster | DNAnexus Platform |
| --- | --- | --- |
| 1 | List files | List files |
| 2 | Request 1 worker per file | Use a loop for each file: 1) use `dx run`, 2) transfer the file, and 3) run commands |
| 3 | Use array IDs to process 1 file per worker | |
| 4 | Submit job to head node | |
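A rough sketch of the DNAnexus column above, looping `dx run` over files listed from project storage (the app ID, input name, and folder are placeholders, and file names are assumed not to contain spaces):

```bash
# List the files to process, then request one worker per file with dx run
for f in $(dx ls "project-yyyy:/data/"); do
    dx run app-xxxx -iinput_file="project-yyyy:/data/${f}" --yes --brief
done

# Alternatively, describe the whole batch in a TSV file and submit it at once
# dx run app-xxxx --batch-tsv batch_inputs.tsv
```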
To create a support ticket if there are technical issues:
1. Go to the Help header (the same section where Projects and Tools are) inside the platform.
2. Select "Contact Support".
3. Fill in the Subject and Message to submit a support ticket.