
Cloud Computing for Scientists


Last updated 9 months ago


Basic Concept and Terminology

Key Players in Understanding Cloud Computing

  • Your Computer: When we use cloud resources, we request them from our own computer using commands from the dx toolkit.

  • DNAnexus platform: The platform has many working pieces, but we can treat it as one entity here. Our request gets sent to the platform, and given availability, it will grant access to a temporary DNAnexus Worker.

  • DNAnexus Worker: This temporary worker is the third key player and is where our computation actually runs. We'll see that it starts out as a blank slate.

Specific Terms Outside of Key Players

  • A project contains the files, executables, and logs associated with an analysis, all securely stored on the platform.

  • The executables on the platform are referred to as apps. Apps are executables that can be run on the DNAnexus platform. Most importantly, an app must bundle the software environment needed to run its executable.

  • A software environment, in general, is everything needed to run software on a brand-new computer. This includes the software itself as well as any dependencies required to run it. Examples of dependencies are languages (such as R) that are needed to execute the software.

Project Storage vs Workers

Project storage is permanent, but the workers are temporary. This means that you have to relay information back and forth as shown in the figure below.

The key concept in cloud computing: project storage is permanent on the platform, but workers are temporary. Because workers are temporary, we need to transfer the files we want to process to them, and when we are done, transfer any output files back to project storage. If we don't do this, the files are lost when we lose access to the worker.
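Concretely, this round trip can be sketched with dx toolkit commands (assuming the dx toolkit is installed and you are logged in; the file paths below are hypothetical):

```shell
# Copy an input file from permanent project storage onto the temporary worker
dx download "/inputs/sample.fastq.gz"    # hypothetical path in the current project

# ... run the computation on the worker ...

# Copy the results back before the worker goes away; otherwise they are lost
dx upload sample_trimmed.fastq.gz
```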

Local vs Cloud Analysis

Local Machines

  • On your local computer, everything is on your machine.

    • This includes your data and scripts, as well as your software environment and dependencies, which are also downloaded locally.

    • The results and intermediate files are also generated and saved on your machine.

  • You own it and you control it.

  • This is great, but you are limited by how much storage and computational power your local machine has.

  • This is highlighted in the figure below:

Cloud Computing

  • In comparison, cloud computing adds layers to the analysis in order to increase computational power and storage.

  • This relationship and the layers involved are in the figure below:

  • Let's contrast this with processing a file on the DNAnexus platform.

    • We'll start with our computer, the DNAnexus platform, and a file from project storage.

    • We first start out by using the dx run command, requesting to run an app on a file in project storage. This request is then sent to the platform, and an appropriate worker from the pool of workers is made available.

    • When the worker is available, we can transfer a file from the project to the worker.

    • The platform handles installing the app and its software environment to the worker as well.

    • Once our app is ready and our file is set, we can run the computation on the worker.

    • Any files that we generate must be transferred back into project storage.
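As a sketch, all of the steps above can be triggered by a single dx run call from your computer (here using the Swiss Army Knife app covered later in this documentation; the BAM path is hypothetical):

```shell
# Request a worker to index a BAM file that lives in project storage.
# The platform finds a worker, installs the app's software environment
# (including samtools), and transfers the input file to it.
dx run app-swiss-army-knife \
  -iin="/bams/sample.bam" \
  -icmd="samtools index *.bam" \
  --watch   # stream the worker's log back to your terminal
```

When the job finishes, the app uploads its output files back into project storage.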

Key Differences

  • The first difference is that we need to request a worker and we only have temporary access to it. We need to bring everything to the worker, including the software environment.

  • The second key difference is that we need to bring our files and scripts from project storage to the worker.

Common Challenges with Cloud Computing

Challenge 1: Requesting Enough Resources

  • Our first barrier is requesting an appropriate worker that can do our computational job.

  • For example, our app may require more memory, or if it is optimized for working on multiple CPUs, more CPUs.

  • We need to understand how big our files are and the computing requirements of our software to do this.
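When the default worker is not big enough, dx run lets us request a different one with the --instance-type flag (the applet name and instance type below are illustrative; available instance types vary by region):

```shell
# Request a worker with more memory and CPUs than the app's default
dx run my-applet \
  --instance-type mem2_ssd1_v2_x8 \
  -iin="/inputs/large_cohort.vcf.gz"
```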

Challenge 2: Installing Dependencies

  • Our second barrier is installing the software environment on the worker, such as R.

  • Because we are starting from scratch on a worker, we will need ways to reproducibly install the software environment on the worker.

  • We'll see that this is one of the roles of Apps. As part of their job, they will install the appropriate software environment.

Resolution for Challenges 1 and 2:

  • There is some good news. If we are running apps, they will handle both of these barriers.

  • First, all apps have a default instance type. We'll see that we can tailor this.

  • Second, apps install the required software environment on their workers.
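For app authors, the software environment is typically declared in the app's dxapp.json; the platform then installs the listed packages on the worker before the app runs. A minimal, hypothetical fragment:

```json
{
  "runSpec": {
    "interpreter": "bash",
    "file": "src/code.sh",
    "execDepends": [
      { "name": "samtools" },
      { "name": "r-base" }
    ]
  }
}
```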

Challenge 3: Transferring Files

  • Our third barrier is getting our files from project storage onto the worker and doing computations with them there. The last part of this barrier is getting the output files we've generated back from the worker into project storage.

  • Cloud computing has a nestedness to it and transferring files back and forth can make learning it difficult.

  • Having a mental model of how cloud computing works can help us overcome these barriers.

Resolution for Challenge 3:

  • Cloud computing is indirect, and you need to think two steps ahead.

  • Here is the visual for thinking about the steps for file management:

Solution for Challenges: Apps

  • Apps help you address installing software on the worker.

  • They provide a prebuilt software environment that is installed onto the temporary worker.

  • You can also build your own apps.

  • Apps serve to (at minimum):

    1. Request a worker (Challenge 1)

    2. Configure the worker's environment (Challenge 2)

    3. Establish data transfer (Challenge 3)

  • Running apps is covered throughout the rest of the documentation.
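Inside a bash app's script, the three jobs above map onto helper commands that are available in the app execution environment (a sketch; the tool invocation is hypothetical):

```shell
# Sketch of a bash app's entry point, run for you on the worker
main() {
  # download all of the app's declared inputs from project storage into ~/in
  dx-download-all-inputs

  # run the computation on the worker (hypothetical tool and paths)
  mkdir -p out/trimmed
  fastq_quality_trimmer -Q33 -i in/reads/* -o out/trimmed/reads_trimmed.fastq

  # upload everything under ~/out back into project storage
  dx-upload-all-outputs
}
```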

Resources

To create a support ticket if there are technical issues:

  1. Go to the Help header (same section where Projects and Tools are) inside the platform

  2. Select "Contact Support"

  3. Fill in the Subject and Message to submit a support ticket.

Full Documentation