Cloud Workstation

The cloud_workstation app provides a Linux (Ubuntu) terminal running in the cloud, in the same base execution environment used by all DNAnexus apps. It is used most often for testing application code and building Docker images. I especially favor the cloud workstation whenever I need to work with large data files that I don't wish to copy to my local disk (laptop), since the transfers stay inside AWS rather than crossing the open internet. If you have previously been limited to HPC environments where sysadmins decide what software may or may not be installed, you will find that you have sudo privileges to install any software you like, whether via apt, by downloading pre-built binaries, or by building from source.

In order to run cloud workstation, you will need to set up an SSH key pair. You can do this by running the following command:

dx ssh_config

Here is the start of the usage for the app:

$ dx run cloud_workstation -h
usage: dx run cloud_workstation [-iINPUT_NAME=VALUE ...]

App: Cloud Workstation

Version: 2.2.1 (published)

This app sets up a cloud workstation which you can access by running the
applet with the --ssh or --allow-ssh flags

See the app page for more information:
  https://platform.dnanexus.com/app/cloud_workstation


The app produces no outputs. In the following sections, I want to focus on the inputs.

Maximum Session Length

As noted in the following usage, the default timeout is one hour.

Maximum Session Length (suffixes allowed: s, m, h, d, w, M, y):
      [-imax_session_length=(string, default="1h")]
      The maximum length of time to keep the workstation running.
      Value should include units of either s, m, h, d, w, M, y for
      seconds, minutes, hours, days, weeks, months, or years
      respectively.

You can set the session to a different length with the following command, which sets the limit to 2 hours:

$ dx run -imax_session_length="2h" app-cloud_workstation --ssh -y

In the preceding command, I also use the following flags from dx run:

  • -imax_session_length="2h": changes the max session length to 2 hours

  • -y|--yes: Do not ask for confirmation before launching the job

  • --ssh: Configure the job to allow SSH access and connect to it after launching. Defaults --priority to high.

When on the workstation, you can find how much time is left using dx-get-timeout:

dnanexus@job-GXfvYxj071x5P87Fxx6f5k47:~$ dx-get-timeout
0 days 1 hours 42 minutes 50 seconds

If you would like to extend the time left, use dx-set-timeout with the same values shown previously for session length. For example, you can set the timeout back to 2 hours and verify that you now have 2 hours left:

dnanexus@job-GXfvYxj071x5P87Fxx6f5k47:~$ dx-set-timeout 2h
dnanexus@job-GXfvYxj071x5P87Fxx6f5k47:~$ dx-get-timeout
0 days 1 hours 59 minutes 57 seconds

Input Files

You can initiate the app with any files you want copied to the instance:

Files: [-ifids=(file) [-ifids=... [...]]]
      An optional list of files to download to the cloud workstation
      on startup.
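
For example, you could launch the workstation with two files copied to the instance at startup (these file IDs are placeholders for your own):

$ dx run app-cloud_workstation -ifids=file-XXXX -ifids=file-YYYY --ssh -y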

One of the main use cases for the cloud workstation is working with large files, and I will mostly use dx download on the instance to download what I want. An especially important case is when I want to download a file to STDOUT rather than to a local file, in which case I would not want to initiate the app using this input. For example, when dealing with a tarball of an entire Illumina BCL run directory, I would prefer to download to STDOUT and pipe this into tar:

$ dx download file-XXXX -o - | tar xv

The alternative would require at least twice the disk space (to download the tarball and then expand the contents).

Snapshot

You can save the state of a workstation, called a "snapshot," and start a new workstation using that saved state:

Snapshot: [-isnapshot=(file)]
      An optional snapshot file to restore the workstation environment.

For instance, you may go through a lengthy build of various packages to create the environment you need to run some application, and that environment will be lost when the workstation stops.

To demonstrate, I will show that the Python module "pandas" is not installed by default:

dnanexus@job-GXfvYxj071x5P87Fxx6f5k47:~$ python3
Python 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pandas'

I use python3 -m pip install pandas to install the module, then dx-create-snapshot to save the state of the machine:
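
$ python3 -m pip install pandas
$ dx-create-snapshot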

Created snapshot: project-GXY0PK0071xJpG156BFyXpJF:July_11_2023_23_54.snapshot
(file-GXfygVj071xGjVfg1KQ9B7PP)

I can use the file ID of the snapshot to reconstitute my environment:

$ dx run app-cloud_workstation -isnapshot=file-GXfygVj071xGjVfg1KQ9B7PP -y --ssh

Now I find that "pandas" does exist on the image:

dnanexus@job-GXfyj58071xB4VJ9X0yk75k3:~$ python3
Python 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> help(pd.read_csv)

You can use a snapshot file ID as an asset for native applets.
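
For example, a native applet's dxapp.json might reference the snapshot under runSpec.bundledDepends. This is a minimal sketch using the snapshot file ID from above, and only one way the snapshot could be attached:

{
  "runSpec": {
    "bundledDepends": [
      {
        "name": "July_11_2023_23_54.snapshot",
        "id": {"$dnanexus_link": "file-GXfygVj071xGjVfg1KQ9B7PP"}
      }
    ]
  }
}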

Instance Type

By default, this app will choose an 8-core instance type such as "mem1_ssd1_v2_x8" (16G RAM, 200G disk) for AWS:us-east-1. This is usually adequate for my needs, but if I need more memory or disk space, I can specify any valid DNAnexus instance type using the --instance-type argument:

$ dx run app-cloud_workstation --instance-type mem1_ssd2_v2_x72 --ssh -y

This is actually an argument to dx run, not the cloud workstation app. You can use this argument with any app to override the default instance type chosen by the app developer.
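
To see which instance types you can request, use the --instance-type-help flag of dx run:

$ dx run app-cloud_workstation --instance-type-help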

Running Cloud Workstation

When the app secures an instance, you will be greeted by the following messages. The first shows the job ID, instance type, project ID, and the workspace container:

Welcome to DNAnexus!

This is the DNAnexus Execution Environment, running job-GXfvYxj071x5P87Fxx6f5k47.
Job: Cloud Workstation
App: cloud_workstation:main
Instance type: mem1_ssd1_v2_x8
Project: kyclark_test (project-GXY0PK0071xJpG156BFyXpJF)
Workspace: container-GXfvYyj0p4QgFgP4zZyBFv7Y
Running since: Tue Jul 11 21:31:40 UTC 2023
Running for: 0:01:37
The public address of this instance is ec2-3-90-239-144.compute-1.amazonaws.com.
The next part explains that you are running the terminal multiplexer:

You are running byobu, a terminal session manager.

This means that pressing Ctrl-A to jump to the beginning of the line in the terminal will trigger the following Byobu configuration screen where you are prompted to choose whether to use Screen or Emacs mode:

Configure Byobu's ctrl-a behavior...

When you press ctrl-a in Byobu, do you want it to operate in:
    (1) Screen mode (GNU Screen's default escape sequence)
    (2) Emacs mode  (go to beginning of line)

Note that:
  - F12 also operates as an escape in Byobu
  - You can press F9 and choose your escape character
  - You can run 'byobu-ctrl-a' at any time to change your selection

Select [1 or 2]:

If you choose Screen mode, then Byobu will emulate GNU Screen keystrokes, such as:

  • Ctrl-A, N: Next window

  • Ctrl-A, C: Create window

  • Ctrl-A, ": Show a list of windows

  • Ctrl-A, K: Kill/delete window

The next message is perhaps the most important:

If you get disconnected from this instance, you can log in again;
your work will be saved as long as the job is running.

This means that if you lose your connection to the workstation, the job will still continue running until you manually terminate it or the maximum session length is reached. For instance, you may lose your internet connection or accidentally close your terminal application. Also, your connection will be lost after an extended period of inactivity. To reconnect, use dx find jobs to find the job ID of the cloud workstation, and then use dx ssh <job-id> to pick up the Byobu session with all your work and windows in the same state.
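
For example, dx find jobs --brief prints just the job IDs, which you can pass to dx ssh (shown here with the job ID from this session):

$ dx find jobs --brief
job-GXfvYxj071x5P87Fxx6f5k47
$ dx ssh job-GXfvYxj071x5P87Fxx6f5k47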

Next, the message recommends you press F1 to read more about Byobu and how to switch screens:

For more information on byobu, press F1.
The job is running in terminal 1. To switch to it, use the F4 key
(fn+F4 on Macs; press F4 again to switch back to this terminal).

Finally, the message reminds you that you have sudo privileges to install anything you like. The dx-toolkit is also installed, so you can run all dx commands:

Use sudo to run administrative commands.
From this window, you can:
 - Use the DNAnexus API with dx
 - Monitor processes on the worker with htop
 - Install packages with apt-get install or pip3 install
 - Use this instance as a general-purpose Linux workstation
OS version: Ubuntu 20.04.6 LTS (GNU/Linux 5.15.0-1031-aws x86_64)

The preceding tip to use htop is especially useful. When developing application code, I will typically choose an instance type I estimate is appropriate to the task. I will download sample input files, install all the required software, run the commands needed for the app, then open a new screen (Ctrl-A, C) and run htop there to see resource usage.

This tip is also useful once you learn to build and run apps. You can shell into a running job using dx ssh <job-id> and connect to Byobu. To see how the system is performing in real time to a given input, you can use Ctrl-A, C to open a new screen to run htop.

The cloud workstation comes with several programming languages installed:

  • bash 5.x

  • Python 3.x

  • R 3.x

  • Perl 5.x
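
Anything not on this list you can install yourself. For example, installing samtools (an arbitrary choice) from the Ubuntu package repositories works as it would on any Ubuntu machine:

$ sudo apt-get update
$ sudo apt-get install -y samtools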

Note that you are not your DNAnexus username on the workstation but rather the dnanexus user:

$ whoami
dnanexus

This is not to be confused with your DNAnexus ID:

$ dx whoami
kyclark

Relationship to Parent Project

Like any job, a cloud workstation must be run in the context of a DNAnexus project; however, if I execute dx ls on the workstation, I will not see the contents of the project. This is because a separate workspace container is created for the job, which appears as the "Current workspace" value in dx env:

$ dx env
Auth token used         4Gv26bY2YJ6gJjxGkV6Qg62B51X1VF7kq3gPZp2V
API server protocol     http
API server host         10.0.3.1
API server port         8124
Current workspace       container-GXfvYyj0p4QgFgP4zZyBFv7Y
Current folder          None
Current user            None

I can see more details by searching the workstation's environment for all the variables starting with DX:

$ env | grep DX
DX_APISERVER_PROTOCOL=http
DX_JOB_ID=job-GXfvYxj071x5P87Fxx6f5k47
DX_APISERVER_HOST=10.0.3.1
DX_WATCH_PORT=8090
DX_WORKSPACE_ID=container-GXfvYyj0p4QgFgP4zZyBFv7Y
DX_PROJECT_CACHE_ID=container-GXfvYxj071x5P87Fxx6f5k48
DX_SNAPSHOT_FILE=null
DX_SECURITY_CONTEXT={"auth_token_type": "Bearer", "auth_token": "4Gv26bY2YJ6gJjxGkV6Qg62B51X1VF7kq3gPZp2V"}
DX_RESOURCES_ID=container-GKyz0G00FY38jv564gjXxb46
DX_THRIFT_URI=query.us-east-1.apollo.dnanexus.com:10000
DX_APISERVER_PORT=8124
DX_DXDA_DOWNLOAD_URI=http://10.0.3.1:8090/F/D2PRJ/
DX_PROJECT_CONTEXT_ID=project-GXY0PK0071xJpG156BFyXpJF
DX_RUN_DETACH=1

The $DX_PROJECT_CONTEXT_ID variable contains the project ID:

$ echo $DX_PROJECT_CONTEXT_ID
project-GXY0PK0071xJpG156BFyXpJF

I can use this variable to see the contents of the parent project:

$ dx ls $DX_PROJECT_CONTEXT_ID:/

Any files left on the workstation after termination will be permanently destroyed. If I use dx upload to save my work, it will go into the workspace's container, not the parent project. To resolve this, I use the $DX_PROJECT_CONTEXT_ID variable to upload some output file to a results folder in the parent project:

$ dx upload output.txt --path $DX_PROJECT_CONTEXT_ID:/results/
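
If the results folder does not exist yet, you can create it first with dx mkdir (the -p flag creates parent folders as needed):

$ dx mkdir -p $DX_PROJECT_CONTEXT_ID:/results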

Alternatively, I can remove the DX_WORKSPACE_ID variable and change directories into the $DX_PROJECT_CONTEXT_ID:

$ unset DX_WORKSPACE_ID && dx cd $DX_PROJECT_CONTEXT_ID

After the preceding command, dx ls and dx upload will reference the parent project rather than the container workspace.
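
You can verify the new context with dx pwd, which prints the current project and folder (output here assumes the example project):

$ dx pwd
kyclark_test:/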

The ttyd app runs a similar Linux terminal in the browser. Here are some differences to note:

  • You will enter as the root user.

  • Commands like dx ls and dx upload will default to the project, not a container workspace.

  • There is no maximum session length, so ttyd runs until manually terminated. This can be costly if you forget to shut down the terminal.

Resources

To create a support ticket if there are technical issues:

  1. Go to the Help header (same section where Projects and Tools are) inside the platform

  2. Select "Contact Support"

  3. Fill in the Subject and Message to submit a support ticket.
