Cloud Workstation
The cloud_workstation app provides a Linux (Ubuntu) terminal running in the cloud, which is the same base execution environment used by all DNAnexus apps. It is used most often for testing application code and building Docker images. I especially favor the cloud workstation whenever I need to work with large data files that I don't wish to copy to my local disk (laptop), as the transfers stay inside AWS rather than crossing the open internet. If you have previously been limited to HPC environments where sysadmins determine what software may or may not be installed, you will find that you have sudo privileges to install any software you like, whether via apt, by downloading pre-built binaries, or by building from source code.
In order to run the cloud workstation, you will need to set up an SSH key pair. You can do this by running dx ssh_config.
Here is the start of the usage for the app:
As noted in the usage, the default timeout is one hour, but you can change it if you need to.
In the preceding command, I also use the following flags from dx run:
-imax_session_length="2h": Changes the max session length to 2 hours.
-y|--yes: Do not ask for confirmation before launching the job.
--ssh: Configure the job to allow SSH access and connect to it after launching. Defaults --priority to high.
The app produces no outputs. In the following sections, I want to focus on the inputs.
Maximum Session Length
As noted in the usage, the default timeout is one hour.
You can set a different session length when launching the app. The following command sets the limit to 2 hours:
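A minimal sketch of such a launch, wrapped in a shell function so the session length is easy to vary. It assumes the app is invoked as app-cloud_workstation and that the dx-toolkit is installed and logged in on your local machine. The function name launch_workstation is hypothetical.

```shell
# Hypothetical helper: launch a cloud workstation with a given maximum session length.
# Assumes the app is invoked as app-cloud_workstation and that `dx login` has been run.
launch_workstation() {
    local session_length="${1:-2h}"   # defaults to the 2-hour example from the text
    dx run app-cloud_workstation \
        -imax_session_length="$session_length" \
        --ssh \
        -y
}
```

Calling launch_workstation 2h would then start the job, skip the confirmation prompt, and connect over SSH once the instance is ready.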
When on the workstation, you can find how much time is left using dx-get-timeout:
If you would like to extend the time left, use dx-set-timeout with the same values shown previously for session length. For example, you can set the timeout back to 2 hours and verify that you now have 2 hours left:
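The check-extend-check sequence described above might look like the following sketch. It is meant to run on the workstation itself, since dx-get-timeout and dx-set-timeout are provided there. The function name extend_session is hypothetical, and the exact output of the two commands is not shown because it depends on the session.

```shell
# Hypothetical helper, intended to run on the workstation:
# report the remaining time, reset the timeout, then report again.
extend_session() {
    local new_length="${1:-2h}"
    echo "Time left before extending:"
    dx-get-timeout
    dx-set-timeout "$new_length"
    echo "Time left after extending:"
    dx-get-timeout
}
```

For example, extend_session 2h resets the timeout to two hours and prints the remaining time before and after.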
Input Files
You can initiate the app with any files you want copied to the instance:
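As a rough sketch, the launch could pass one input argument per file. This assumes the app's file input is named fids, which you should confirm against the app's usage before relying on it. The function name launch_with_files and the file IDs are placeholders.

```shell
# Hypothetical helper: launch a workstation with files copied onto the instance.
# ASSUMPTION: the app's file input is named `fids`; check the app usage to confirm.
launch_with_files() {
    local args=""
    local fid
    for fid in "$@"; do
        args="$args -ifids=$fid"
    done
    # $args is intentionally unquoted so each -ifids=... becomes its own argument;
    # DNAnexus file IDs contain no whitespace.
    dx run app-cloud_workstation $args --ssh -y
}
```

A call such as launch_with_files file-xxxx file-yyyy would request both files at startup.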
One of the main use cases for the cloud workstation is working with large files, and I will mostly use dx download on the instance to download what I want. An especially important case is when I want to download a file to STDOUT rather than to a local file, in which case I would not want to initiate the app using this input. For example, when dealing with a tarball of an entire Illumina BCL run directory, I would prefer to download to STDOUT and pipe this into tar:
The alternative would require at least twice the disk space (to download the tarball and then expand the contents).
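The streaming approach described above can be sketched as follows. The function name stream_untar is hypothetical, and the file ID is whatever tarball you want to unpack. The sketch assumes an uncompressed tar archive. For a gzip-compressed tarball, tar would need -xzf - instead.

```shell
# Hypothetical helper: stream a tar archive from the platform straight into tar,
# so the tarball itself is never written to the workstation's disk.
stream_untar() {
    local file_id="$1"
    local dest_dir="${2:-.}"
    mkdir -p "$dest_dir"
    # `-o -` sends the file to STDOUT; tar reads the archive from STDIN (`-f -`).
    dx download "$file_id" -o - | tar -xf - -C "$dest_dir"
}
```

For example, stream_untar file-xxxx bcl_run would expand the archive into a bcl_run directory while using only the disk space of the extracted contents.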
Snapshot
You can save the state of a workstation, called a "snapshot," and start a new workstation using that saved state:
For instance, you may go through a lengthy build of various packages to create the environment you need to run some application, and that environment will be lost when the workstation stops.
To demonstrate, I will show that the Python module "pandas" is not installed by default:
I use python3 -m pip install pandas to install the module, then dx-create-snapshot to save the state of the machine, which shows:
I can use the file ID of the snapshot to reconstitute my environment:
Now I find that "pandas" does exist on the image:
You can use a snapshot file ID as an asset for native applets.
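The snapshot workflow in this section can be sketched as two small helpers: one to run on the workstation, one to run locally afterward. The function names are hypothetical, the input name snapshot is an assumption to confirm against the app usage, and file-xxxx stands in for the snapshot file ID that dx-create-snapshot reports.

```shell
# On the workstation: install a Python module, then save the machine state.
snapshot_with_module() {
    local module="$1"
    python3 -m pip install "$module" && dx-create-snapshot
}

# Locally: start a new workstation from a snapshot file ID.
# ASSUMPTION: the app's snapshot input is named `snapshot`.
launch_from_snapshot() {
    local snapshot_id="$1"
    dx run app-cloud_workstation -isnapshot="$snapshot_id" --ssh -y
}
```

After snapshot_with_module pandas on the first workstation, launch_from_snapshot file-xxxx would bring up a new workstation with pandas already installed.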
Instance Type
The instance type is actually an argument to dx run, not the cloud workstation app. You can use this argument with any app to override the default instance type chosen by the app developer.
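Because the override belongs to dx run, a small wrapper can apply it to any executable. This is a sketch only: the function name run_on_instance is hypothetical, and mem1_ssd1_v2_x8 in the usage note is just an example instance type name, so substitute one available in your project's region.

```shell
# Hypothetical helper: run any app or applet on an explicitly chosen instance type.
# Arguments after the first two are passed through to dx run unchanged.
run_on_instance() {
    local executable="$1"
    local instance_type="$2"
    shift 2
    dx run "$executable" --instance-type "$instance_type" "$@"
}
```

For example, run_on_instance app-cloud_workstation mem1_ssd1_v2_x8 --ssh -y would launch the workstation on that instance type instead of the app's default.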
Running Cloud Workstation
When the app secures an instance, you will be greeted by the following messages. The first shows the job ID, instance type, project ID, and the workspace container:
The workstation session runs inside Byobu, a terminal multiplexer. This means that pressing Ctrl-A to jump to the beginning of the line in the terminal will trigger the following Byobu configuration screen, where you are prompted to choose whether to use Screen or Emacs mode:
In Screen mode, the following key bindings are available:
Ctrl-A, N: Next window
Ctrl-A, C: Create window
Ctrl-A, ": Show list of windows
Ctrl-A, K: Kill/delete window
The next message is perhaps the most important:
This means that if you lose your connection to the workstation, the job will continue running until you manually terminate it or the maximum session length is reached. For instance, you may lose your internet connection or accidentally close your terminal application. Also, your connection will be lost after an extended period of inactivity. To reconnect, use dx find jobs to find the job ID of the cloud workstation, and then use dx ssh <job-id> to pick up the Byobu session with all your work and windows in the same state.
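The reconnect steps above can be sketched as two helpers run from your local machine. The function names are hypothetical, and the --state running filter is added here only to shorten the job listing.

```shell
# Hypothetical helper: list running jobs so you can spot the workstation's job ID.
list_running_jobs() {
    dx find jobs --state running
}

# Hypothetical helper: reattach to the workstation job over SSH.
reconnect_workstation() {
    local job_id="$1"
    dx ssh "$job_id"
}
```

After finding the job ID in the listing, reconnect_workstation job-xxxx drops you back into the Byobu session.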
Next, the message recommends you press F1 to read more about Byobu and how to switch screens:
Finally, the message reminds you that you have sudo privileges to install anything you like. The dx-toolkit is also installed, so you can run all dx commands:
The preceding tip to use htop is especially useful. When developing application code, I will typically choose an instance type I estimate is appropriate to a task. I will download sample input files, install all the required software, run the commands needed for the app, then open a new screen (Ctrl-A, C) and run htop there to see resource usage.
This tip is also useful once you learn to build and run apps. You can shell into a running job using dx ssh <job-id> and connect to Byobu. To see how the system responds in real time to a given input, you can use Ctrl-A, C to open a new screen and run htop.
The cloud workstation comes with several programming languages installed:
bash 5.x
Python 3.x
R 3.x
Perl 5.x
Note that on the workstation you are logged in as the dnanexus user, not under your DNAnexus username:
This is not to be confused with your DNAnexus ID:
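A small sketch that prints both identities side by side. The function name show_identities is hypothetical. whoami reports the Linux account on the workstation, while dx whoami reports the DNAnexus account you are logged in as.

```shell
# Hypothetical helper: contrast the Linux account with the DNAnexus account.
show_identities() {
    echo "Linux user: $(whoami)"
    echo "DNAnexus user: $(dx whoami)"
}
```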
Relationship to Parent Project
Like any job, a cloud workstation must be run in the context of a DNAnexus project; however, if I execute dx ls on the workstation, I will not see the contents of the project. This is because a containing workspace is created for the job, which I can see as the "Current workspace" value in dx env:
I can see more details by searching the workstation's environment for all the variables starting with DX:
The $DX_PROJECT_CONTEXT_ID variable contains the project ID:
I can use this variable to see the parent project:
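The inspection steps above can be sketched as follows, for use on the workstation where the DX variables are already set. The function names are hypothetical.

```shell
# Hypothetical helper: list every environment variable whose name starts with DX.
show_dx_env() {
    env | grep '^DX'
}

# Hypothetical helper: list the root folder of the parent project.
# The trailing colon marks the value as a project rather than a file or folder name.
list_parent_project() {
    dx ls "${DX_PROJECT_CONTEXT_ID}:"
}
```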
Any files left on the workstation after termination will be permanently destroyed. If I use dx upload to save my work, it will go into the workspace's container, not the parent project. To resolve this, I use the $DX_PROJECT_CONTEXT_ID variable to upload some output file to a results folder in the parent project:
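One way to sketch that upload is shown below. The function name upload_to_parent and the file name in the usage note are placeholders, and creating the folder first with dx mkdir -p is an extra step added here so the upload does not depend on the folder already existing.

```shell
# Hypothetical helper: upload a local file into a folder of the parent project
# instead of the job's temporary workspace container.
upload_to_parent() {
    local local_file="$1"
    local folder="${2:-/results}"
    # -p creates the destination folder if it does not exist yet.
    dx mkdir -p "${DX_PROJECT_CONTEXT_ID}:${folder}" &&
        dx upload "$local_file" --path "${DX_PROJECT_CONTEXT_ID}:${folder}/"
}
```

For example, upload_to_parent output.txt places the file in the parent project's /results folder.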
Alternatively, I can unset the DX_WORKSPACE_ID variable and change directories into $DX_PROJECT_CONTEXT_ID:
After the preceding command, dx ls and dx upload will reference the parent project rather than the container workspace.
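The switch to the parent project can be sketched as a single helper. The function name use_parent_project is hypothetical. It has to be called directly in your interactive shell, not in a subshell, so that the unset takes effect for later commands.

```shell
# Hypothetical helper: point dx at the parent project for the rest of the session.
use_parent_project() {
    # Drop the workspace override so dx stops defaulting to the job's container.
    unset DX_WORKSPACE_ID
    # Change the dx working location to the root of the parent project.
    dx cd "${DX_PROJECT_CONTEXT_ID}:"
}
```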
The ttyd app runs a similar Linux terminal in the browser. Here are some differences to note:
You will enter as the root user.
Commands like dx ls and dx upload will default to the project, not a container workspace.
There is no maximum session length, so ttyd runs until manually terminated. This can be costly if you forget to shut down the terminal.
Resources
To create a support ticket if there are technical issues:
Go to the Help header (same section where Projects and Tools are) inside the platform
Select "Contact Support"
Fill in the Subject and Message to submit a support ticket.