Running a JupyterLab Notebook

Before you begin, review the overview documentation and log in to the DNAnexus Platform.

Best Practices

  1. Use a DNAnexus JupyterLab notebook so that your work saves to the platform easily.

    1. When you open your JupyterLab session, select this option to start a DNAnexus JupyterLab notebook:

    2. You can do that with either of these 2 options:

  2. Remember to save your notebooks and anything you want to export out of the notebook.

  3. Data in your notebook space is separate from data in your DNAnexus project space. You can view each one individually here:

Helpful Tips

  1. All notebooks that are stored in the project will have DX before the name. Example below:

  2. When you are running code blocks, remember that JupyterLab lets you run them out of order. Pay attention to the numbers beside the code blocks, which show the order in which they were run. This is highlighted in gold below:

  3. If you write primarily in Python or R, you can put %%bash at the top of your code block to "switch" to bash scripting. Example below:

  4. Notebook locking: only one user at a time can edit a notebook opened from project storage. While a user is editing a notebook, it is locked and others cannot edit it.

    1. Once the notebook is saved and its kernel is shut down in JupyterLab, others can access it.

    2. To unlock it, close the notebook and then use the screenshot below to confirm that you have shut down the kernel.
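
The point about running code blocks out of order can be illustrated with a small sketch. This is a toy model, not JupyterLab itself: the cell names and contents are hypothetical, and each "cell" is simply run against one shared namespace, the way notebook cells share a kernel.

```python
# Toy model of a notebook: every "cell" runs against one shared namespace,
# just as every cell in a notebook shares the same kernel state.
# The cell names and their contents are hypothetical examples.
cells = {
    "load": "values = [1, 2, 3]",
    "scale": "values = [v * 10 for v in values]",
    "total": "total = sum(values)",
}


def run_cells(order):
    """Run the named cells in the given order and return the final namespace."""
    namespace = {}
    for execution_count, name in enumerate(order, start=1):
        exec(cells[name], namespace)
        # The counter plays the role of the number shown beside a code block.
        print(f"[{execution_count}] ran cell {name!r}")
    return namespace


top_to_bottom = run_cells(["load", "scale", "total"])
out_of_order = run_cells(["load", "total", "scale"])

print(top_to_bottom["total"])  # 60
print(out_of_order["total"])   # 6
```

Both runs use the same three cells, yet the totals differ because "total" ran before "scale" in the second run. The numbers beside the code blocks are the only record of which order was actually used.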

Setting Up / Running a Notebook

  1. Download or access data files in the JupyterLab environment

    1. To download:

      dx download "PATH"
    2. To access data in the JupyterLab environment:

      1. The files are read-only.

      2. The files do not reflect recent changes in the file system.

      3. To read from project storage, add /mnt/project/ to the front of your path:

        data = pd.read_csv("/mnt/project/PATH.csv")

  2. Import the data

    import ___ as pd 
    NAME = pd.read_csv("PATH.csv")
  3. Do the analysis

    1. This is where you add the code chunks you need for the rest of your analysis.

  4. Upload results back to the project space

    %%bash
    dx upload FILE --destination users/YOUR_ID/
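
The path handling in step 1 and the upload in step 4 can be sketched in plain Python. This is a minimal sketch under stated assumptions: the helper names (`mounted_path`, `dx_upload_command`, `upload_results`) and the file name `results.csv` are hypothetical, and the only platform details used are the `/mnt/project/` prefix and the `dx upload FILE --destination` command shown above.

```python
from pathlib import PurePosixPath
import shutil
import subprocess

# Read-only project mount named in step 1.
PROJECT_MOUNT = PurePosixPath("/mnt/project")


def mounted_path(project_path):
    """Add /mnt/project/ to the front of a project path (hypothetical helper)."""
    return str(PROJECT_MOUNT / project_path.lstrip("/"))


def dx_upload_command(local_file, destination):
    """Build the argument list for `dx upload FILE --destination DEST`."""
    return ["dx", "upload", local_file, "--destination", destination]


def upload_results(local_file, destination):
    """Run the upload if the dx CLI is on PATH; otherwise only report the command."""
    command = dx_upload_command(local_file, destination)
    if shutil.which("dx") is None:
        print("dx not found; would have run:", " ".join(command))
        return None
    return subprocess.run(command, check=True)


print(mounted_path("PATH.csv"))  # /mnt/project/PATH.csv
# "results.csv" is a hypothetical output file name.
print(dx_upload_command("results.csv", "users/YOUR_ID/"))
```

In a notebook, the result of `mounted_path("PATH.csv")` could be passed straight to `pd.read_csv`, and `upload_results` would take the place of the `%%bash` cell. Building the command as a list rather than a single string avoids quoting problems when file names contain spaces.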

Opening Notebooks from Project Storage

  • Notebooks can also be opened directly from project storage.

  • When you save in JupyterLab, the notebook is uploaded to the platform as a new file. This goes back to the concept of immutability.

  • The old version of the notebook goes into the .Notebook_archive/ folder in the project.
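
The save behavior described above can be pictured with a toy in-memory model. This is only an illustration of the idea, not the platform's implementation: the `save_notebook` function, the dictionary layout, and the notebook name `analysis.ipynb` are all hypothetical.

```python
# Toy in-memory model only; this is not how the platform is implemented.
ARCHIVE_FOLDER = ".Notebook_archive/"


def save_notebook(project, name, content):
    """Record a save: the new content becomes a new file, and any
    previous version is moved into the archive folder unchanged."""
    previous = project["files"].get(name)
    if previous is not None:
        project["archive"].append((ARCHIVE_FOLDER + name, previous))
    project["files"][name] = content  # a new file, never an in-place edit
    return project


project = {"files": {}, "archive": []}
save_notebook(project, "analysis.ipynb", "version 1")  # hypothetical notebook name
save_notebook(project, "analysis.ipynb", "version 2")

print(project["files"]["analysis.ipynb"])  # version 2
print(project["archive"])  # [('.Notebook_archive/analysis.ipynb', 'version 1')]
```

Each save leaves the earlier content intact in the archive, which is why frequent saves of a large notebook add to the project's storage.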

Installing Software Packages

  • Install packages as you normally would, with package managers such as pip install (Python) or install.packages (R).

  • A number of packages are preinstalled, based on the instance type. See the List of Preinstalled Packages.

Best Practices

  1. Use the correct base image (Spark or Jupyter)

  2. Install all software using a separate Jupyter Notebook

  3. Use version tags when possible

    1. pip install <PACKAGE>==<VERSION>

  4. R: Install from CRAN URL

    1. install.packages(packageurl, repos=NULL, type="source")

  5. Rename the snapshot image and move it out of the Notebook_Snapshot/ folder
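
The version-pinning advice can be sketched from Python as well. This is a minimal sketch, assuming a standard Python 3.8+ kernel: the helper names are hypothetical, `PACKAGE` and `VERSION` are the same placeholders used in the list above, and the command is only built here, not run.

```python
import sys
from importlib import metadata


def pip_install_command(package, version):
    """Build a pinned install command for the interpreter running this kernel."""
    return [sys.executable, "-m", "pip", "install", f"{package}=={version}"]


def installed_version(package):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# PACKAGE and VERSION are placeholders, as in the list above.
command = pip_install_command("PACKAGE", "VERSION")
print(command[1:])  # ['-m', 'pip', 'install', 'PACKAGE==VERSION']
print(installed_version("surely-not-an-installed-package"))  # None
```

To perform the install, the command list could be passed to `subprocess.run(command, check=True)`. Using `sys.executable -m pip` targets the same interpreter the notebook kernel is running, and `installed_version` gives a quick check of what ended up installed before a snapshot is taken.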

Image Snapshots

To Create a Snapshot:

To Use a Snapshot on a New Notebook

Snapshot Best Practices

  1. Don't save data in your snapshot - it uses storage space and impacts costs.

  2. Snapshots can be large and they use storage space, so create one only when you need it.

  3. Rename the snapshot according to your organization's naming conventions so that you can remember what it refers to when you return to the project in the future.

Supplemental Information

Running JupyterLabs with Papermill

dx extract dataset

Spark JupyterLab

Resources

DXJupyterLab Reference

Using DXJupyterLab

Full Documentation

To create a support ticket if there are technical issues:

  1. Go to the Help header (same section where Projects and Tools are) inside the platform

  2. Select "Contact Support"

  3. Fill in the Subject and Message to submit a support ticket.
