Introduction to JupyterLab

New to JupyterLab?

If you have never used a JupyterLab notebook before, please view this information:

Introduction

There are several ways to interact with the platform, and software packages can be installed in each environment depending on the tools you need and how you plan to use them. As shown in the diagram below, this page covers two environments: JupyterLab with Python/R/Stata and Spark JupyterLab with Python/R:

Why JupyterLab?

Data scientists’ tasks are often interactive. JupyterLab supports interactive work such as:

  • Notebook-based Analysis

  • Exploratory Data Analysis (EDA)

  • Data Preprocessing/Cleaning

  • Implementing New Machine Learning (ML) Models

  • Building Workflows

Requesting an Instance

Use a Single DXJupyter Instance if:

  • The work can be done on a single machine instance

  • Main use cases:

    • Python/R

    • Image Processing

    • ML

    • Stata

Use Spark Cluster DXJupyter if:

  • Working with very large datasets that will not fit in memory on a single instance

  • Using the Cohort Browser and querying a large ingested dataset

  • Needing to use Spark-based tools such as dxdata, Hail, or Glow
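The two checklists above can be read as a simple decision rule. The following is a minimal sketch of that rule in Python, not part of the platform: the `Workload` class, its field names, and the `recommend_environment` function are hypothetical names introduced here for illustration, and comparing dataset size to instance memory is only a rough proxy for "will not fit in memory".

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of an analysis, used only for this sketch."""
    dataset_gb: float                    # approximate size of the data to load
    instance_memory_gb: float            # memory of the single instance considered
    queries_cohort_browser_data: bool = False  # querying a large ingested dataset
    needs_spark_tools: bool = False      # e.g. dxdata, Hail, or Glow


def recommend_environment(workload: Workload) -> str:
    """Return "spark_cluster" or "single_instance" following the criteria above."""
    # Spark-based tools and large ingested datasets call for the Spark cluster.
    if workload.needs_spark_tools or workload.queries_cohort_browser_data:
        return "spark_cluster"
    # Data that will not fit in memory on one machine also calls for Spark.
    if workload.dataset_gb > workload.instance_memory_gb:
        return "spark_cluster"
    # Otherwise the work can be done on a single machine instance.
    return "single_instance"
```

For example, `recommend_environment(Workload(dataset_gb=5, instance_memory_gb=16))` selects the single instance, while setting `needs_spark_tools=True` selects the Spark cluster regardless of data size.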

Starting a JupyterLab Job

  1. Select JupyterLab with Python, R, Stata, ML, Image Processing or JupyterLab from Spark from the Tool Library, or select “Start Analysis” from the project space and choose JupyterLab from the tool list. Then press “Run Selected”.

  2. Select the output location and change the job name if desired.

  3. Select the inputs you intend to use:

    1. Snapshot file (optional; see the Utilizing Snapshot section for how to create one)

    2. Input files (optional; files can also be loaded from within the notebook)

    3. Stata settings file (a license is required for Stata)

    4. Duration (update if desired)

    5. Commands to run in the JupyterLab environment (optional)

    6. Feature. For the full list of packages in each feature, see the Preinstalled Packages List. The options are:

      • PYTHON_R

      • ML

      • IMAGE_PROCESSING

      • STATA

      • MONAI_ML

  4. Press “Start Analysis” in the far right corner.

  5. Confirm the following parameters:

    1. Job Name

    2. Output Folder

    3. Priority (defaults to normal, can be set to high)

    4. Spending Limit (optional)

    5. Instance Type (change the default value if needed)

  6. Press “Launch Analysis”.

  7. When redirected to the Monitor tab, select the job name.

  8. This opens the details page of the JupyterLab job. Wait for the job to start running and for the worker URL to appear.

  9. Press “Open Worker URL” and the JupyterLab home page will appear.

  Note: If the job is still initializing when you press “Open Worker URL”, the page may show a 502 error. This is expected, and the page will load once the job has finished initializing.
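The same launch can also be scripted. The sketch below only assembles a command line for the `dx` command-line client and does not run anything. It rests on several assumptions not stated on this page: that the single-instance app is named `dxjupyterlab`, that its inputs are called `feature` and `duration` (in minutes), and that the standard `dx run` options `--name`, `--destination`, `--instance-type`, and `--priority` apply. The function name `build_dx_run_command` is hypothetical. Feature names are matched case-insensitively against the options listed above.

```python
import shlex

# Feature options listed on this page.
FEATURES = {"PYTHON_R", "ML", "IMAGE_PROCESSING", "STATA", "MONAI_ML"}
# Priorities mentioned on this page (normal is the default).
PRIORITIES = {"normal", "high"}


def build_dx_run_command(feature="PYTHON_R", name=None, destination=None,
                         instance_type=None, priority="normal",
                         duration_minutes=None):
    """Build (but do not execute) an argument list for launching JupyterLab.

    The app name and the input names are assumptions; check them against
    the app's own documentation before use.
    """
    feature = feature.upper()
    if feature not in FEATURES:
        raise ValueError(f"unknown feature: {feature!r}")
    if priority not in PRIORITIES:
        raise ValueError(f"unknown priority: {priority!r}")

    cmd = ["dx", "run", "dxjupyterlab", f"-ifeature={feature}"]  # assumed names
    if duration_minutes is not None:
        cmd.append(f"-iduration={int(duration_minutes)}")  # assumed input name
    if name:
        cmd += ["--name", name]                  # job name
    if destination:
        cmd += ["--destination", destination]    # output folder
    if instance_type:
        cmd += ["--instance-type", instance_type]
    if priority != "normal":
        cmd += ["--priority", priority]
    return cmd


if __name__ == "__main__":
    # Print the command for review instead of running it.
    print(shlex.join(build_dx_run_command(feature="ML", name="my-analysis",
                                          priority="high")))
```

Returning a list rather than a single string keeps each argument separate, so job names or folders containing spaces are passed through safely if the list is later handed to `subprocess.run`.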

Running instances may take several minutes to load as the allocations become available.
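Because the worker URL can return a 502 error while the job is still initializing, a script that opens the URL automatically may want to retry rather than fail. The following is a minimal polling sketch using only the Python standard library. The function names, the 15-second interval, and the attempt limit are arbitrary choices made for illustration. A non-502 response only indicates that the server behind the URL is answering; the page may still require you to be logged in.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=10.0):
    """Return the HTTP status code for url, including error codes such as 502."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the code is still what we want.
        return err.code


def wait_until_ready(url, get_status=http_status, interval_s=15.0,
                     max_attempts=40, sleep=time.sleep):
    """Poll url until it stops returning 502; return the number of attempts made.

    Raises TimeoutError if the URL still returns 502 after max_attempts tries.
    """
    for attempt in range(1, max_attempts + 1):
        if get_status(url) != 502:
            return attempt
        if attempt < max_attempts:
            sleep(interval_s)  # give the job more time to finish initializing
    raise TimeoutError(f"{url} still returned 502 after {max_attempts} attempts")
```

The `get_status` and `sleep` parameters exist so the loop can be exercised without network access or real waiting, for example by passing a function that returns a fixed sequence of status codes.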
