Introduction to JupyterLab
New to JupyterLab?
If you have never used a JupyterLab notebook before, please view this information:
Introduction
We can interact with the platform in several ways, and the software packages we install depend on which environment we use and how we want to use it. As shown in the diagram below, this section covers JupyterLab with Python/R/Stata and Spark JupyterLab with Python/R:

Why JupyterLab?
Many data science tasks are interactive. JupyterLab supports interactive analysis such as:
Notebook-based Analysis
Exploratory Data Analysis (EDA)
Data Preprocessing/Cleaning
Implementing New Machine Learning (ML) Models
Building Workflows
Requesting an Instance
Use Single DXJupyter Instance if:
The work can be done on a single machine instance
Main Use Cases:
Python/R
Image Processing
ML
Stata
Use Spark Cluster DXJupyter if:
Working with very large datasets that will not fit in memory on a single instance
Using the Cohort Browser and querying a large ingested dataset
Needing to use Spark-based tools such as dxdata, Hail, or Glow
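The two checklists above can be read as a simple decision rule. The following is a minimal sketch of that rule in Python; the function name, its parameters, and the return values "single" and "spark" are hypothetical and exist only for illustration, not as part of any platform API.

```python
def choose_jupyterlab_kind(fits_on_single_machine: bool,
                           queries_large_ingested_dataset: bool = False,
                           needs_spark_tools: bool = False) -> str:
    """Return "spark" or "single" following the checklists above.

    All names here are illustrative assumptions, not platform identifiers.
    """
    # Any one of the Spark criteria is enough to prefer the Spark cluster.
    if (not fits_on_single_machine
            or queries_large_ingested_dataset
            or needs_spark_tools):
        return "spark"
    # Otherwise the work can be done on a single machine instance.
    return "single"


if __name__ == "__main__":
    # Ordinary Python/R analysis that fits in memory on one machine.
    print(choose_jupyterlab_kind(fits_on_single_machine=True))
    # Analysis that needs a Spark-based tool such as dxdata, Hail, or Glow.
    print(choose_jupyterlab_kind(fits_on_single_machine=True,
                                 needs_spark_tools=True))
```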
Starting a JupyterLab Job
In the Tool Library, select either “JupyterLab with Python, R, Stata, ML, Image Processing” or the Spark JupyterLab tool. Alternatively, select “Start Analysis” from the project space and choose JupyterLab from the tool list. Once selected, press “Run Selected”.

Select the output location, and change the job name if desired.

Then, select the inputs you intend to use:
Snapshot file (not required; see the Utilizing Snapshot section for how to create a snapshot)
Input files (not required; files can also be loaded from within the notebook)
Stata settings file (license required for Stata)
Update the Duration if desired
Add Commands to run in the JupyterLab environment (optional)
Finally, set the Feature. For the full list of packages in each feature, see the Preinstalled Packages List. The options are:
Python_R
ML
IMAGE_PROCESSING
STATA
MONAI_ML

Then, press “Start Analysis” in the far right corner

Next, confirm the following parameters:
Job Name
Output Folder
Priority (defaults to normal, can be set to high)
Spending Limit (optional)
Instance Type (change the default value if needed)

Then, press “Launch Analysis”
When redirected to the monitor tab, select the job name
This opens the details page of the JupyterLab job. Wait for the job to start running and for the worker URL to appear
Press “Open Worker URL” and the JupyterLab home page will appear
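For readers who prefer the command line, the choices made in the steps above (feature, duration, job name, output folder, priority, instance type) could be collected into a single dx run invocation. The sketch below only builds the argument list and does not run anything. It rests on several assumptions that are not stated in this guide: that the app is named dxjupyterlab, that its inputs are called feature and duration, and that the feature values are spelled exactly as listed above. Check the app's own help output before relying on any of these.

```python
import shlex
from typing import List, Optional

# Feature names as listed in this guide; the exact spelling accepted by the
# app is an assumption and should be verified.
FEATURES = ("Python_R", "ML", "IMAGE_PROCESSING", "STATA", "MONAI_ML")


def build_dx_run_command(feature: str,
                         job_name: Optional[str] = None,
                         output_folder: Optional[str] = None,
                         priority: str = "normal",
                         instance_type: Optional[str] = None,
                         duration_minutes: Optional[int] = None,
                         app_name: str = "dxjupyterlab") -> List[str]:
    """Build (but do not execute) a dx run argument list.

    app_name and the -ifeature / -iduration input names are assumptions.
    """
    if feature not in FEATURES:
        raise ValueError(f"unknown feature: {feature!r}")
    if priority not in ("normal", "high"):
        raise ValueError(f"unsupported priority: {priority!r}")

    args = ["dx", "run", app_name, f"-ifeature={feature}"]
    if duration_minutes is not None:
        args.append(f"-iduration={duration_minutes}")
    if job_name:
        args += ["--name", job_name]
    if output_folder:
        args += ["--destination", output_folder]
    if instance_type:
        args += ["--instance-type", instance_type]
    # -y skips the interactive confirmation prompt.
    args += ["--priority", priority, "-y"]
    return args


if __name__ == "__main__":
    cmd = build_dx_run_command("ML", job_name="my analysis",
                               output_folder="/results", priority="high")
    # shlex.join quotes arguments that contain spaces (Python 3.8+).
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps arguments with spaces (such as a job name) intact if the list is later passed to subprocess.run.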

Note: If you press “Open Worker URL” while the job is still initializing, you may see a 502 error message. This is expected, and the page will load once the job has finished initializing.
Running instances may take several minutes to load as the allocations become available.
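The wait described in the note above amounts to retrying until the 502 response goes away. Here is a minimal, generic sketch of that retry loop. How the status code is obtained is deliberately left to the caller, because the worker URL is normally opened in a logged-in browser session. The function name, the get_status callable, and the default attempt count and delay are all hypothetical choices for illustration.

```python
import time
from typing import Callable


def wait_until_ready(get_status: Callable[[], int],
                     max_attempts: int = 10,
                     delay_seconds: float = 30.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until get_status() stops returning 502, or give up.

    get_status is a caller-supplied function (an assumption of this sketch)
    that returns the current HTTP status code of the worker URL.
    """
    for attempt in range(max_attempts):
        if get_status() != 502:
            return True  # the JupyterLab page is being served
        # Do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(delay_seconds)
    return False


if __name__ == "__main__":
    # Simulated job that finishes initializing on the third check.
    statuses = iter([502, 502, 200])
    ready = wait_until_ready(lambda: next(statuses), sleep=lambda s: None)
    print("ready" if ready else "still initializing")
```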