Launching a ML JupyterLab Job

ML JupyterLab is an app in the AI/ML Accelerator package. A license is required to use the AI/ML Accelerator package. For more information, please contact DNAnexus Sales via sales@dnanexus.com.

ML JupyterLab is essentially a purpose-built JupyterLab instance on the DNAnexus Platform. It inherits all the capabilities of a standard JupyterLab, plus specialized features for AI/ML development. This section gives you a quick start on launching ML JupyterLab.

To Launch with GUI

  1. Find ML JupyterLab in your Tools Library: Simply open your Tools Library and search for “AI/ML Accelerator - ML JupyterLab”. If you cannot find it, you may need to obtain a license.

  2. Set the Required Inputs

  • Docker Image:

    • When you launch ML JupyterLab, it will ask you to pick a Docker image from a list. These are prebuilt environments tailored to AI/ML development. You can pick the standard option, buildspec-1.0/ray-2.32.0-py310-cpu, or another option if you need a GPU-enabled setup.

    • Currently, we support Ray version 2.32 as the distributed engine.

      • Available Docker images:

        • General Python 3.10 with CPU Support:

          • buildspec-1.0/ray-2.32.0-py310-cpu

        • PyTorch with GPU Support:

          • buildspec-1.0/ray-2.32.0-py310-gpu-pytorch

        • TensorFlow with GPU Support:

          • buildspec-1.0/ray-2.32.0-py310-gpu-tensorflow

      • Each image is optimized for specific workloads; the packages included in each Docker image, and their versions, are listed in the Pre-installed ML Packages section.

  • Duration:

    • This parameter sets the duration (in minutes) for which your environment will remain active. The expected runtime should be specified based on how long you plan to work with the environment, the size of the dataset, or the complexity of the tasks you will be running.

    • For example, larger datasets or more complex computations may require a longer runtime.

    • If you are unsure about the duration, use the default value and you can change this parameter inside the app later.
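For reference, the duration can also be set when launching from the command line (covered in To Launch with CLI below). The sketch here is a minimal, hedged example: it assumes the input is named duration, as the GUI label suggests; confirm the actual input name with dx run app-ml_jupyterlab_ray_cluster -h.

# 'duration' is an assumed input name; verify with `dx run app-ml_jupyterlab_ray_cluster -h`
dx run app-ml_jupyterlab_ray_cluster \
  -icluster_image='buildspec-1.0/ray-2.32.0-py310-cpu' \
  -iduration=240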

  3. Set the Optional Parameters

  • Instance Type & Initial Instance Count

    • This input is crucial if you want to develop AI/ML models on large datasets that require intensive computing power. ML JupyterLab has a built-in Ray cluster, an architecture that can provide a single workspace with a very large number of CPUs and a large amount of RAM.

    • To find this input, click the instance icon in the top right corner of the input panel, which opens automatically when you launch ML JupyterLab.

    • By default, ML JupyterLab uses two mem2_ssd1_v2_x4 instances (Input Parameters: Initial Instance Count: 2, Instance Type: mem2_ssd1_v2_x4). Because the head node is dedicated to job distribution, this default provides only one worker; as a result, your ML JupyterLab has 15.6 GB of memory and 4 cores by default.

    • You can change the Instance Type or Initial Instance Count to obtain the computing power that you want. For example, to launch an ML JupyterLab with 512 worker cores, set Instance Type to mem4_ssd1_x128 and Initial Instance Count to 5: five 128-core instances, minus the head node, leave four workers totaling 512 cores.

    • This setting helps create computing-intensive environments that are impossible to achieve with a single node.

    • Note: If you are working with GPU instance types, avoid using mem1_ssd1_gpu2_x8 and mem1_ssd1_gpu2_x32.

  • Additional Requirements

    • This is an optional input. It takes a text file containing a list of libraries and packages to install in your environment, in addition to the ones that are already provided (see the upload sketch after this list).

    • The file should be formatted as a plain text document, with each package listed on a new line. Each line can specify a package name and, optionally, its version.

    • For example:

      numpy==1.21.0
      pandas>=1.3.0
      scikit-learn

    • You can customize this list based on the specific packages you need for your project. The system automatically resolves dependencies when installing these libraries. This format follows the PIP v24.2 standard.

  • Wheel Files to be Installed

    • This is an optional input. It allows you to specify an array of wheel files (.whl) to be installed as part of the setup for your JupyterLab job. Wheel files are pre-built Python package distributions, enabling faster and more reliable installations compared to source distributions.
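As a hedged illustration of preparing the Additional Requirements input from the command line, the sketch below creates the requirements file described above and uploads it to your project with dx upload; the exact app input to pass the file under may vary by app version, so check dx run app-ml_jupyterlab_ray_cluster -h.

# Create a plain-text requirements file, one package per line
cat > additional_requirements.txt <<'EOF'
numpy==1.21.0
pandas>=1.3.0
scikit-learn
EOF

# Upload it to the current project; pass the resulting file ID to the
# app's Additional Requirements input (input name varies; see dx run -h)
dx upload additional_requirements.txt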

  4. Opening the Worker URL: Once your ML JupyterLab has launched, you will be redirected to the Monitor screen. From there, click the Open button.

  • Use the Open button next to the Worker URL to open ML JupyterLab.

  • Even when the Job State is “Running”, it might take a few more minutes for the Platform to set up ML JupyterLab. If the job is not yet ready, you will see the screen below; in that case, simply reload your browser after a few minutes.

The waiting screen of ML JupyterLab when the instance is not ready

To Launch with CLI

You can also launch your job with dxtoolkit:

dx run app-ml_jupyterlab_ray_cluster \
  -icluster_image='buildspec-1.0/ray-2.32.0-py310-cpu' \
  --name='My first ML-JupyterLab'
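If you want to mirror the GUI's optional instance settings from the CLI, a minimal sketch is shown below. It assumes the app accepts the standard dx-toolkit cluster flags --instance-type and --instance-count; verify the exact options with dx run app-ml_jupyterlab_ray_cluster -h.

# Assumed flags; confirm with `dx run app-ml_jupyterlab_ray_cluster -h`
dx run app-ml_jupyterlab_ray_cluster \
  -icluster_image='buildspec-1.0/ray-2.32.0-py310-gpu-pytorch' \
  --instance-type mem4_ssd1_x128 \
  --instance-count 5 \
  --name='ML-JupyterLab with 512 worker cores'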

Once the Job State is Running, you can get the Worker URL with:

dx describe job-xxxx --json | jq -r .httpsApp.dns.url
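If you prefer not to watch the Monitor screen, a small shell sketch like the one below waits for the job to reach the running state and then prints the Worker URL. It uses only dx describe and jq, both shown above; replace job-xxxx with your job ID.

# Wait until the job is running, then print the Worker URL
while [ "$(dx describe job-xxxx --json | jq -r .state)" != "running" ]; do
  sleep 30
done
dx describe job-xxxx --json | jq -r .httpsApp.dns.url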

Resources

To create a support ticket if there are technical issues:

  1. Go to the Help header (in the same section as Projects and Tools) inside the Platform.

  2. Select “Contact Support”.

  3. Fill in the Subject and Message to submit a support ticket.

For complete details about the app, see the Full Documentation.
