scvi-tools and CZ CELLxGENE

The user is responsible for reviewing and complying with the license requirements of the software, notebooks, and data referenced in this documentation.

Users are responsible for the costs of storing the CZ CELLxGENE dataset in their project spaces and of analyzing it with scvi-tools.

Instance type availability and pricing are subject to the contract between the user or the user’s organization and DNAnexus.

Citations for the scvi-tools and CZ CELLxGENE dataset

For this demonstration, we adapted the Introduction to scvi-tools notebook, developed by the scvi-tools development team. Users may cite the scvi-tools manuscript published in 2022, along with the original papers describing each model, which are referenced in the corresponding documentation. In this example, we applied the scVI (single-cell Variational Inference) model, which is described in the publication Deep generative modeling for single-cell transcriptomics.

The scVI model is trained on the human single-cell RNA-seq dataset downloaded from the CZ CELLxGENE data portal. Please cite the publication associated with this dataset: Single-cell resolution characterization of myeloid-derived cell states with implication in cancer outcome.

CZ CELLxGENE brings together a wide range of public single-cell datasets that have been shared through the Chan Zuckerberg Initiative platform. These datasets are uploaded by the original researchers and distributed under the Creative Commons CC BY 4.0 license. More information may be found in the CZ CELLxGENE Data Submission Policy.

Cite CZ CELLxGENE Discover: A single-cell data platform for scalable exploration, analysis and modeling of aggregated data. CZI Single-Cell Biology, et al. bioRxiv 2023.10.30; doi: https://doi.org/10.1101/2023.10.30.563174

Overview of scvi-tools

scvi-tools is a software ecosystem designed for fully processing and modeling single-cell omics datasets. The project originates from work carried out in the Yosef Lab at UC Berkeley in collaboration with researchers at the Weizmann Institute of Science. The toolkit can be thought of in two parts:

  • It offers an accessible interface for applying various probabilistic methods to single-cell data (including models such as scVI, scANVI, and totalVI).

  • It provides a framework for constructing new probabilistic approaches using the PyTorch, PyTorch Lightning, and Pyro libraries.

On DNAnexus, we provide a notebook that demonstrates an end-to-end single-cell RNA-seq workflow using scvi-tools, covering data preprocessing, model training, and differential expression analysis. The notebook was run with scvi-tools version 1.4.0. Please refer to the release notes for more details. The scVI model description can be found in the scvi-tools user guide.
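The three stages named above (preprocessing, model training, differential expression) can be sketched roughly as follows. This is a minimal illustration, not the contents of the DNAnexus notebook. The `CONFIG` dictionary, the function name `run_scvi_workflow`, the column names `donor_id` and `cell_type`, and the assumption that `adata.X` holds raw counts are all hypothetical and need to be checked against the actual AnnData object.

```python
# Hypothetical settings for illustration only. The obs column names are
# assumptions and must be replaced with columns present in your AnnData.
CONFIG = {
    "h5ad_path": "A_multi_tissue_single_cell_tumor_microenvironment_atlas.h5ad",
    "batch_key": "donor_id",   # assumed obs column describing the batch
    "groupby": "cell_type",    # assumed obs column holding cell annotations
    "n_top_genes": 2000,
    "max_epochs": 50,
}


def run_scvi_workflow(config):
    """Sketch of preprocessing, scVI training, and differential expression."""
    # Imported here so this module loads even without scanpy/scvi-tools.
    import scanpy as sc
    import scvi

    adata = sc.read_h5ad(config["h5ad_path"])

    # Assumption: adata.X contains raw counts. scVI models counts, so keep
    # a copy in a layer before normalizing X for gene selection.
    adata.layers["counts"] = adata.X.copy()
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)
    sc.pp.highly_variable_genes(
        adata, n_top_genes=config["n_top_genes"], subset=True
    )

    # Register the count layer and batch column, then train the model.
    scvi.model.SCVI.setup_anndata(
        adata, layer="counts", batch_key=config["batch_key"]
    )
    model = scvi.model.SCVI(adata)
    model.train(max_epochs=config["max_epochs"])

    # Store the learned latent space and run one-vs-rest differential
    # expression for each group in the chosen annotation column.
    adata.obsm["X_scVI"] = model.get_latent_representation()
    de_results = model.differential_expression(groupby=config["groupby"])
    return adata, model, de_results
```

Calling `run_scvi_workflow(CONFIG)` on the full atlas is likely to be memory- and time-intensive, so it may be worth trying the sketch on a subset of cells first.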

Overview of CZ CELLxGENE dataset

The dataset originates from the study “Single-cell resolution characterization of myeloid-derived cell states with implication in cancer outcome” and is available on CZ CELLxGENE under the title “A multi-tissue single-cell tumor microenvironment atlas”.

It aggregates nearly 400,000 single-cell transcriptomic profiles from 13 independent studies covering eight tumor and non-tumor tissue sources (including breast, colorectal, ovary, lung, liver, skin, uvea, and PBMC). It brings together samples collected from normal tissue, primary tumors, lymph nodes, and peripheral blood, generated across three commonly used single-cell RNA-seq technologies (10x, Smart-seq2, and inDrop). The atlas provides detailed annotations of major cellular populations, with a particular emphasis on characterizing myeloid-derived cell states. At DNAnexus, we have downloaded this data for your use on the platform. See the “Where to Access Data Asset” section below to start accessing the dataset.

Where to Access Data Asset

The following data are available on DNAnexus:

  • The AnnData file of “A multi-tissue single-cell tumor microenvironment atlas” was downloaded directly from the CZ CELLxGENE portal. It is stored in the DNAnexus project folder under the name A_multi_tissue_single_cell_tumor_microenvironment_atlas.h5ad. The file is located here on the platform.

  • An example notebook demonstrating the scvi-tools analysis workflow can be accessed here on the platform. The notebook file ends in .ipynb.

To use the dataset and notebook, please copy them into your own project space. Details on how to copy the data are given in the section titled "Copying Data and Notebooks into a Project".
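Once the file has been copied and is available in your working directory, a quick way to check its contents is to open it in backed mode, which reads metadata without loading the full expression matrix into memory. The sketch below is illustrative only. The helper name `summarize_h5ad` is hypothetical, and it assumes the `anndata` package is installed in the session.

```python
from pathlib import Path


def summarize_h5ad(path):
    """Return basic dimensions and obs columns of an .h5ad file."""
    path = Path(path)
    if path.suffix != ".h5ad":
        raise ValueError(f"Expected an .h5ad file, got: {path.name}")

    # Imported lazily so the path check works without anndata installed.
    import anndata as ad

    # backed="r" keeps the expression matrix on disk (read-only).
    adata = ad.read_h5ad(path, backed="r")
    try:
        summary = {
            "n_cells": adata.n_obs,
            "n_genes": adata.n_vars,
            "obs_columns": list(adata.obs.columns),
        }
    finally:
        # Release the underlying HDF5 file handle.
        adata.file.close()
    return summary
```

For example, `summarize_h5ad("A_multi_tissue_single_cell_tumor_microenvironment_atlas.h5ad")` would list the annotation columns that are candidates for batch and cell-type keys in later analysis.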

Running scvi-tools on DNAnexus

Copying Data and Notebooks into a Project

To use the dataset, please copy the data from this project into your own project. Here are the steps to copy the data into a project space:

  1. Create a project for your single-cell analysis, billed to your own organization. Tutorials on how to set up a project can be found on this page.

  2. Go to the Resources tab, find the project titled “Public Datasets AWS US (East)”, and select the folder "Single_cell_analysis".

  3. Select the data folder and the notebook.

  4. Select "Copy" on the top right menu, and select the project that you created in Step 1.

  5. Then, go to the project space you created in Step 1 to start exploring the CZ CELLxGENE dataset and scvi-tools notebook.

  6. To run the JupyterLab notebooks, please see the AI/ML Accelerator - ML JupyterLab section.

Instance Type Selection

  • Instance start times depend on queue length. Less common instance types may have longer wait times because of their limited availability.

  • GPU instances take longer to set up than CPU instance types because of their availability and complexity.

  • Instance type availability and pricing are subject to the contract between the user or the user’s organization and DNAnexus.

  • The notebook is optimized for the AI/ML Accelerator - ML JupyterLab App. If you would like to use the AI/ML Accelerator and do not have access, please contact the Success Team at [email protected] or the Sales Team at [email protected]

  • Instance type to use: mem2_ssd1_gpu_x64

  • Please follow the provided command-line instructions in the terminal that are found in the notebook example before running the notebook.
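Because the recommended instance type is a GPU instance, it can be useful to confirm that a GPU is visible from the JupyterLab session before training. The following is a minimal sketch and is not part of the provided notebook. The function name `describe_accelerator` is hypothetical, and it assumes PyTorch is the backend in use (scvi-tools builds on PyTorch).

```python
def describe_accelerator():
    """Report whether PyTorch can see a CUDA GPU in this session."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"

    if torch.cuda.is_available():
        # Name of the first visible CUDA device.
        return f"GPU available: {torch.cuda.get_device_name(0)}"
    return "No CUDA GPU detected; training would run on CPU"


print(describe_accelerator())
```

If no GPU is reported on a GPU instance, rechecking the instance type selected at launch is a reasonable first step.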
