scvi-tools and CZ CELLxGENE
Necessary Disclaimers and Legal
The user is responsible for reviewing and complying with the license requirements of the software, notebooks, and data referenced in this documentation.
Users are responsible for the costs associated with analyzing the CZ CELLxGENE dataset with scvi-tools and with storing the data in their project spaces.
Instance type availability and pricing are subject to the contract between the user or the user’s organization and DNAnexus.
Citations for scvi-tools and the CZ CELLxGENE dataset
For this demonstration, we adapted the Introduction to scvi-tools notebook, developed by the scvi-tools development team. Users may cite the scvi-tools manuscript published in 2022, along with the original papers describing each model, which are referenced in the corresponding documentation. In this example, we applied the scVI (single-cell Variational Inference) model, which is described in the publication Deep generative modeling for single-cell transcriptomics.
The scVI model is trained on a human single-cell RNA-seq dataset downloaded from the CZ CELLxGENE data portal. Cite the publication associated with this dataset: Single-cell resolution characterization of myeloid-derived cell states with implication in cancer outcome.
CZ CELLxGENE brings together a wide range of public single-cell datasets that have been shared through the Chan Zuckerberg Initiative platform. These datasets are uploaded by the original researchers and distributed under the Creative Commons CC BY 4.0 license. More information may be found in the CZ CELLxGENE Data Submission Policy.
Cite CZ CELLxGENE Discover: A single-cell data platform for scalable exploration, analysis and modeling of aggregated data. CZI Single-Cell Biology, et al. bioRxiv 2023.10.30; doi: https://doi.org/10.1101/2023.10.30.563174
Overview of scvi-tools
scvi-tools is a software ecosystem designed for fully processing and modeling single-cell omics datasets. The project originates from work carried out in the Yosef Lab at UC Berkeley in collaboration with researchers at the Weizmann Institute of Science. The toolkit can be thought of in two parts:
it offers an accessible interface for applying various probabilistic methods to single-cell data (including models like scVI, scANVI, and totalVI), and
it provides a framework for constructing new probabilistic approaches using the PyTorch, PyTorch Lightning, and Pyro libraries.
On DNAnexus, we provide a notebook that demonstrates an end-to-end single-cell RNA-seq workflow using scvi-tools, covering data preprocessing, model training, and differential expression analysis. The notebook was run with scvi-tools version 1.4.0. Please refer to the release notes for more details. The scVI model description can be found in the scvi-tools user guide.
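The three stages named above (preprocessing, model training, differential expression) can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names `batch` and `cell_type` are hypothetical placeholders for whatever `adata.obs` columns the dataset provides, and the sketch assumes `adata.X` holds raw counts, which may not be true for a file downloaded from CZ CELLxGENE. The `seurat_v3` flavor additionally requires the scikit-misc package.

```python
def run_scvi_workflow(adata, batch_key="batch", label_key="cell_type", n_top_genes=2000):
    """Sketch: preprocess, train scVI, and run differential expression.

    batch_key and label_key are hypothetical adata.obs column names;
    replace them with the columns present in your dataset.
    """
    # Imported lazily so this function can be defined without the
    # heavy dependencies being installed.
    import scanpy as sc
    import scvi

    # Assumption: adata.X contains raw counts. scVI models counts,
    # so keep an untouched copy in a dedicated layer.
    adata.layers["counts"] = adata.X.copy()

    # Preprocessing: keep only highly variable genes, selected from raw counts.
    sc.pp.highly_variable_genes(
        adata,
        n_top_genes=n_top_genes,
        flavor="seurat_v3",
        layer="counts",
        batch_key=batch_key,
        subset=True,
    )

    # Model training: register the data, build the model, and fit it.
    scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key=batch_key)
    model = scvi.model.SCVI(adata)
    model.train()

    # Store the learned latent space for downstream clustering or UMAP.
    adata.obsm["X_scVI"] = model.get_latent_representation()

    # Differential expression: each group in label_key is compared to the rest.
    de_results = model.differential_expression(groupby=label_key)
    return model, de_results
```

Calling `run_scvi_workflow(adata)` on a loaded AnnData object returns the trained model and a table of differential expression results. Training on a dataset of this size is expected to benefit from a GPU instance.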
Overview of the CZ CELLxGENE dataset
The dataset originates from the study “Single-cell resolution characterization of myeloid-derived cell states with implication in cancer outcome” and is available on CZ CELLxGENE under the title “A multi-tissue single-cell tumor microenvironment atlas”.
It aggregates nearly 400,000 single-cell transcriptomic profiles from 13 independent studies covering eight tumor and non-tumor tissue sources (including breast, colorectal, ovary, lung, liver, skin, uvea, and PBMC). It brings together samples collected from normal tissue, primary tumors, lymph nodes, and peripheral blood, generated across three commonly used single-cell RNA-seq technologies (10x, Smart-seq2, and inDrop). The atlas provides detailed annotations of major cellular populations, with a particular emphasis on characterizing myeloid-derived cell states. At DNAnexus, we have downloaded this data for your use on the platform. See the “Where to Access Data Asset” section below to start accessing the dataset.
Where to Access Data Asset
The following data are available on DNAnexus:
The AnnData file of “A multi-tissue single-cell tumor microenvironment atlas” was downloaded directly from the CZ CELLxGENE portal. The file is stored in the DNAnexus project folder under the name: A_multi_tissue_single_cell_tumor_microenvironment_atlas.h5ad. The file can be found here on the platform.
An example notebook demonstrating the scvi-tools analysis workflow can be accessed here on the platform. The file ending is .ipynb.
To use the dataset and notebook, please copy them into your own project space. Details on how to copy them are given in the section titled "Copying Data and Notebooks into a Project".
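Once the file has been copied and made available to a JupyterLab session, a quick look at its size and annotations can be sketched as below. This is an illustrative sketch: `summarize_atlas` and the `data_dir` argument are hypothetical names, and the directory where the file lives in your session is an assumption you will need to adjust. Opening the file in backed mode avoids loading the full matrix into memory.

```python
from pathlib import Path

# File name as stored in the DNAnexus project folder.
ATLAS_FILENAME = "A_multi_tissue_single_cell_tumor_microenvironment_atlas.h5ad"


def summarize_atlas(data_dir="."):
    """Return basic dimensions and annotation columns of the atlas file.

    data_dir is a hypothetical local directory containing the .h5ad file.
    """
    path = Path(data_dir) / ATLAS_FILENAME
    if not path.exists():
        raise FileNotFoundError(f"Expected the atlas file at {path}")

    # Imported lazily, after the path check, so a missing file is
    # reported before the dependency is needed.
    import anndata as ad

    # backed="r" keeps the expression matrix on disk (read-only).
    adata = ad.read_h5ad(path, backed="r")
    summary = {
        "n_cells": adata.n_obs,
        "n_genes": adata.n_vars,
        "obs_columns": list(adata.obs.columns),
    }
    adata.file.close()
    return summary
```

The `obs_columns` entry is a convenient way to find the batch and cell-type annotation columns before setting up a model.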
Running scvi-tools on DNAnexus
Copying Data and Notebooks into a Project
To utilize the dataset, please copy the data from this project into your own project. Here are the steps to copy the data into a Project Space:
Create a project for your single cell analysis, billed to your own organization. Tutorials on how to set up a project can be found on this page.
Go to the Resources tab, find the project titled “Public Datasets AWS US (East)”, and select the folder "Single_cell_analysis".
Select the data folder and the notebook.
Select "Copy" in the top-right menu, and select the project that you created in Step 1.
Then, go to the project space you created in Step 1 to start exploring the CZ CELLxGENE dataset and scvi-tools notebook.
To run the JupyterLab notebooks, please see the AI/ML Accelerator - ML JupyterLab section.
Instance Type Selection
Instance wait times depend on instance queues. Less common instance types may have longer wait times due to their limited availability.
GPU instances take longer to set up than CPU instance types due to their availability and complexity.
Instance type availability and pricing are subject to the contract between the user or the user’s organization and DNAnexus.
The notebook is optimized for the AI/ML Accelerator - ML JupyterLab app. If you would like to use the AI/ML Accelerator and do not have access, please contact the Success Team at [email protected] or the Sales Team at [email protected]
Instance type to use: mem2_ssd1_gpu_x64
Before running the notebook, please follow the command-line instructions provided in the notebook example by running them in the terminal.
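After the session starts on a GPU instance, a short check like the one below can confirm that PyTorch (which scvi-tools builds on) can see the GPU. This is a generic sketch rather than part of the provided notebook, and `describe_accelerator` is a hypothetical helper name.

```python
def describe_accelerator():
    """Report whether PyTorch can see a CUDA-capable GPU."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"

    if torch.cuda.is_available():
        # Name of the first visible GPU device.
        return f"GPU available: {torch.cuda.get_device_name(0)}"
    return "No GPU detected; training would run on CPU"


print(describe_accelerator())
```

If no GPU is reported on a GPU instance type, revisit the command-line setup steps from the notebook before training.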