Statistical Simulation and Analysis of Single-cell RNA-seq Data

Author : Tianyi Sun
Release : 2023
OCLC : 1415864820

Book Synopsis: Statistical Simulation and Analysis of Single-cell RNA-seq Data, by Tianyi Sun

Written by Tianyi Sun and released in 2023. Book excerpt:

The recent development of single-cell RNA sequencing (scRNA-seq) technologies has revolutionized transcriptomic studies by revealing the genome-wide gene expression levels within individual cells. In contrast to bulk RNA sequencing, scRNA-seq technology captures cell-specific transcriptome landscapes, which can reveal crucial information about cell-to-cell heterogeneity across different tissues, organs, and systems and enable the discovery of novel cell types and new transient cell states. According to search results from PubMed, from 2009 to 2023, over 5,000 published studies have generated datasets using this technology. Such large volumes of data call for high-quality statistical methods for their analysis. In the three projects of this dissertation, I have explored and developed statistical methods to model the marginal and joint gene expression distributions and determine the latent structure type for scRNA-seq data. In all three projects, synthetic data simulation plays a crucial role.

My first project focuses on the exploration of the Beta-Poisson hierarchical model for the marginal gene expression distribution of scRNA-seq data. This model is a simplified mechanistic model with biological interpretations. Through data simulation, I demonstrate three typical behaviors of this model under different parameter combinations, one of which can be interpreted as one source of the sparsity and zero inflation that is often observed in scRNA-seq datasets. Further, I discuss parameter estimation methods of this model and its other applications in the analysis of scRNA-seq data.

My second project focuses on the development of a statistical simulator, scDesign2, to generate realistic synthetic scRNA-seq data. Although dozens of simulators have been developed before, they lack the capacity to simultaneously achieve the following three goals: preserving genes, capturing gene correlations, and generating any number of cells with varying sequencing depths. To fill in this gap, scDesign2 is developed as a transparent simulator that achieves all three goals and generates high-fidelity synthetic data for multiple scRNA-seq protocols and other single-cell gene expression count-based technologies. Compared with existing simulators, scDesign2 is advantageous in its transparent use of probabilistic models and is unique in its ability to capture gene correlations via copula. We verify that scDesign2 generates more realistic synthetic data for four scRNA-seq protocols (10x Genomics, CEL-Seq2, Fluidigm C1, and Smart-Seq2) and two single-cell spatial transcriptomics protocols (MERFISH and pciSeq) than existing simulators do. Under two typical computational tasks, cell clustering and rare cell type detection, we demonstrate that scDesign2 provides informative guidance on deciding the optimal sequencing depth and cell number in single-cell RNA-seq experimental design, and that scDesign2 can effectively benchmark computational methods under varying sequencing depths and cell numbers. With these advantages, scDesign2 is a powerful tool for single-cell researchers to design experiments, develop computational methods, and choose appropriate methods for specific data analysis needs.

My third project focuses on deciding latent structure types for scRNA-seq datasets. Clustering and trajectory inference are two important data analysis tasks that can be performed for scRNA-seq datasets and will lead to different interpretations. However, as of now, there is no principled way to tell which one of these two types of analysis results is more suitable to describe a given dataset. In this project, we propose two computational approaches that aim to distinguish cluster-type vs. trajectory-type scRNA-seq datasets. The first approach is based on building a classifier using eigenvalue features of the gene expression covariance matrix, drawing inspiration from random matrix theory (RMT). The second approach is based on comparing the similarity of real data and simulated data generated by assuming the cell latent structure as clusters or a trajectory. While both approaches have limitations, we show that the second approach gives more promising results and has room for further improvements.
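As an illustration of the Beta-Poisson hierarchical model mentioned in the excerpt, the following is a minimal sketch and not the dissertation's code. It assumes one common parameterization: a per-cell fraction p is drawn from Beta(alpha, beta), and the count is drawn from Poisson(scale * p). The function name `simulate_beta_poisson`, the `scale` parameter, and all parameter values are hypothetical choices for illustration; they are not claimed to match the three behaviors studied in the dissertation.

```python
import numpy as np


def simulate_beta_poisson(alpha, beta, scale, n_cells, seed=0):
    """Draw counts for one gene across n_cells from a Beta-Poisson model.

    Assumed parameterization (illustrative only):
        p_i ~ Beta(alpha, beta)      # latent per-cell expression fraction
        x_i ~ Poisson(scale * p_i)   # observed count
    """
    rng = np.random.default_rng(seed)
    p = rng.beta(alpha, beta, size=n_cells)
    return rng.poisson(scale * p)


# Hypothetical parameter settings chosen to contrast shapes of the Beta prior.
settings = {
    "U-shaped prior (alpha, beta < 1)": (0.2, 0.2),
    "bell-shaped prior (alpha, beta > 1)": (5.0, 5.0),
}

for name, (a, b) in settings.items():
    counts = simulate_beta_poisson(a, b, scale=100.0, n_cells=5000, seed=1)
    zero_fraction = (counts == 0).mean()
    print(f"{name}: mean={counts.mean():.1f}, zero fraction={zero_fraction:.3f}")
```

When alpha and beta are both below 1, the Beta density piles up near 0 and 1, so a sizeable share of cells receive a near-zero Poisson rate and report zero counts. This is one way a model of this kind can produce sparsity without a separate zero-inflation component.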


Related Books

Statistical Methods for Bulk and Single-cell RNA Sequencing Data
Language: en
Pages: 207
Authors: Wei Li
Type: BOOK - Published: 2019

Since the invention of next-generation RNA sequencing (RNA-seq) technologies, they have become a powerful tool to study the presence and quantity of RNA molecul
Statistical Methods for RNA-sequencing Data
Language: en
Authors: Rhonda Bacher
Type: BOOK - Published: 2017

Major methodological and technological advances in sequencing have inspired ambitious biological questions that were previously elusive. Addressing such questio
Statistical Methods for the Analysis of Genomic Data
Language: en
Pages: 136
Authors: Hui Jiang
Categories: Science
Type: BOOK - Published: 2020-12-29 - Publisher: MDPI

In recent years, technological breakthroughs have greatly enhanced our ability to understand the complex world of molecular biology. Rapid developments in genom
Statistical Methods for Whole Transcriptome Sequencing
Language: en
Authors: Cheng Jia
Type: BOOK - Published: 2017

RNA-Sequencing (RNA-Seq) has enabled detailed unbiased profiling of whole transcriptomes with incredible throughput. Recent technological breakthroughs have pus