Ellis Dorothy, Wu Dongyuan, Datta Susmita
Department of Biostatistics, University of Florida, School of Public Health and Health Professions, Gainesville, FL.
Wiley Interdiscip Rev Comput Stat. 2022 Jul-Aug;14(4). doi: 10.1002/wics.1558. Epub 2021 May 20.
The development of next-generation RNA sequencing (NGS) technologies has brought tremendous progress in research on the roles of genomics, transcriptomics and epigenomics in complex biological systems. However, scientists have realized that data obtained with the earlier technology, frequently called 'bulk RNA-seq' data, provide information averaged across all the cells present in a tissue. The more recently developed single-cell RNA sequencing (scRNA-seq) technology provides transcriptomic information at single-cell resolution. Nevertheless, these high-resolution data are complex in their own ways and demand novel statistical methods of analysis to yield effective and highly accurate results on complex biological systems. In this review, we cover many such recently developed statistical methods, both for researchers who wish to pursue statistical and computational research on scRNA-seq and for scientists who want to learn about the existing methods and free software tools available for the data they generate. Owing to page limitations, this review is not exhaustive. We have tried to cover the popular methods, starting with quality control, continuing through the downstream analysis of finding differentially expressed genes, and concluding with a brief description of network analysis.