

Comparison of scRNA-seq data analysis method combinations.

Author information

Xu Li, Xue Tong, Ding Weiyue, Shen Linshan

Publication information

Brief Funct Genomics. 2022 Nov 17;21(6):433-440. doi: 10.1093/bfgp/elac027.

Abstract

Single-cell ribonucleic acid (RNA)-sequencing (scRNA-seq) data analysis refers to the use of appropriate methods to analyze the datasets generated by RNA sequencing of single-cell transcriptomes. It usually comprises three steps: normalization to eliminate technical noise, dimensionality reduction to facilitate visualization and data compression, and clustering to divide the data into several similarity-based clusters. In addition, the gene expression data contain a large number of zero counts. These zero counts are considered to be related to random dropout events induced by multiple factors in the sequencing experiments, such as low RNA input, and to the stochastic nature of gene expression at the single-cell level. The zero counts can be eliminated only through analysis of the scRNA-seq data, and although many methods have been proposed to this end, research on the combined effect of existing methods is still lacking. In this paper, we summarize two kinds of normalization, two kinds of dimensionality reduction and three kinds of clustering methods widely used in current mainstream scRNA-seq data analysis. Furthermore, we propose combining these methods into 12 technology combinations (2 × 2 × 3), each forming a complete scRNA-seq data analysis pipeline. We evaluated the proposed combinations on Goolam, a publicly available scRNA-seq dataset, by comparing the final clustering results, and identified the most suitable combination scheme of these classic methods. Our results show that using appropriate technology combinations can improve the efficiency and accuracy of scRNA-seq data analysis. The combinations not only satisfy the basic requirements of noise reduction, dimensionality reduction and cell clustering but also preserve the heterogeneity of cells for downstream analysis. The Goolam dataset used in the study can be obtained from the ArrayExpress database under the accession number E-MTAB-3321.
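
The combination scheme described in the abstract can be illustrated with a minimal sketch that enumerates every pipeline formed from two normalization, two dimensionality reduction and three clustering methods. The abstract does not name the individual methods or the score used to compare clustering results, so the method labels below are placeholders, and the adjusted Rand index is an assumed, commonly used choice for comparing a clustering against reference labels rather than the metric confirmed by the paper.

```python
from collections import Counter
from itertools import product
from math import comb

# Placeholder labels: the abstract does not name the specific methods.
NORMALIZATIONS = ["norm_A", "norm_B"]
REDUCTIONS = ["reduce_A", "reduce_B"]
CLUSTERINGS = ["cluster_A", "cluster_B", "cluster_C"]


def enumerate_pipelines():
    """Return every (normalization, reduction, clustering) triple."""
    return list(product(NORMALIZATIONS, REDUCTIONS, CLUSTERINGS))


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index between two labelings of the same cells.

    Assumed evaluation metric (not stated in the abstract): 1.0 means the
    two partitions are identical, values near 0 mean chance-level agreement.
    """
    if len(truth) != len(pred):
        raise ValueError("label lists must have the same length")
    total_pairs = comb(len(truth), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: partitions agree trivially

    # Pairs of cells grouped together in both, in truth only, in pred only.
    both = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    in_truth = sum(comb(c, 2) for c in Counter(truth).values())
    in_pred = sum(comb(c, 2) for c in Counter(pred).values())

    expected = in_truth * in_pred / total_pairs
    maximum = (in_truth + in_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (both - expected) / (maximum - expected)


if __name__ == "__main__":
    pipelines = enumerate_pipelines()
    print(len(pipelines), "pipelines")
    for norm, reduce_, cluster in pipelines:
        print(norm, "->", reduce_, "->", cluster)
```

In practice each placeholder would be bound to a concrete function, every pipeline would be run on the expression matrix, and the resulting cluster labels would be scored against the known cell labels of the dataset to rank the 12 combinations.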

