A Comparison for Dimensionality Reduction Methods of Single-Cell RNA-seq Data.

Author information

Xiang Ruizhi, Wang Wencan, Yang Lei, Wang Shiyuan, Xu Chaohan, Chen Xiaowen

Affiliations

College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China.

School of Optometry and Ophthalmology and Eye Hospital, Wenzhou Medical University, Wenzhou, China.

Publication information

Front Genet. 2021 Mar 23;12:646936. doi: 10.3389/fgene.2021.646936. eCollection 2021.

Abstract

Single-cell RNA sequencing (scRNA-seq) is a high-throughput sequencing technology performed at the level of individual cells, with the potential to reveal cellular heterogeneity. However, scRNA-seq data are high-dimensional, noisy, and sparse, so dimensionality reduction is an important step in downstream analysis, and several dimensionality reduction methods have been developed. We developed a strategy to evaluate the stability, accuracy, and computing cost of 10 dimensionality reduction methods using 30 simulated datasets and five real datasets. We also investigated the sensitivity of each method to hyperparameter tuning and offer users corresponding suggestions. We found that t-distributed stochastic neighbor embedding (t-SNE) yielded the best overall performance, with the highest accuracy and the highest computing cost. Uniform manifold approximation and projection (UMAP) exhibited the highest stability, along with moderate accuracy and the second-highest computing cost, and preserved the original cohesion and separation of cell populations well. Users should note that methods based on non-linear models and neural networks require their hyperparameters to be set for the specific situation before use.
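The three evaluation axes named in the abstract (accuracy, stability, computing cost) can be illustrated with a minimal harness. This is only a sketch under stated assumptions, not the authors' pipeline: the abstract does not say which metrics were used, so the mean silhouette coefficient stands in for accuracy, its standard deviation across repeated runs stands in for stability, and wall-clock time stands in for computing cost. The function names (`simulate_cells`, `random_projection`, `silhouette`, `evaluate`) are hypothetical, the data are toy Gaussian blobs, and the random projection is a placeholder for a real method such as t-SNE or UMAP.

```python
import random
import statistics
import time
from math import dist


def simulate_cells(n_groups=3, cells_per_group=30, n_genes=50, seed=0):
    """Toy stand-in for an expression matrix: one Gaussian blob per cell group."""
    rng = random.Random(seed)
    data, labels = [], []
    for g in range(n_groups):
        center = [rng.gauss(0, 3) for _ in range(n_genes)]
        for _ in range(cells_per_group):
            data.append([c + rng.gauss(0, 1) for c in center])
            labels.append(g)
    return data, labels


def random_projection(data, n_components=2, seed=0):
    """Placeholder reducer (Gaussian random projection); swap in a real method."""
    rng = random.Random(seed)
    n_genes = len(data[0])
    axes = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_components)]
    return [[sum(x * w for x, w in zip(row, axis)) for axis in axes] for row in data]


def silhouette(points, labels):
    """Mean silhouette coefficient in [-1, 1]; assumes at least two groups."""
    groups = set(labels)
    scores = []
    for i, p in enumerate(points):
        sums = {g: 0.0 for g in groups}
        counts = {g: 0 for g in groups}
        for j, q in enumerate(points):
            if j != i:
                sums[labels[j]] += dist(p, q)
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:  # singleton group: silhouette is defined as 0
            scores.append(0.0)
            continue
        a = sums[own] / counts[own]  # cohesion: mean distance within own group
        # separation: mean distance to the nearest other group
        b = min(sums[g] / counts[g] for g in groups if g != own and counts[g] > 0)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return statistics.mean(scores)


def evaluate(reducer, data, labels, n_repeats=5):
    """Score a reducer over repeated runs that differ only in random seed."""
    scores, runtimes = [], []
    for seed in range(n_repeats):
        start = time.perf_counter()
        embedding = reducer(data, seed=seed)
        runtimes.append(time.perf_counter() - start)
        scores.append(silhouette(embedding, labels))
    return {
        "accuracy": statistics.mean(scores),    # proxy: mean silhouette
        "stability": statistics.stdev(scores),  # proxy: lower means more stable
        "cost_seconds": statistics.mean(runtimes),
    }


data, labels = simulate_cells()
result = evaluate(random_projection, data, labels)
print(result)
```

Any callable that accepts `(data, seed=...)` and returns a list of low-dimensional points can be passed to `evaluate`, so several methods can be compared on the same datasets with the same three summary numbers.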

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5de4/8021860/5b70e8917fcc/fgene-12-646936-g001.jpg
