Dimension Reduction and Clustering Models for Single-Cell RNA Sequencing Data: A Comparative Study.

Affiliations

Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China.

Zhuhai Sub Laboratory of Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Zhuhai College of Jilin University, Zhuhai 519041, China.

Publication information

Int J Mol Sci. 2020 Mar 22;21(6):2181. doi: 10.3390/ijms21062181.

Abstract

With recent advances in single-cell RNA sequencing, enormous transcriptome datasets have been generated. These datasets have furthered our understanding of cellular heterogeneity and its underlying mechanisms in homogeneous populations. Clustering of single-cell RNA sequencing (scRNA-seq) data can group cells of the same cell type based on patterns embedded in gene expression. However, scRNA-seq data are high-dimensional, noisy, and sparse, owing to the limitations of existing scRNA-seq technologies. Traditional clustering methods are neither effective nor efficient for high-dimensional, sparse matrix computations, so several dimension reduction methods have been introduced. To validate a reliable and standard research routine, we conducted a comprehensive review and evaluation of four classical dimension reduction methods and five clustering models. Four experiments were performed progressively on two large scRNA-seq datasets using 20 models. The results showed that feature selection benefited the analysis of high-dimensional and sparse scRNA-seq data. Feature-extraction methods were also able to improve clustering performance, although not in every case. Independent component analysis (ICA) performed well in small compressed feature spaces, whereas principal component analysis (PCA) was more stable than all the other feature-extraction methods. In addition, ICA was not ideal for fuzzy C-means clustering in scRNA-seq data analysis. K-means clustering achieved good results when combined with feature-extraction methods.
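The general routine described in the abstract, reducing the gene dimension first and clustering the cells afterwards, can be illustrated with a small sketch. The abstract does not name the feature selection method or the implementation used, so everything below is an assumption made for illustration: a simple variance-based gene filter stands in for feature selection, K-means uses a deterministic farthest-point initialization, the data are a synthetic toy matrix, and all function and variable names are hypothetical. It is not the authors' code.

```python
import random

def top_variance_genes(matrix, n_genes):
    """Return the sorted column indices of the n_genes most variable genes."""
    n_cells = len(matrix)
    variances = []
    for column in zip(*matrix):  # iterate over genes (columns)
        mean = sum(column) / n_cells
        variances.append(sum((x - mean) ** 2 for x in column) / n_cells)
    ranked = sorted(range(len(variances)), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:n_genes])

def sq_dist(a, b):
    """Squared Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100):
    """Plain Lloyd's K-means with farthest-point initialization (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Synthetic toy data (not real scRNA-seq): 2 cell types, 20 cells each, 50 genes.
rng = random.Random(0)
N_PER_TYPE, N_GENES = 20, 50

def make_cell(marker_genes):
    # Sparse background: about 80% zeros, the rest low-level noise in [0, 1).
    cell = [rng.uniform(0.0, 1.0) if rng.random() < 0.2 else 0.0
            for _ in range(N_GENES)]
    for g in marker_genes:  # marker genes are strongly expressed
        cell[g] = rng.uniform(8.0, 12.0)
    return cell

cells = ([make_cell([0, 1, 2]) for _ in range(N_PER_TYPE)] +
         [make_cell([3, 4, 5]) for _ in range(N_PER_TYPE)])
true_types = [0] * N_PER_TYPE + [1] * N_PER_TYPE

# Step 1: feature selection keeps the 6 most variable genes.
selected = top_variance_genes(cells, n_genes=6)
reduced = [[cell[j] for j in selected] for cell in cells]

# Step 2: cluster the cells in the reduced feature space.
labels = kmeans(reduced, k=2)

agreement = sum(a == b for a, b in zip(labels, true_types)) / len(labels)
print("selected genes:", selected)
print("agreement with true types:", agreement)
```

On real data, a feature-extraction step such as PCA or ICA would typically replace or follow the variance filter, and cluster labels would be compared with reference annotations using a permutation-invariant score rather than direct agreement.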

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/df07/7139673/67627ae4655c/ijms-21-02181-g001.jpg