Dimensionality reduction by UMAP reinforces sample heterogeneity analysis in bulk transcriptomic data.

Affiliations

The University of Queensland Diamantina Institute, Faculty of Medicine, The University of Queensland, Translational Research Institute, Brisbane, QLD, Australia; Shandong Artificial Intelligence Institute, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China.

Shandong Artificial Intelligence Institute, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China; School of Microelectronics, Shandong University, Jinan, China.

Publication information

Cell Rep. 2021 Jul 27;36(4):109442. doi: 10.1016/j.celrep.2021.109442.

Abstract

Transcriptomic analysis plays a key role in biomedical research. Linear dimensionality reduction methods, especially principal-component analysis (PCA), are widely used to detect sample-to-sample heterogeneity, while recently developed non-linear methods, such as t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP), can efficiently cluster heterogeneous samples in single-cell RNA sequencing analysis. Yet the application of t-SNE and UMAP to bulk transcriptomic analysis, and their comparison with conventional methods, have not yet been carried out. We compare four major dimensionality reduction methods (PCA, multidimensional scaling [MDS], t-SNE, and UMAP) in analyzing 71 large bulk transcriptomic datasets. UMAP is superior to PCA and MDS and shows some advantages over t-SNE in differentiating batch effects, identifying pre-defined biological groups, and revealing in-depth clusters in two-dimensional space. Importantly, UMAP generates sample clusters that uncover biological features and clinical meaning. We recommend deploying UMAP to visualize and analyze sizable bulk transcriptomic datasets to reinforce sample heterogeneity analysis.
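
The kind of comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the toy expression matrix, the helper names `make_toy_expression` and `embed_all`, the parameter values, and the use of a silhouette score against pre-defined group labels are all assumptions made for the example. PCA, MDS, and t-SNE come from scikit-learn; UMAP is included only if the optional `umap-learn` package is installed.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import MDS, TSNE
from sklearn.metrics import silhouette_score

try:
    import umap  # provided by the optional umap-learn package
except ImportError:
    umap = None


def make_toy_expression(n_per_group=20, n_genes=200, n_groups=3, seed=0):
    """Hypothetical stand-in for a bulk expression matrix (samples x genes)."""
    rng = np.random.default_rng(seed)
    blocks, labels = [], []
    for g in range(n_groups):
        # Each group gets its own mean expression profile plus sample noise.
        center = rng.normal(0.0, 2.0, size=n_genes)
        blocks.append(center + rng.normal(0.0, 1.0, size=(n_per_group, n_genes)))
        labels += [g] * n_per_group
    return np.vstack(blocks), np.array(labels)


def embed_all(X, seed=0):
    """Project samples into two dimensions with each available method."""
    reducers = {
        "PCA": PCA(n_components=2, random_state=seed),
        "MDS": MDS(n_components=2, random_state=seed),
        "t-SNE": TSNE(n_components=2, perplexity=15, random_state=seed),
    }
    if umap is not None:
        reducers["UMAP"] = umap.UMAP(n_components=2, n_neighbors=15, random_state=seed)
    return {name: reducer.fit_transform(X) for name, reducer in reducers.items()}


X, groups = make_toy_expression()
embeddings = embed_all(X)

# Assumed scoring: how well the 2D embedding separates the pre-defined groups.
scores = {name: silhouette_score(Y, groups) for name, Y in embeddings.items()}
for name, score in scores.items():
    print(f"{name}: silhouette = {score:.3f}")
```

On real data, `X` would be a normalized samples-by-genes matrix and `groups` the known batch or biological labels; the neighborhood parameters (`perplexity`, `n_neighbors`) would likely need tuning to the number of samples.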
