Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, MI, United States of America.
Department of Mathematics, University of Hawaii at Manoa, Honolulu, HI, United States of America.
PLoS One. 2023 Apr 26;18(4):e0284820. doi: 10.1371/journal.pone.0284820. eCollection 2023.
Mapper, a topological algorithm, is frequently used as an exploratory tool to build a graphical representation of data. This representation can help reveal the intrinsic shape of high-dimensional genomic data and retain information that may be lost by standard dimension-reduction algorithms. We propose a novel workflow for processing and analyzing RNA-seq data from tumor and healthy subjects that integrates Mapper, differential gene expression, and spectral shape analysis. Specifically, we show that a Gaussian mixture approximation method can be used to produce graphical structures that successfully separate tumor and healthy subjects and split the tumor subjects into two subgroups. Further analysis using DESeq2, a popular tool for detecting differentially expressed genes, shows that these two tumor subgroups exhibit distinct gene regulation patterns, suggesting two discrete paths to the formation of lung cancer that other popular clustering methods, including t-distributed stochastic neighbor embedding (t-SNE), could not highlight. Although Mapper shows promise for analyzing high-dimensional data, tools for statistically analyzing Mapper graphical structures are limited in the existing literature. In this paper, we develop a scoring method using heat kernel signatures that provides an empirical setting for statistical inference, including hypothesis testing, sensitivity analysis, and correlation analysis.
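For readers unfamiliar with heat kernel signatures, the following is a minimal sketch of the standard definition on a graph, HKS(v, t) = Σ_i exp(-λ_i t) φ_i(v)², where (λ_i, φ_i) are eigenpairs of the graph Laplacian. It is not the authors' scoring method, which the abstract does not specify. The function name `heat_kernel_signature`, the use of the unnormalized Laplacian, the toy path graph, and the chosen diffusion times are all assumptions made for illustration.

```python
import numpy as np

def heat_kernel_signature(adjacency, times):
    """Return an (n_nodes, n_times) array of heat kernel signatures.

    Assumption: uses the unnormalized Laplacian L = D - A of an
    undirected graph; a Mapper-based pipeline may choose differently.
    """
    A = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(A.sum(axis=1)) - A
    # eigh is appropriate because the Laplacian is symmetric.
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    # decay[i, k] = exp(-lambda_i * t_k)
    decay = np.exp(-np.outer(eigvals, times))
    # HKS(v, t_k) = sum_i phi_i(v)^2 * exp(-lambda_i * t_k)
    return (eigvecs ** 2) @ decay

# Hypothetical toy graph standing in for a Mapper output: a path 0-1-2-3.
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
times = np.array([0.1, 1.0, 10.0])  # illustrative diffusion times
hks = heat_kernel_signature(adjacency, times)
print(hks)
```

Each row is a multi-scale descriptor of one node: small t reflects local connectivity, while for large t the values on a connected graph approach 1/n. Node-level descriptors of this kind can be aggregated into a graph-level score, although how the paper does so is not stated here.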