Robust correlation estimation and UMAP assisted topological analysis of omics data for disease subtyping.

Authors

Rather Arif Ahmad, Chachoo Manzoor Ahmad

Affiliations

Department of Computer Sciences, University of Kashmir, Srinagar, JK, India.

Publication information

Comput Biol Med. 2023 Mar;155:106640. doi: 10.1016/j.compbiomed.2023.106640. Epub 2023 Feb 8.

DOI: 10.1016/j.compbiomed.2023.106640
PMID: 36774889
Abstract

Deciphering the information hidden in gene expression assays to identify disease subtypes is of significant importance in precision medicine. However, the intricacy of biological networks and the curse of dimensionality in gene expression data impose computational limitations on this process. Clustering is therefore often the first choice of exploratory data analysis for identifying natural structures and intrinsic patterns in such data. Yet the sparse, high-dimensional nature of omics data prevents conventional clustering algorithms from discovering subtypes that are clinically relevant and statistically significant. Hence, coupling non-linear dimensionality reduction techniques with clustering often becomes imperative to improve the clustering results. In this study, we present a robust pipeline for discovering disease subtypes with clinical relevance. Specifically, we focus on discovering patient sub-groups whose residual life patterns are remarkably different from those of other sub-groups. This is significant because, by refining prognosis, subtyping can reduce uncertainty in approximating a patient's expected outcome. The methodology presented is based on robust correlation estimation, UMAP (a non-linear dimensionality reduction method) and Mapper (a tool from topology). Notably, we suggest a method for improving the robustness of the correlation matrix of gene expression data in order to improve the clustering results. The performance of the model is evaluated by applying it to five cancer datasets obtained through TCGA, and comparisons are made with the state-of-the-art methods NEMO, RSC-OTRI and SNF with regard to the log-rank test and the Restricted Life Expectancy Difference. For example, in the GBM dataset, the minimum separation between any two discovered subtypes is 221 days, which is significantly higher than that of the other methodologies. We also compared the results obtained without the robust correlation-based estimate and observed that robust correlation significantly improves the separability between survival curves. From the results we infer that our methodology separates the survival curves of patient sub-groups better than the other methodologies, despite using a single omics profile per patient where SNF and NEMO use multiple omics profiles. Pathway over-representation analysis is performed on the final clustering results to investigate the biological underpinnings characterizing each subtype.
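
The abstract names two evaluation criteria, the log-rank test and a Restricted Life Expectancy Difference, without giving their formulas. The sketch below is one plausible reading, not the authors' implementation. It assumes the restricted difference is the gap between two groups' restricted mean survival times (the area under each Kaplan-Meier curve up to a horizon `tau`), and it uses the standard two-group log-rank statistic. The function names (`kaplan_meier`, `rmst`, `logrank_test`) and the survival times are hypothetical and are not taken from the paper or from TCGA.

```python
from math import erfc, sqrt


def kaplan_meier(times, events):
    """Kaplan-Meier estimate for right-censored data.

    events[i] is 1 if the death was observed at times[i], 0 if censored.
    Returns [(t, S(t))] at each distinct time where a death occurred.
    """
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # deaths and censored subjects leave the risk set
    return curve


def rmst(times, events, tau):
    """Restricted mean survival time: area under the KM step curve on [0, tau]."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value, 1 df)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({u for u, e, _ in pooled if e}):
        n_a = sum(1 for u, _, g in pooled if u >= t and g == 0)
        n_b = sum(1 for u, _, g in pooled if u >= t and g == 1)
        d_a = sum(1 for u, e, g in pooled if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in pooled if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n  # observed minus expected deaths in A
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if var == 0.0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / var
    return chi2, erfc(sqrt(chi2 / 2.0))  # chi-square upper tail with 1 df


# Synthetic survival times in days for two hypothetical subtypes.
subtype_a = ([100, 200, 300, 400], [1, 1, 1, 1])
subtype_b = ([50, 100, 150, 200], [1, 0, 1, 1])  # one censored patient

tau = 200  # restriction horizon in days (an arbitrary choice for illustration)
difference = rmst(*subtype_a, tau) - rmst(*subtype_b, tau)
chi2, p_value = logrank_test(*subtype_a, *subtype_b)
print(f"restricted difference: {difference:.2f} days, log-rank p = {p_value:.3f}")
```

Under this reading, a larger restricted difference between two discovered subtypes means their survival curves are further apart over the first `tau` days, and the log-rank p-value indicates whether that separation is statistically significant.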
