Weighted dimensionality reduction and robust Gaussian mixture model based cancer patient subtyping from gene expression data.

Affiliations

Machine Learning Lab, Department of Electronics and Communication Engineering, National Institute of Technology, Srinagar, JK, India.

Publication information

J Biomed Inform. 2020 Dec;112:103620. doi: 10.1016/j.jbi.2020.103620. Epub 2020 Nov 11.

DOI: 10.1016/j.jbi.2020.103620
PMID: 33188907
Abstract

BACKGROUND

The heterogeneous nature of cancer necessitates subtyping cancer patients into distinct, well-separated subgroups. However, gene expression data are not only high dimensional but also noisy and contaminated by outliers, which raises computational difficulties. As a result, subtyping cancer patients from gene expression data often yields highly overlapping Kaplan-Meier (KM) survival plots, making it difficult to distinguish clearly among the discovered subtypes. Here we attempt to achieve greater separation among the subtypes through a robust clustering pipeline.

METHODS

We propose a robust framework to achieve better separation among the discovered subtypes. The framework reduces the dimensionality of a weighted gene expression matrix with t-distributed Stochastic Neighbor Embedding (t-SNE) and then applies a clustering approach based on a robust Gaussian mixture model. Before dimensionality reduction, every gene is weighted according to its median absolute deviation (MAD). The results are quantified by the minimum pairwise separation among the KM plots and the minimum hazard ratio among the subtypes. We also introduce a novel method, called cumulative survival separation, to quantify the separation among the discovered subtypes.
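
As a rough illustration of the pipeline described above, the following Python sketch weights each gene by its MAD, embeds the patients in two dimensions with t-SNE, and clusters the embedding with a Gaussian mixture. It assumes NumPy and scikit-learn are available. The helper names mad_weights and subtype_patients are hypothetical, and the weighting rule (multiplying each gene column by its raw MAD) is an assumption, since the abstract does not give the exact formula. scikit-learn's standard GaussianMixture stands in for the paper's robust Gaussian mixture model, which this sketch does not reproduce.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.mixture import GaussianMixture


def mad_weights(expr: np.ndarray) -> np.ndarray:
    """Median absolute deviation of each gene (column) in a patients x genes matrix."""
    median = np.median(expr, axis=0)
    return np.median(np.abs(expr - median), axis=0)


def subtype_patients(expr: np.ndarray, n_subtypes: int = 3, seed: int = 0):
    """Hypothetical pipeline: MAD weighting -> 2-D t-SNE -> Gaussian mixture clustering."""
    expr = np.asarray(expr, dtype=float)
    # Assumption: each gene column is scaled by its raw MAD, so genes with little
    # robust variability contribute little to the distances t-SNE sees.
    weighted = expr * mad_weights(expr)

    # scikit-learn requires perplexity < n_samples; 30 is its default value.
    perplexity = min(30.0, expr.shape[0] - 1.0)
    embedding = TSNE(
        n_components=2, perplexity=perplexity, init="pca", random_state=seed
    ).fit_transform(weighted)

    # Standard (non-robust) GMM used here as a stand-in for the robust GMM.
    gmm = GaussianMixture(
        n_components=n_subtypes, covariance_type="full", n_init=5, random_state=seed
    )
    labels = gmm.fit_predict(embedding)
    return embedding, labels
```

On a real dataset, expr would hold one row per patient and one column per gene, and the number of subtypes would have to be chosen separately; this sketch leaves that choice to the caller.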

RESULTS

To validate the proposed methodology, we obtained five cancer gene expression datasets from The Cancer Genome Atlas (TCGA) and compared the method with Consensus Clustering (CC), Consensus non-negative matrix factorization (CNMF), fast density-aware spectral clustering (Spectrum) and Neighborhood based Multi-Omics clustering (NEMO). These comparisons show that the proposed method achieves greater separation than the aforementioned methods from the literature. For instance, for GBM the minimum pairwise life expectancy difference between the discovered subtypes is 61 days for the proposed methodology with MAD scores, compared with only approximately 33, 19, 49 and 33 days for CC, Spectrum, NEMO and CNMF, respectively. We also compare the proposed framework with and without MAD scores and observe that the MAD scores significantly improve subtype separation. Hazard ratio analysis also shows that the proposed methodology performs better. Furthermore, pathway over-representation analyses were carried out to identify relevant genetic pathways that may be possible targets for treatment.
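
The survival-based evaluation can also be sketched in Python. The abstract does not define the pairwise separation measure exactly, so this sketch assumes that the "life expectancy" of a subtype is its restricted mean survival time (the area under its Kaplan-Meier curve up to a common horizon tau) and reports the smallest absolute difference over all pairs of subtypes. The function names are hypothetical, only NumPy is required, and neither the hazard ratio analysis nor the paper's cumulative survival separation is reproduced here.

```python
import itertools

import numpy as np


def kaplan_meier(times, events):
    """Kaplan-Meier estimate: distinct event times and S(t) just after each one."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    survival = np.empty(len(event_times))
    s = 1.0
    for i, t in enumerate(event_times):
        at_risk = np.sum(times >= t)            # patients still under observation at t
        deaths = np.sum((times == t) & events)  # events occurring exactly at t
        s *= 1.0 - deaths / at_risk
        survival[i] = s
    return event_times, survival


def restricted_mean(times, events, tau):
    """Area under the KM step curve on [0, tau] (restricted mean survival time)."""
    t, s = kaplan_meier(times, events)
    keep = t < tau
    knots = np.concatenate(([0.0], t[keep], [tau]))
    levels = np.concatenate(([1.0], s[keep]))  # S(t) = 1 before the first event
    return float(np.sum(np.diff(knots) * levels))


def min_pairwise_gap(times, events, labels, tau=None):
    """Hypothetical separation score: smallest |RMST_a - RMST_b| over subtype pairs."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    if tau is None:
        # Assumed common horizon: shortest maximum follow-up among the subtypes.
        tau = min(times[labels == g].max() for g in groups)
    rmst = {g: restricted_mean(times[labels == g], events[labels == g], tau) for g in groups}
    return min(abs(rmst[a] - rmst[b]) for a, b in itertools.combinations(groups, 2))
```

If survival times are given in days, the returned gap is also in days, the same unit as the GBM figures quoted in the results.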

CONCLUSION

The results suggest that using the median absolute deviation together with a robust clustering methodology helps achieve greater separation among the subtypes, with better statistical and clinical significance.

Similar articles

1. Weighted dimensionality reduction and robust Gaussian mixture model based cancer patient subtyping from gene expression data.
   J Biomed Inform. 2020 Dec;112:103620. doi: 10.1016/j.jbi.2020.103620. Epub 2020 Nov 11.
2. A topological approach for cancer subtyping from gene expression data.
   J Biomed Inform. 2020 Feb;102:103357. doi: 10.1016/j.jbi.2019.103357. Epub 2019 Dec 29.
3. Robust correlation estimation and UMAP assisted topological analysis of omics data for disease subtyping.
   Comput Biol Med. 2023 Mar;155:106640. doi: 10.1016/j.compbiomed.2023.106640. Epub 2023 Feb 8.
4. Robust clustering of noisy high-dimensional gene expression data for patients subtyping.
   Bioinformatics. 2018 Dec 1;34(23):4064-4072. doi: 10.1093/bioinformatics/bty502.
5. Sequential analysis of transcript expression patterns improves survival prediction in multiple cancers.
   BMC Cancer. 2020 Apr 7;20(1):297. doi: 10.1186/s12885-020-06756-x.
6. Capturing the latent space of an Autoencoder for multi-omics integration and cancer subtyping.
   Comput Biol Med. 2022 Sep;148:105832. doi: 10.1016/j.compbiomed.2022.105832. Epub 2022 Jul 5.
7. Gaussian mixture copulas for high-dimensional clustering and dependency-based subtyping.
   Bioinformatics. 2020 Jan 15;36(2):621-628. doi: 10.1093/bioinformatics/btz599.
8. A network-assisted co-clustering algorithm to discover cancer subtypes based on gene expression.
   BMC Bioinformatics. 2014 Feb 4;15:37. doi: 10.1186/1471-2105-15-37.
9. PathME: pathway based multi-modal sparse autoencoders for clustering of patient-level multi-omics data.
   BMC Bioinformatics. 2020 Apr 16;21(1):146. doi: 10.1186/s12859-020-3465-2.
10. COPS: A novel platform for multi-omic disease subtype discovery via robust multi-objective evaluation of clustering algorithms.
   PLoS Comput Biol. 2024 Aug 5;20(8):e1012275. doi: 10.1371/journal.pcbi.1012275. eCollection 2024 Aug.

Cited by

1. Enhancement of Classifier Performance with Adam and RanAdam Hyper-Parameter Tuning for Lung Cancer Detection from Microarray Data-In Pursuit of Precision.
   Bioengineering (Basel). 2024 Mar 26;11(4):314. doi: 10.3390/bioengineering11040314.
2. Inferring cell diversity in single cell data using consortium-scale epigenetic data as a biological anchor for cell identity.
   Nucleic Acids Res. 2023 Jun 23;51(11):e62. doi: 10.1093/nar/gkad307.
3. Network-based cancer heterogeneity analysis incorporating multi-view of prior information.
   Bioinformatics. 2022 May 13;38(10):2855-2862. doi: 10.1093/bioinformatics/btac183.