Outcome-guided Bayesian clustering for disease subtype discovery using high-dimensional transcriptomic data.

Author information

Meng Lingsong, Huo Zhiguang

Affiliation

Department of Biostatistics, University of Florida, Gainesville, FL, USA.

Publication information

J Appl Stat. 2024 Jun 7;52(1):183-207. doi: 10.1080/02664763.2024.2362275. eCollection 2025.

Abstract

Due to the tremendous heterogeneity of disease manifestations, many complex diseases that were once thought to be single diseases are now considered to have disease subtypes. Disease subtyping analysis, that is, the identification of subgroups of patients with similar characteristics, is the first step toward precision medicine. With the advancement of high-throughput technologies, omics data offer an unprecedented opportunity to reveal disease subtypes. As a result, unsupervised clustering analysis has been widely used for this purpose. Though promising, the subtypes obtained from traditional quantitative approaches may not always be clinically meaningful (i.e., correlated with clinical outcomes). On the other hand, the rich clinical data collected in modern epidemiological studies have great potential to guide omics-based disease subtyping and to help discover clinically meaningful disease subtypes. We therefore developed an outcome-guided Bayesian clustering (GuidedBayesianClustering) method to fully integrate clinical data with high-dimensional omics data. A Gaussian mixture model framework was applied to perform sample clustering; a spike-and-slab prior was used to perform gene selection; a mixture model prior was employed to incorporate guidance from a clinical outcome variable; and a decision framework was adopted to infer the false discovery rate of the selected genes. We used conjugate priors to enable efficient Gibbs sampling. Our proposed fully Bayesian method is capable of simultaneously (i) obtaining sample clusters (disease subtype discovery); (ii) performing feature selection (selecting genes related to the disease subtypes); and (iii) using a clinical outcome variable to guide disease subtype discovery. The superior performance of GuidedBayesianClustering was demonstrated through simulations and through applications to breast cancer and Alzheimer's disease expression data.
An R package has been made publicly available on GitHub to improve the applicability of our method.
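The abstract states that a decision framework is used to infer the false discovery rate (FDR) of the selected genes, but it does not give the rule. The Python sketch below shows one common Bayesian FDR rule under stated assumptions. It assumes that the Gibbs sampler stores, for every retained iteration, a 0/1 spike-and-slab indicator per gene (1 = slab, i.e. gene selected). It also assumes that genes are ranked by their posterior inclusion probability. The function names `inclusion_probs` and `bayesian_fdr_select` are hypothetical and are not taken from the GuidedBayesianClustering R package; the paper's exact decision rule may differ.

```python
from typing import List, Sequence, Tuple


def inclusion_probs(gamma_draws: Sequence[Sequence[int]]) -> List[float]:
    """Posterior inclusion probability of each gene.

    Assumption: gamma_draws[t][g] is the 0/1 spike-and-slab indicator of gene g
    at retained Gibbs iteration t (1 = gene in the slab, i.e. selected).
    """
    n_iter = len(gamma_draws)
    # Column-wise mean over iterations: fraction of draws in which gene g was selected.
    return [sum(col) / n_iter for col in zip(*gamma_draws)]


def bayesian_fdr_select(post_prob: Sequence[float],
                        alpha: float = 0.05) -> Tuple[List[int], float]:
    """Select genes so that the estimated Bayesian FDR is at most alpha.

    For a selected set S, the estimated FDR is the average of (1 - p_g) over g in S,
    i.e. the expected proportion of selected genes that are actually null.
    """
    # Rank genes from most to least likely to be informative.
    order = sorted(range(len(post_prob)), key=lambda g: -post_prob[g])
    best_k, best_fdr = 0, 0.0
    null_mass = 0.0
    for k, g in enumerate(order, start=1):
        null_mass += 1.0 - post_prob[g]  # posterior probability that gene g is null
        fdr = null_mass / k              # estimated FDR of the top-k list
        if fdr <= alpha:
            best_k, best_fdr = k, fdr    # keep the largest list meeting the bound
    return sorted(order[:best_k]), best_fdr


if __name__ == "__main__":
    draws = [[1, 0, 1], [1, 0, 1], [1, 1, 0], [1, 0, 1]]
    p = inclusion_probs(draws)
    print(p)                                  # [1.0, 0.25, 0.75]
    print(bayesian_fdr_select(p, alpha=0.2))  # ([0, 2], 0.125)
```

As an example, take posterior inclusion probabilities `[0.99, 0.2, 0.97, 0.9, 0.5, 0.01]` with `alpha=0.05`. The three most probable genes (indices 0, 2 and 3) are selected, with an estimated FDR of about 0.047. Adding the next gene (probability 0.5) would raise the estimate to 0.16. Genes are added in order of decreasing inclusion probability, so the running FDR estimate never decreases. The rule therefore returns the longest top-ranked list whose estimate stays at or below `alpha`.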
