A Dirichlet process mixture model for clustering longitudinal gene expression data.

Author information

Sun Jiehuan, Herazo-Maya Jose D, Kaminski Naftali, Zhao Hongyu, Warren Joshua L

Affiliations

Department of Biostatistics, Yale University, New Haven, 06520, CT, U.S.A.

Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, New Haven, 06520, CT, U.S.A.

Publication information

Stat Med. 2017 Sep 30;36(22):3495-3506. doi: 10.1002/sim.7374. Epub 2017 Jun 15.

Abstract

Subgroup identification (clustering) is an important problem in biomedical research. Gene expression profiles are commonly utilized to define subgroups. Longitudinal gene expression profiles might provide more information on disease progression than is captured by baseline profiles alone. Therefore, subgroup identification could be more accurate and effective with the aid of longitudinal gene expression data. However, existing statistical methods are unable to fully utilize these data for patient clustering. In this article, we introduce a novel clustering method in the Bayesian setting based on longitudinal gene expression profiles. This method, called BClustLonG, adopts a linear mixed-effects framework to model the trajectory of genes over time, while clustering is jointly conducted based on the regression coefficients obtained from all genes. In order to account for the correlations among genes and alleviate the high dimensionality challenges, we adopt a factor analysis model for the regression coefficients. The Dirichlet process prior distribution is utilized for the means of the regression coefficients to induce clustering. Through extensive simulation studies, we show that BClustLonG has improved performance over other clustering methods. When applied to a dataset of severely injured (burn or trauma) patients, our model is able to identify interesting subgroups. Copyright © 2017 John Wiley & Sons, Ltd.
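The two ingredients named in the abstract, per-gene trajectory coefficients and a Dirichlet process prior, can be illustrated with a small sketch. This is not the authors' BClustLonG implementation: it replaces the joint Bayesian mixed-effects and factor model with ordinary least squares fitted separately for each subject and gene, and it only draws mixture weights from a truncated stick-breaking construction rather than performing posterior inference. The function names, the simulated data, and all parameter values (`alpha`, the truncation level, the noise scale) are assumptions made for illustration.

```python
import numpy as np


def stick_breaking_weights(alpha, n_atoms, rng):
    """Draw mixture weights from a truncated stick-breaking construction.

    Smaller alpha concentrates mass on fewer components, which is how a
    Dirichlet process prior favours a small number of clusters.
    """
    betas = rng.beta(1.0, alpha, size=n_atoms)
    betas[-1] = 1.0  # truncation: the last atom takes the remaining stick
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining


def per_gene_trajectory_coefs(y, times):
    """Least-squares intercept and slope for every subject and gene.

    y has shape (n_subjects, n_genes, n_times). The result has shape
    (n_subjects, 2 * n_genes), ordered as [intercept, slope] per gene.
    """
    n_subjects, n_genes, n_times = y.shape
    design = np.column_stack([np.ones_like(times), times])  # (n_times, 2)
    flat = y.reshape(-1, n_times).T  # (n_times, n_subjects * n_genes)
    coefs, *_ = np.linalg.lstsq(design, flat, rcond=None)
    return coefs.T.reshape(n_subjects, 2 * n_genes)


# Hypothetical data: 6 subjects, 3 genes, 4 time points, two latent subgroups
# that differ only in the direction of the expression trend.
rng = np.random.default_rng(0)
times = np.array([0.0, 1.0, 2.0, 3.0])
true_slope = np.repeat([1.0, -1.0], 3)
y = true_slope[:, None, None] * times[None, None, :]
y = y + rng.normal(0.0, 0.1, size=(6, 3, 4))

coefs = per_gene_trajectory_coefs(y, times)
weights = stick_breaking_weights(alpha=1.0, n_atoms=20, rng=rng)
```

In this toy setting the rows of `coefs` separate into two groups by the sign of the slope columns (`coefs[:, 1::2]`), which is the kind of structure a clustering prior on the coefficient means is meant to pick up. A full analysis would estimate the coefficients and the cluster assignments jointly, as the abstract describes.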
