A Dirichlet process mixture model for clustering longitudinal gene expression data.

Author information

Sun Jiehuan, Herazo-Maya Jose D, Kaminski Naftali, Zhao Hongyu, Warren Joshua L

Affiliations

Department of Biostatistics, Yale University, New Haven, 06520, CT, U.S.A.

Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, New Haven, 06520, CT, U.S.A.

Publication information

Stat Med. 2017 Sep 30;36(22):3495-3506. doi: 10.1002/sim.7374. Epub 2017 Jun 15.

DOI:10.1002/sim.7374

PMID:28620908

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5583037/

Abstract

Subgroup identification (clustering) is an important problem in biomedical research. Gene expression profiles are commonly used to define subgroups. Longitudinal gene expression profiles might provide more information on disease progression than is captured by baseline profiles alone. Therefore, subgroup identification could be more accurate and effective with the aid of longitudinal gene expression data. However, existing statistical methods are unable to fully utilize these data for patient clustering. In this article, we introduce a novel Bayesian clustering method based on longitudinal gene expression profiles. This method, called BClustLonG, adopts a linear mixed-effects framework to model gene expression trajectories over time, while clustering is conducted jointly on the regression coefficients obtained from all genes. To account for the correlations among genes and alleviate the challenges of high dimensionality, we adopt a factor analysis model for the regression coefficients. A Dirichlet process prior distribution is placed on the means of the regression coefficients to induce clustering. Through extensive simulation studies, we show that BClustLonG performs better than other clustering methods. When applied to a dataset of severely injured (burn or trauma) patients, our model is able to identify interesting subgroups. Copyright © 2017 John Wiley & Sons, Ltd.
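The abstract says that a Dirichlet process prior on the means of the regression coefficients induces clustering. The Python sketch below illustrates that mechanism only; it is not BClustLonG and omits the mixed-effects and factor-analysis parts. It draws subject labels from the Chinese restaurant process, a standard representation of the Dirichlet process, and gives every subject in a cluster the same mean vector. The function name, the concentration parameter `alpha`, the number of subjects, and the choice of two coefficients (intercept and slope) are assumptions made for illustration, not values from the paper.

```python
import random


def sample_crp_labels(n_subjects, alpha, rng):
    """Draw cluster labels from a Chinese restaurant process (hypothetical helper)."""
    if n_subjects <= 0:
        return [], []
    labels = [0]   # the first subject always opens cluster 0
    counts = [1]   # counts[k] = number of subjects currently in cluster k
    for _ in range(1, n_subjects):
        # An existing cluster k is chosen with weight counts[k];
        # a new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


rng = random.Random(0)
labels, counts = sample_crp_labels(n_subjects=50, alpha=1.0, rng=rng)

# One mean vector (intercept, slope) per cluster, drawn from an assumed
# standard normal base distribution; subjects in a cluster share it.
cluster_means = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in counts]
subject_means = [cluster_means[k] for k in labels]

print("number of clusters:", len(counts))
print("cluster sizes:", counts)
```

Because the number of clusters is not fixed in advance, it grows with the data at a rate governed by `alpha`: larger values make new clusters more likely.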

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ef9/5583037/1c04e221797d/nihms880431f1.jpg

Similar articles

A Dirichlet process mixture model for clustering longitudinal gene expression data.

Stat Med. 2017 Sep 30;36(22):3495-3506. doi: 10.1002/sim.7374. Epub 2017 Jun 15.

A Bayesian semiparametric factor analysis model for subtype identification.

Stat Appl Genet Mol Biol. 2017 Apr 25;16(2):145-158. doi: 10.1515/sagmb-2016-0051.

Modeling and visualizing uncertainty in gene expression clusters using Dirichlet process mixtures.

IEEE/ACM Trans Comput Biol Bioinform. 2009 Oct-Dec;6(4):615-28. doi: 10.1109/TCBB.2007.70269.

A mixture model with random-effects components for clustering correlated gene-expression profiles.

Bioinformatics. 2006 Jul 15;22(14):1745-52. doi: 10.1093/bioinformatics/btl165. Epub 2006 May 3.

A Bayesian nonparametric model for classification of longitudinal profiles.

Biostatistics. 2022 Dec 12;24(1):209-225. doi: 10.1093/biostatistics/kxab026.

A sparse factor model for clustering high-dimensional longitudinal data.

Stat Med. 2024 Aug 30;43(19):3633-3648. doi: 10.1002/sim.10151. Epub 2024 Jun 17.

Meta-analysis using Dirichlet process.

Stat Methods Med Res. 2016 Feb;25(1):352-65. doi: 10.1177/0962280212453891. Epub 2012 Jul 16.

Bayesian infinite mixture model based clustering of gene expression profiles.

Bioinformatics. 2002 Sep;18(9):1194-206. doi: 10.1093/bioinformatics/18.9.1194.

Bayesian semiparametric variable selection with applications to periodontal data.

Stat Med. 2017 Jun 30;36(14):2251-2264. doi: 10.1002/sim.7255. Epub 2017 Feb 22.

Shape invariant mixture model for clustering non-linear longitudinal growth trajectories.

Stat Methods Med Res. 2019 Dec;28(12):3769-3784. doi: 10.1177/0962280218815301. Epub 2018 Dec 10.

Cited by

Transcriptomic profiling of burn patients reveals key lactylation-related genes and their molecular mechanisms.

Front Med (Lausanne). 2025 Jun 27;12:1554791. doi: 10.3389/fmed.2025.1554791. eCollection 2025.

A Bayesian multiple imputation approach to bivariate functional data with missing components.

Stat Med. 2021 Sep 30;40(22):4772-4793. doi: 10.1002/sim.9093. Epub 2021 Jun 8.

Host transcriptional response to TB preventive therapy differentiates two sub-groups of IGRA-positive individuals.

Tuberculosis (Edinb). 2021 Mar;127:102033. doi: 10.1016/j.tube.2020.102033. Epub 2020 Nov 28.

A novel computational strategy for DNA methylation imputation using mixture regression model (MRM).

BMC Bioinformatics. 2020 Dec 1;21(1):552. doi: 10.1186/s12859-020-03865-z.

Factors Predicting Detrimental Change in Declarative Memory Among Women With HIV: A Study of Heterogeneity in Cognition.

Front Psychol. 2020 Oct 15;11:548521. doi: 10.3389/fpsyg.2020.548521. eCollection 2020.

References

Disentangling the heterogeneity of autism spectrum disorder through genetic findings.

Nat Rev Neurol. 2014 Feb;10(2):74-81. doi: 10.1038/nrneurol.2013.278. Epub 2014 Jan 28.

Tumour heterogeneity and cancer cell plasticity.

Nature. 2013 Sep 19;501(7467):328-37. doi: 10.1038/nature12624.

Bayesian consensus clustering.

Bioinformatics. 2013 Oct 15;29(20):2610-6. doi: 10.1093/bioinformatics/btt425. Epub 2013 Aug 28.

Sparse Bayesian infinite factor models.

Biometrika. 2011 Jun;98(2):291-306. doi: 10.1093/biomet/asr013.

Personal omics profiling reveals dynamic molecular and medical phenotypes.

Cell. 2012 Mar 16;148(6):1293-307. doi: 10.1016/j.cell.2012.02.009.

A genomic storm in critically injured humans.

J Exp Med. 2011 Dec 19;208(13):2581-90. doi: 10.1084/jem.20111354. Epub 2011 Nov 21.

High-Dimensional Sparse Factor Modeling: Applications in Gene Expression Genomics.

J Am Stat Assoc. 2008 Dec 1;103(484):1438-1456. doi: 10.1198/016214508000000869.

Model-based clustering of microarray expression data via latent Gaussian mixture models.

Bioinformatics. 2010 Nov 1;26(21):2705-12. doi: 10.1093/bioinformatics/btq498. Epub 2010 Aug 29.

Wavelet-based functional mixed models.

J R Stat Soc Series B Stat Methodol. 2006 Apr 1;68(2):179-199. doi: 10.1111/j.1467-9868.2006.00539.x.

Cluster analysis using multivariate mixed effects models.

Stat Med. 2009 Sep 10;28(20):2552-65. doi: 10.1002/sim.3632.
