Sun Jiehuan, Herazo-Maya Jose D, Kaminski Naftali, Zhao Hongyu, Warren Joshua L
Department of Biostatistics, Yale University, New Haven, 06520, CT, U.S.A.
Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, New Haven, 06520, CT, U.S.A.
Stat Med. 2017 Sep 30;36(22):3495-3506. doi: 10.1002/sim.7374. Epub 2017 Jun 15.
Subgroup identification (clustering) is an important problem in biomedical research. Gene expression profiles are commonly used to define subgroups. Longitudinal gene expression profiles may provide more information on disease progression than baseline profiles alone capture, so subgroup identification could be more accurate and effective with the aid of longitudinal gene expression data. However, existing statistical methods are unable to fully utilize these data for patient clustering. In this article, we introduce a novel Bayesian clustering method based on longitudinal gene expression profiles. This method, called BClustLonG, adopts a linear mixed-effects framework to model the trajectory of each gene's expression over time, while clustering is conducted jointly on the regression coefficients obtained from all genes. To account for the correlations among genes and alleviate the challenges of high dimensionality, we adopt a factor analysis model for the regression coefficients. A Dirichlet process prior distribution is placed on the means of the regression coefficients to induce clustering. Through extensive simulation studies, we show that BClustLonG has improved performance over other clustering methods. When applied to a dataset of severely injured (burn or trauma) patients, our model is able to identify interesting subgroups. Copyright © 2017 John Wiley & Sons, Ltd.
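To make the generative idea in the abstract concrete, the following is a minimal toy sketch, not the authors' BClustLonG implementation. It assumes a Chinese restaurant process (one standard representation of a Dirichlet process prior) for the cluster labels, a per-gene intercept and slope for each patient drawn around cluster-level means, and independent Gaussian noise. The factor analysis layer that models correlations among genes is deliberately omitted, and every name and numeric value (`crp_assignments`, `simulate`, `alpha`, the variances) is a hypothetical choice for illustration.

```python
import random


def crp_assignments(n_patients, alpha, rng):
    """Draw cluster labels from a Chinese restaurant process.

    alpha is the concentration parameter: larger values favour more clusters.
    """
    labels, counts = [], []
    for _ in range(n_patients):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    return labels


def simulate(n_patients=30, n_genes=5, times=(0.0, 1.0, 2.0, 3.0),
             alpha=1.0, seed=0):
    """Simulate toy longitudinal expression data with latent subgroups.

    Returns (labels, data) where data[i][g] is the list of expression values
    of gene g for patient i at each time point.
    """
    rng = random.Random(seed)
    labels = crp_assignments(n_patients, alpha, rng)
    n_clusters = max(labels) + 1

    # Cluster-level mean (intercept, slope) for every gene. Assumed values.
    means = [[(rng.gauss(0, 2), rng.gauss(0, 1)) for _ in range(n_genes)]
             for _ in range(n_clusters)]

    data = []
    for label in labels:
        patient = []
        for g in range(n_genes):
            m0, m1 = means[label][g]
            b0 = m0 + rng.gauss(0, 0.3)  # patient-specific intercept
            b1 = m1 + rng.gauss(0, 0.1)  # patient-specific slope
            # Linear trajectory over time plus measurement noise.
            patient.append([b0 + b1 * t + rng.gauss(0, 0.2) for t in times])
        data.append(patient)
    return labels, data


labels, data = simulate()
print("clusters drawn:", max(labels) + 1)
```

In this sketch the number of clusters is not fixed in advance; it emerges from the process, which is the property that makes a Dirichlet process prior attractive for subgroup identification. Fitting such a model to real data requires posterior inference, which is outside the scope of this illustration.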