A recursively partitioned mixture model for clustering time-course gene expression data.

Authors

Koestler Devin C, Marsit Carmen J, Christensen Brock C, Kelsey Karl T, Houseman E Andres

Affiliations

Department of Biostatistics, University of Kansas Medical Center, Kansas City, KS 66160, USA.

Department of Community and Family Medicine, Section for Biostatistics and Epidemiology, Dartmouth Medical School, Hanover, NH 03756, USA; Department of Pharmacology and Toxicology, Dartmouth College, Hanover, NH 03756, USA.

Publication information

Transl Cancer Res. 2014;3(3):217-232. doi: 10.3978/j.issn.2218-676X.2014.06.04.

Abstract

BACKGROUND

Longitudinally collected gene expression data provide an opportunity to investigate the dynamic behavior of gene expression and are crucial for establishing causal links between changes at the molecular level and disease development and progression. For the analysis of such data, clustering subjects based on time-course expression data may improve our understanding of the temporal expression patterns that result in disease phenotypes. Although there are numerous existing methods for clustering subjects using gene expression data, most are not suitable when expression measurements are collected repeatedly over a time course.

METHODS

We present a modified version of the recursively partitioned mixture model (RPMM) for clustering subjects based on longitudinally collected gene expression data. In the proposed time-course RPMM (TC-RPMM), subjects are clustered on the basis of their temporal profiles of gene expression using a mixture of mixed effects models framework. This framework captures changes in gene expression over time and models the autocorrelation between repeated gene expression measurements for the same subject. We assessed the performance of TC-RPMM using extensive simulation studies and a dataset from a multi-center research study of inflammation and response to injury (www.gluegrant.org), which consisted of time-course gene expression data for 140 subjects.
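The abstract does not state the model's form. As a generic sketch of what a mixture of mixed effects models usually looks like (not necessarily the exact TC-RPMM specification), suppose subject i has a vector y_i of repeated expression measurements for one gene and belongs to latent class k with probability π_k. All symbols below are illustrative assumptions, not notation taken from the paper.

```latex
% Generic K-class mixture of linear mixed effects models (illustrative, assumed notation).
% y_i : n_i repeated measurements for subject i
% c_i : latent class label of subject i
% X_i : fixed-effects design (e.g. polynomial or spline terms in time)
% Z_i : random-effects design; b_i : subject-level random effects
\begin{align}
  y_i \mid (c_i = k),\, b_i &= X_i \beta_k + Z_i b_i + \varepsilon_i,
    & b_i &\sim N(0, D_k), \quad \varepsilon_i \sim N(0, \sigma_k^2 I_{n_i}), \\
  f(y_i) &= \sum_{k=1}^{K} \pi_k \,
    N\!\left(y_i;\; X_i \beta_k,\; Z_i D_k Z_i^{\top} + \sigma_k^2 I_{n_i}\right),
    & \sum_{k=1}^{K} \pi_k &= 1 .
\end{align}
```

In this form, the class-specific β_k describes the mean trajectory over time, and the marginal covariance Z_i D_k Z_i^T + σ_k^2 I is what induces correlation among a subject's repeated measurements. A recursive partitioning scheme of the RPMM kind would, under these assumptions, fit the K = 2 case and then decide whether each resulting class should be split again.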

RESULTS

Our simulation studies encompassed several different scenarios and were aimed at assessing the ability of TC-RPMM to correctly recover true class memberships when the expression trajectories that characterized those classes differed. Overall, our simulation studies revealed favorable performance of TC-RPMM compared to competing approaches; however, clustering performance was highly dependent on the proportion of class-discriminating genes used in the clustering analysis. When applied to real epidemiologic data with repeated-measures, longitudinal gene expression measurements, TC-RPMM identified clusters that had strong biological and clinical significance.

CONCLUSIONS

Developing methods for clustering subjects based on temporal gene expression profiles is a high priority for molecular biology and bioinformatics research. Along these lines, the proposed TC-RPMM represents a promising new approach for analyzing time-course gene expression data.
