Compound hierarchical correlated beta mixture with an application to cluster mouse transcription factor DNA binding data.

Authors

Dai Hongying, Charnigo Richard

Affiliations

Research Development and Clinical Investigation, Children's Mercy Hospital, Kansas City, MO 64108, USA and Department of Biomedical & Health Informatics, University of Missouri-Kansas City, Kansas City, MO 64110, USA

Department of Statistics, University of Kentucky, Lexington, KY 40506, USA.

Publication information

Biostatistics. 2015 Oct;16(4):641-54. doi: 10.1093/biostatistics/kxv016. Epub 2015 May 11.

Abstract

Modeling correlation structures is a challenge in bioinformatics, especially when dealing with high-throughput genomic data. A compound hierarchical correlated beta mixture (CBM) with an exchangeable correlation structure is proposed to cluster genetic vectors into mixture components. The correlation coefficient, [Formula: see text], is homogeneous within a mixture component and heterogeneous between mixture components. A random CBM with [Formula: see text] brings more flexibility in explaining correlation variations among genetic variables. The expectation-maximization (EM) algorithm and the stochastic expectation-maximization (SEM) algorithm are used to estimate the parameters of the CBM. The number of mixture components can be determined using model selection criteria such as AIC, BIC and ICL-BIC. Extensive simulation studies were conducted to compare EM, SEM and the model selection criteria. Simulation results suggest that CBM outperforms the traditional beta mixture model, with lower estimation bias and higher classification accuracy. The proposed method is applied to cluster transcription factor-DNA binding probabilities in mouse genome data generated by Lahdesmaki and others (2008, Probabilistic inference of transcription factor binding from multiple data sources. PLoS One, 3, e1820). The results reveal distinct clusters of transcription factors when binding to promoter regions of genes in the JAK-STAT, MAPK and two other pathways.
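
As an illustration of the model selection step mentioned in the abstract, the following minimal sketch computes AIC, BIC and ICL-BIC from a fitted mixture's log-likelihood. It assumes the commonly used textbook definitions (ICL-BIC taken as BIC plus twice the entropy of the posterior membership probabilities); the abstract does not state the exact formulas the authors used, and the function name, arguments and numeric inputs below are hypothetical.

```python
import math


def model_selection_criteria(log_lik, n_params, n_obs, posteriors):
    """Return (AIC, BIC, ICL-BIC) for a fitted mixture; smaller is better.

    Assumed (standard) definitions, not taken from the paper itself:
      AIC     = -2 log L + 2 k
      BIC     = -2 log L + k log n
      ICL-BIC = BIC + 2 * EN, where EN = -sum_ij tau_ij log tau_ij
    """
    aic = -2.0 * log_lik + 2.0 * n_params
    bic = -2.0 * log_lik + n_params * math.log(n_obs)
    # Entropy of the posterior membership probabilities tau_ij;
    # zero probabilities contribute nothing (limit of p log p as p -> 0).
    entropy = -sum(
        p * math.log(p) for row in posteriors for p in row if p > 0.0
    )
    icl_bic = bic + 2.0 * entropy
    return aic, bic, icl_bic


# Hypothetical inputs: two observations, two components, five free parameters.
posteriors = [[1.0, 0.0], [0.5, 0.5]]
aic, bic, icl_bic = model_selection_criteria(
    log_lik=-10.0, n_params=5, n_obs=2, posteriors=posteriors
)
```

In practice one would fit the mixture for several candidate numbers of components and keep the fit with the smallest value of the chosen criterion. The entropy term makes ICL-BIC penalize fits whose components overlap heavily.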
