Compound hierarchical correlated beta mixture with an application to cluster mouse transcription factor DNA binding data.

Authors

Dai Hongying, Charnigo Richard

Affiliations

Research Development and Clinical Investigation, Children's Mercy Hospital, Kansas City, MO 64108, USA and Department of Biomedical & Health Informatics, University of Missouri-Kansas City, Kansas City, MO 64110, USA

Department of Statistics, University of Kentucky, Lexington, KY 40506, USA.

Publication information

Biostatistics. 2015 Oct;16(4):641-54. doi: 10.1093/biostatistics/kxv016. Epub 2015 May 11.

Abstract

Modeling correlation structures is a challenge in bioinformatics, especially for high-throughput genomic data. A compound hierarchical correlated beta mixture (CBM) with an exchangeable correlation structure is proposed to cluster genetic vectors into mixture components. The correlation coefficient, [Formula: see text], is homogeneous within a mixture component and heterogeneous between mixture components. A random CBM with [Formula: see text] brings more flexibility in explaining correlation variations among genetic variables. The Expectation-Maximization (EM) and Stochastic Expectation-Maximization (SEM) algorithms are used to estimate the parameters of the CBM. The number of mixture components can be determined using model selection criteria such as AIC, BIC and ICL-BIC. Extensive simulation studies were conducted to compare EM, SEM and the model selection criteria. Simulation results suggest that the CBM outperforms the traditional beta mixture model, with lower estimation bias and higher classification accuracy. The proposed method is applied to cluster transcription factor-DNA binding probabilities in mouse genome data generated by Lahdesmaki and others (2008, Probabilistic inference of transcription factor binding from multiple data sources. PLoS One 3, e1820). The results reveal distinct clusters of transcription factors when binding to promoter regions of genes in the JAK-STAT, MAPK and two other pathways.
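
The model selection criteria named in the abstract can be illustrated with a minimal sketch. It uses the textbook definitions of AIC, BIC and ICL-BIC for a fitted mixture model and is not taken from the paper. The function name `information_criteria`, its arguments and the numeric inputs are hypothetical, chosen only for illustration.

```python
import math

def information_criteria(log_lik, n_params, n_obs, resp):
    """Return AIC, BIC and ICL-BIC for a fitted mixture (smaller is better).

    log_lik  : maximized log-likelihood of the fitted mixture
    n_params : number of free parameters in the model
    n_obs    : number of observations (vectors) that were clustered
    resp     : posterior membership probabilities, one row per observation
    """
    aic = -2.0 * log_lik + 2.0 * n_params
    bic = -2.0 * log_lik + n_params * math.log(n_obs)
    # Entropy of the soft assignments; it is zero when every observation
    # is assigned to a single component with probability 1.
    entropy = -sum(t * math.log(t) for row in resp for t in row if t > 0.0)
    # ICL-BIC adds twice the entropy to BIC, penalizing overlapping clusters.
    icl_bic = bic + 2.0 * entropy
    return {"AIC": aic, "BIC": bic, "ICL-BIC": icl_bic}

# Hypothetical inputs: three observations, two components, made-up log-likelihood.
resp = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.0]]
crit = information_criteria(log_lik=-12.5, n_params=5, n_obs=3, resp=resp)
```

In practice one would fit the mixture for several candidate numbers of components and keep the fit with the smallest value of the chosen criterion. Because the entropy term is non-negative, ICL-BIC is never smaller than BIC and tends to favor well-separated clusters.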
