BayesMD: flexible biological modeling for motif discovery.

Authors

Tang Man-Hung Eric, Krogh Anders, Winther Ole

Affiliation

Bioinformatics Centre, Department of Molecular Biology, University of Copenhagen, Copenhagen, Denmark.

Publication

J Comput Biol. 2008 Dec;15(10):1347-63. doi: 10.1089/cmb.2007.0176.

Abstract

We present BayesMD, a Bayesian Motif Discovery model with several new features. Three different types of biological a priori knowledge are built into the framework in a modular fashion. A mixture of Dirichlets is used as the prior over nucleotide probabilities in binding sites. It is trained on transcription factor (TF) databases in order to extract the typical properties of TF binding sites. In a similar fashion, we train organism-specific priors for the background sequences. Lastly, we use a prior over the position of binding sites. This prior represents information complementary to the motif and background priors, drawn from conservation, local sequence complexity, nucleosome occupancy, and similar sources, together with assumptions about the number of occurrences. The Bayesian inference is carried out using a combination of exact marginalization (over the multinomial parameters) and sampling (over the position of sites). Parallel tempering, an advanced sampling method, is used to obtain robust sampling results. In a post-analysis step, candidate motifs with high marginal probability are found by searching among those motifs that contain frequently occurring sites. Maximum a posteriori inference for the motifs is thereby avoided, and the marginal probabilities can be used directly to assess the significance of the findings. The framework is benchmarked against other methods on a number of real and artificial data sets. The accompanying prediction server, documentation, software, models and data are available from http://bayesmd.binf.ku.dk/.
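
The exact marginalization over multinomial parameters mentioned in the abstract can be illustrated with the standard Dirichlet-multinomial integral. The sketch below is not taken from the BayesMD software: the function names, the mixture weights and the pseudocount vectors are hypothetical values chosen for illustration, whereas the paper's priors are trained on TF databases. It computes the log probability of one motif column (nucleotide counts in the order A, C, G, T) with the nucleotide probabilities integrated out, first under a single Dirichlet and then under a mixture of Dirichlets.

```python
import math

def log_dirichlet_multinomial(counts, alpha):
    """Log probability of an observed column of nucleotides with the
    multinomial parameters integrated out under a Dirichlet(alpha) prior.
    The multinomial coefficient is omitted, so this is the probability of
    one specific ordered set of observations."""
    n = sum(counts)
    a = sum(alpha)
    result = math.lgamma(a) - math.lgamma(a + n)
    for c, al in zip(counts, alpha):
        result += math.lgamma(al + c) - math.lgamma(al)
    return result

def log_mixture_marginal(counts, weights, alphas):
    """Log marginal probability under a mixture of Dirichlets, combining
    the per-component marginals with a numerically stable log-sum-exp."""
    terms = [math.log(w) + log_dirichlet_multinomial(counts, al)
             for w, al in zip(weights, alphas)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

# Hypothetical two-component prior (NOT the trained BayesMD values):
# one flat component and one component favouring a conserved 'A' position.
weights = [0.5, 0.5]
alphas = [[1.0, 1.0, 1.0, 1.0], [10.0, 0.1, 0.1, 0.1]]

# Column counts in the order A, C, G, T for eight aligned sites.
column = [7, 0, 1, 0]
print(log_mixture_marginal(column, weights, alphas))
```

Because the parameters are integrated out in closed form, a sampler only has to explore site positions, which is the division of labour the abstract describes.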
