Majoros William H, Ohler Uwe
Institute for Genome Sciences & Policy, Duke University, Durham, NC, USA.
Bioinformatics. 2009 Jan 15;25(2):175-82. doi: 10.1093/bioinformatics/btn598. Epub 2008 Nov 18.
The modeling of conservation patterns in genomic DNA has become increasingly popular for a number of bioinformatic applications. While several systems developed to date incorporate context dependence in their substitution models, the resulting higher-order models carry costs in computational complexity and may generalize less well. This invites the question of whether simpler approaches to context modeling could yield appreciable reductions in model complexity and computational cost without sacrificing prediction accuracy.
We formulate several alternative methods for context modeling based on windowed Bayesian networks and compare their effects on both accuracy and computational complexity for the task of discriminating functionally distinct segments of vertebrate DNA. Our results show that substantial reductions in the complexity of both the model and the associated inference algorithm can be achieved without reducing predictive accuracy.
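The abstract does not specify the structure of the windowed Bayesian networks, so the sketch below is not the authors' method. It is a generic illustration of why higher-order context models become expensive. It assumes a single DNA sequence modeled by an order-k Markov chain, used here as a simple stand-in for context dependence rather than a phylogenetic substitution model. Each additional base of context multiplies the number of conditional distributions by 4. The names `num_free_params`, `train_markov`, and `log_likelihood` are hypothetical helpers introduced for illustration.

```python
import math
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"

def num_free_params(order: int) -> int:
    """Free parameters of an order-`order` Markov chain over DNA:
    one distribution (3 free values) per length-`order` context."""
    return (len(ALPHABET) - 1) * len(ALPHABET) ** order

def train_markov(seq: str, order: int, pseudocount: float = 1.0) -> dict:
    """Estimate P(next base | preceding `order` bases) by counting,
    with additive (Laplace) smoothing so no probability is zero."""
    counts = defaultdict(lambda: defaultdict(float))
    for i in range(order, len(seq)):
        counts[seq[i - order:i]][seq[i]] += 1.0
    model = {}
    # Enumerate every possible context, including unseen ones.
    for ctx in ("".join(p) for p in product(ALPHABET, repeat=order)):
        total = sum(counts[ctx].values()) + pseudocount * len(ALPHABET)
        model[ctx] = {b: (counts[ctx][b] + pseudocount) / total for b in ALPHABET}
    return model

def log_likelihood(seq: str, model: dict, order: int) -> float:
    """Log-probability of seq[order:] conditioned on its first `order` bases."""
    return sum(math.log(model[seq[i - order:i]][seq[i]])
               for i in range(order, len(seq)))

if __name__ == "__main__":
    # Parameter count grows exponentially with context length.
    for k in range(4):
        print(k, num_free_params(k))
```

Running the script prints the parameter counts 3, 12, 48, and 192 for orders 0 through 3. This exponential growth in parameters, and in the data needed to estimate them, is the kind of cost that motivates looking for simpler context models.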