Efficient context-dependent model building based on clustering posterior distributions for non-coding sequences.

Author information

Baele Guy, Van de Peer Yves, Vansteelandt Stijn

Affiliation

Department of Applied Mathematics and Computer Science, Ghent University, Ghent, Belgium.

Publication information

BMC Evol Biol. 2009 Apr 30;9:87. doi: 10.1186/1471-2148-9-87.

Abstract

BACKGROUND

Many recent studies that relax the assumption of independent evolution of sites have done so at the expense of a drastic increase in the number of substitution parameters. While additional parameters cannot be avoided to model context-dependent evolution, a large increase in model dimensionality is only justified when accompanied by careful model-building strategies that guard against overfitting. Higher dimensionality increases the computational cost of evaluating the models, lengthens convergence times of Bayesian Markov chain Monte Carlo algorithms and makes Bayes Factor calculations even more tedious.

RESULTS

We have developed two model-search algorithms that reduce the number of Bayes Factor calculations by clustering posterior densities to decide whether substitution behavior is equal across different contexts. The selected model's fit is evaluated using a Bayes Factor, which we calculate via model-switch thermodynamic integration. To reduce computation time and to increase the precision of this integration, we propose splitting the calculations across different computers and appropriately calibrating the individual runs. Using the proposed strategies, we find, in a dataset of primate Ancestral Repeats, that careful modeling of context-dependent evolution may increase model fit considerably and that combining a context-dependent model with the assumption of varying rates across sites offers even larger improvements in model fit. Using a smaller nuclear SSU rRNA dataset, we show that context-dependence may only become detectable upon applying model-building strategies.
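The following is a minimal sketch of the integration step of model-switch thermodynamic integration. It assumes the standard identity ln BF10 = integral from 0 to 1 of E_beta[U(theta)] d(beta). Here U(theta) = ln p(D | theta, M1) - ln p(D | theta, M0), and E_beta is the expectation under the distribution proportional to p(D | theta, M0)^(1 - beta) p(D | theta, M1)^beta p(theta).

No sampler is included. The callback `mean_u` is a hypothetical stand-in for the MCMC average of U at each beta. Dividing [0, 1] into subintervals that are integrated independently is only one assumed way of splitting the work across computers, not necessarily the authors' scheme.

```python
from typing import Callable, List


def segment_integral(mean_u: Callable[[float], float],
                     beta_lo: float, beta_hi: float, n_steps: int) -> float:
    """Trapezoidal estimate of the integral of E_beta[U] over [beta_lo, beta_hi].

    mean_u(beta) is a hypothetical stand-in for the posterior average of
    U(theta) = ln p(D|theta,M1) - ln p(D|theta,M0) from an MCMC run at beta.
    """
    h = (beta_hi - beta_lo) / n_steps
    betas = [beta_lo + i * h for i in range(n_steps + 1)]
    values = [mean_u(b) for b in betas]
    # Trapezoid rule: full weight on interior points, half weight on endpoints.
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


def log_bayes_factor(mean_u: Callable[[float], float],
                     split_points: List[float], n_steps: int) -> float:
    """Sum independent segment estimates over consecutive split points.

    Each segment [split_points[k], split_points[k+1]] needs no information
    from the others, so it could run on a separate machine (assumption).
    """
    return sum(segment_integral(mean_u, lo, hi, n_steps)
               for lo, hi in zip(split_points[:-1], split_points[1:]))


if __name__ == "__main__":
    # Toy integrand with a known answer: the integral of (10 - 4*beta) over [0, 1] is 8.
    toy = lambda beta: 10.0 - 4.0 * beta
    print(log_bayes_factor(toy, [0.0, 0.5, 1.0], n_steps=10))
```

Because the toy integrand is linear, the trapezoid rule is exact here, and the script prints 8 up to floating-point rounding. With real MCMC averages, both the spacing of the beta grid and the number of samples drawn at each beta limit the precision of the estimate.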

CONCLUSION

While context-dependent evolutionary models can improve model fit over traditional independent evolutionary models, such complex models will often contain too many parameters. Justification for the added parameters is thus required, so that only parameters modeling evolutionary processes previously unaccounted for are added to the evolutionary model. To obtain an optimal balance between the number of parameters in a context-dependent model and its performance in terms of model fit, we have designed two parameter-reduction strategies and shown that model fit can be greatly improved by reducing the number of parameters in a context-dependent evolutionary model.
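The clustering idea behind the parameter-reduction strategies can be sketched as follows. This is a simplified stand-in, not the authors' algorithm. It assumes one substitution parameter per context and summarizes each context's posterior sample by its mean. It then greedily merges the two groups with the closest mean posterior values until the gap exceeds a user-chosen `threshold` (hypothetical).

Each resulting group would share a single parameter, and the reduced model would then be compared with competing models using a Bayes Factor. Context labels such as `"A.C"` are hypothetical placeholders for the neighboring bases of a site.

```python
from statistics import mean
from typing import Dict, List


def cluster_contexts(samples: Dict[str, List[float]],
                     threshold: float) -> List[List[str]]:
    """Greedy agglomerative clustering of contexts by posterior mean.

    samples maps a (hypothetical) context label to posterior draws of one
    substitution parameter. Contexts that end up in the same cluster would
    share one parameter in the reduced model.
    """
    means = {ctx: mean(draws) for ctx, draws in samples.items()}
    clusters = [[ctx] for ctx in sorted(means)]

    def centroid(cluster: List[str]) -> float:
        # Mean of the posterior means of the contexts in the cluster.
        return mean(means[c] for c in cluster)

    while len(clusters) > 1:
        # Find the pair of clusters whose centroids are closest.
        best_d, best_i, best_j = None, -1, -1
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = abs(centroid(clusters[i]) - centroid(clusters[j]))
                if best_d is None or d < best_d:
                    best_d, best_i, best_j = d, i, j
        if best_d > threshold:
            break  # remaining groups differ too much to be merged
        clusters[best_i].extend(clusters[best_j])
        del clusters[best_j]  # best_j > best_i, so best_i stays valid
    return clusters


if __name__ == "__main__":
    draws = {
        "A.A": [1.0, 1.2, 0.8],
        "A.C": [1.1, 0.9, 1.0],
        "C.G": [3.0, 3.2, 2.8],
        "T.T": [2.9, 3.1, 3.0],
    }
    print(cluster_contexts(draws, threshold=0.5))
```

With these made-up draws the script prints `[['A.A', 'A.C'], ['C.G', 'T.T']]`. Four context-specific parameters are therefore reduced to two shared ones. A full approach would compare whole posterior densities rather than only their means, and would check each candidate model with a Bayes Factor.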

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dcf0/2695821/1843069876db/1471-2148-9-87-1.jpg