Accelerated Estimation of Frequency Classes in Site-Heterogeneous Profile Mixture Models.

Author affiliations

Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada.

École Nationale Supérieure de Techniques Avancées, Palaiseau, France.

Publication information

Mol Biol Evol. 2018 May 1;35(5):1266-1283. doi: 10.1093/molbev/msy026.

Abstract

As a consequence of structural and functional constraints, proteins tend to have site-specific preferences for particular amino acids. Failing to account for this heterogeneity of amino acid frequencies across sites can lead to artifacts in phylogenetic estimation. Site-heterogeneous mixture models have been developed to address this problem. However, because of prohibitive computational times, maximum likelihood implementations use fixed component frequency vectors inferred from database sequences that are external to the alignment under analysis. Here, we propose a composite likelihood approach to estimating the component frequencies of a mixture model that directly uses the data from the alignment of interest. In the common case that the number of taxa under study is not large, several adjustments to the default composite likelihood are shown to be necessary. In simulations, the approach is shown to provide large improvements over hierarchical clustering. For empirical data, substantial improvements in likelihoods are found over mixtures using fixed components.
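
To make the idea of estimating frequency classes from the alignment itself more concrete, the following is a minimal sketch and not the authors' procedure. It assumes a simplified composite likelihood in which the residues observed at a site are treated as independent draws from one of K frequency vectors, ignoring the tree, and fits the class weights and frequency vectors by EM. The function names, the four-letter toy alphabet, the pseudocount, and the data are all hypothetical; the small-sample adjustments mentioned in the abstract are not reproduced.

```python
import math

def class_log_terms(counts, weights, profiles):
    """log(w_k) + sum_a counts[a] * log(pi_k[a]) for each class k."""
    return [
        math.log(max(w, 1e-300))
        + sum(c * math.log(p) for c, p in zip(counts, prof) if c > 0)
        for w, prof in zip(weights, profiles)
    ]

def log_likelihood(sites, weights, profiles):
    """Sum over sites of the log mixture likelihood (log-sum-exp for stability)."""
    total = 0.0
    for counts in sites:
        logs = class_log_terms(counts, weights, profiles)
        m = max(logs)
        total += m + math.log(sum(math.exp(v - m) for v in logs))
    return total

def em_step(sites, weights, profiles, pseudocount=1e-3):
    """One EM update; the pseudocount (an assumption) keeps frequencies positive."""
    n_classes, n_states = len(weights), len(profiles[0])
    resp_sum = [0.0] * n_classes
    class_counts = [[pseudocount] * n_states for _ in range(n_classes)]
    for counts in sites:
        logs = class_log_terms(counts, weights, profiles)
        m = max(logs)
        post = [math.exp(v - m) for v in logs]
        norm = sum(post)
        for k in range(n_classes):
            r = post[k] / norm  # posterior probability that the site is in class k
            resp_sum[k] += r
            for a, c in enumerate(counts):
                class_counts[k][a] += r * c
    new_weights = [r / len(sites) for r in resp_sum]
    new_profiles = [[c / sum(row) for c in row] for row in class_counts]
    return new_weights, new_profiles

# Toy data: per-site residue counts over 6 taxa, 4-letter alphabet for brevity.
sites = [[6, 0, 0, 0], [5, 1, 0, 0], [0, 0, 6, 0],
         [0, 1, 5, 0], [6, 0, 0, 0], [0, 0, 5, 1]]
weights = [0.5, 0.5]
profiles = [[0.4, 0.2, 0.2, 0.2], [0.2, 0.2, 0.4, 0.2]]

initial_ll = log_likelihood(sites, weights, profiles)
for _ in range(50):
    weights, profiles = em_step(sites, weights, profiles)
final_ll = log_likelihood(sites, weights, profiles)
```

On this toy input the two fitted classes concentrate on the first and third states respectively. A real analysis would use the 20 amino acids and would need the corrections for small numbers of taxa that the abstract refers to.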
