Starless bias and parameter-estimation bias in the likelihood-based phylogenetic method.

Author information

Xia Xuhua

Affiliations

Department of Biology, University of Ottawa, Ottawa, Canada, K1N 6N5.

Ottawa Institute of Systems Biology, Ottawa, Canada, K1H 8M5.

Publication information

AIMS Genet. 2019 Apr 9;5(4):212-223. doi: 10.3934/genet.2018.4.212. eCollection 2018.

Abstract

I analyzed various site pattern combinations in a 4-OTU case to identify sources of starless bias and parameter-estimation bias in likelihood-based phylogenetic methods, and report three significant contributions. First, the likelihood method is counterintuitive in that it may not generate a star tree from sequences that are equidistant from each other. This behaviour, dubbed starless bias, arises in a 4-OTU tree when there is an excess (i.e., more than expected from a star tree and a substitution model) of conflicting phylogenetic signals supporting the three resolved topologies equally. I identified special site pattern combinations that lead to rejection of a star tree even when the sequences are equidistant from each other. Second, fitting a gamma distribution to model rate heterogeneity over sites is strongly confounded with tree topology, especially in conjunction with the starless bias. I present examples showing dramatic differences in the estimated shape parameter α between a star tree and a resolved tree. There may be no rate heterogeneity over sites (with the estimated α > 10000) when a star tree is imposed, but α < 1 (suggesting strong rate heterogeneity over sites) when an (incorrect) resolved tree is imposed. Thus, the dependence of "rate heterogeneity" on tree topology implies that "rate heterogeneity" is not a sequence-specific feature, which cautions against interpreting a small α to mean that some sites are under strong purifying selection and others are not. Third, because there is no existing (and working) likelihood method for evaluating a star tree with continuous gamma-distributed rates, I have implemented the method for JC69 in a self-contained R script for a four-OTU tree (star or resolved), in addition to another R script that assumes a constant rate over sites. These R scripts should be useful for teaching and exploring likelihood methods in phylogenetics.
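To make the star-versus-resolved comparison concrete, the following is a minimal Python sketch of the JC69 site likelihood for a four-OTU tree under a constant rate over sites. It is not the author's R script, and it does not implement the continuous gamma model. The function names (`jc69_p`, `site_lik`, `log_lik`), the fixed topology ((1,2),(3,4)), and the toy site-pattern counts are all assumptions made for illustration. Setting the internal branch length to zero collapses the resolved tree into a star tree, so the same function can evaluate both.

```python
import itertools
import math

def jc69_p(t):
    """JC69 transition probabilities after branch length t (substitutions/site).
    Returns (probability of same state, probability of one specific different state)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e, 0.25 - 0.25 * e

def site_lik(pattern, b, b_int):
    """Likelihood of one site pattern on the tree ((1,2),(3,4)).
    pattern: tuple of 4 nucleotide states coded 0..3, one per OTU.
    b: list of 4 terminal branch lengths.
    b_int: internal branch length; 0.0 collapses the tree to a star."""
    def p(i, j, t):
        same, diff = jc69_p(t)
        return same if i == j else diff

    total = 0.0
    for x in range(4):          # state at the node joining OTUs 1 and 2
        for y in range(4):      # state at the node joining OTUs 3 and 4
            total += (0.25      # equilibrium frequency of state x
                      * p(x, pattern[0], b[0]) * p(x, pattern[1], b[1])
                      * p(x, y, b_int)
                      * p(y, pattern[2], b[2]) * p(y, pattern[3], b[3]))
    return total

def log_lik(counts, b, b_int):
    """Log-likelihood of a dict {site pattern: number of sites}."""
    return sum(n * math.log(site_lik(pat, b, b_int)) for pat, n in counts.items())

# Sanity check: the probabilities of all 256 site patterns should sum to 1.
all_patterns = list(itertools.product(range(4), repeat=4))
pattern_sum = sum(site_lik(pat, [0.1] * 4, 0.05) for pat in all_patterns)

# Toy counts (arbitrary, NOT from the paper): invariant sites plus equal
# numbers of sites supporting each of the three resolved topologies.
toy_counts = {(0, 0, 0, 0): 100, (0, 0, 1, 1): 5, (0, 1, 0, 1): 5, (0, 1, 1, 0): 5}
lnl_star = log_lik(toy_counts, [0.1] * 4, 0.0)       # star tree
lnl_resolved = log_lik(toy_counts, [0.1] * 4, 0.05)  # resolved tree, fixed lengths
print(pattern_sum, lnl_star, lnl_resolved)
```

The branch lengths here are fixed rather than optimized, so the two printed log-likelihoods are not maximum-likelihood values. A fair comparison of the star and resolved trees would maximize over the branch lengths in each case, which this sketch leaves out.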
