Tandy Warnow
Professor of Computer Science, Department of Computer Science, University of Texas at Austin.
PLoS Curr. 2012 Mar 9;4:RRN1308. doi: 10.1371/currents.RRN1308.
Background
Most statistical methods for phylogenetic estimation in use today treat a gap within the input sequence alignment (generally representing an insertion or deletion, i.e., an indel) as missing data. However, the statistical properties of this treatment of indels have not been fully investigated.
Results
We prove that maximum likelihood phylogeny estimation, treating indels as missing data, can be statistically inconsistent for a general (and rather simple) model of sequence evolution, even when given the true alignment. Therefore, accurate phylogeny estimation cannot be guaranteed for maximum likelihood analyses, even given arbitrarily long sequences, when indels are present and treated as missing data.
Conclusions
Our result shows that the standard statistical techniques used to estimate phylogenies from sequence alignments may have unfavorable statistical properties, even when the sequence alignment is accurate and the assumed substitution model matches the generation model. This suggests that the recent research focus on developing statistical methods that treat indel events properly is an important direction for phylogeny estimation.
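
To make "treating a gap as missing data" concrete, the following minimal sketch computes a single-site likelihood with Felsenstein's pruning algorithm, giving a gapped leaf a conditional likelihood vector of all ones. It is an illustration only, not the construction used in the proof: the Jukes-Cantor substitution model, the three-leaf tree, the branch lengths, and every name in the code (`jc69_transition`, `leaf_partial`, `site_likelihood`, `s1`, `s2`, `s3`) are assumptions chosen for the example.

```python
import math

STATES = "ACGT"


def jc69_transition(t):
    """Jukes-Cantor transition matrix for branch length t
    (expected substitutions per site). Assumed model, for illustration."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def leaf_partial(symbol):
    """Conditional likelihood vector at a leaf.
    A gap ('-') is treated as missing data: every state is compatible."""
    if symbol == "-":
        return [1.0] * 4
    return [1.0 if s == symbol else 0.0 for s in STATES]


def partial(node, column):
    """Felsenstein pruning. A node is either a leaf name (str) or a list
    of (child, branch_length) pairs describing an internal node."""
    if isinstance(node, str):
        return leaf_partial(column[node])
    result = [1.0] * 4
    for child, t in node:
        p = jc69_transition(t)
        child_partial = partial(child, column)
        for i in range(4):
            result[i] *= sum(p[i][j] * child_partial[j] for j in range(4))
    return result


def site_likelihood(tree, column):
    """Likelihood of one alignment column, with a uniform root distribution."""
    return sum(0.25 * x for x in partial(tree, column))


# Hypothetical rooted tree: (s1:0.1, (s2:0.2, s3:0.3):0.05)
tree = [("s1", 0.1), ([("s2", 0.2), ("s3", 0.3)], 0.05)]
column = {"s1": "A", "s2": "-", "s3": "G"}

# Likelihood with the gap in s2 treated as missing data.
gapped = site_likelihood(tree, column)

# The same quantity obtained by summing over every possible state at s2.
marginal = sum(site_likelihood(tree, {**column, "s2": s}) for s in STATES)

# The gapped leaf drops out entirely: only the s1-to-s3 path (0.1 + 0.05 + 0.3) matters.
pruned = 0.25 * jc69_transition(0.45)[0][2]

print(gapped, marginal, pruned)
```

Under these assumptions the three values agree: a gapped leaf contributes nothing to the column's likelihood, so the indel process itself is ignored by the analysis. This is the treatment whose statistical consistency the abstract calls into question.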