Wiens John J
Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY 11794-5245, USA.
J Biomed Inform. 2006 Feb;39(1):34-42. doi: 10.1016/j.jbi.2005.04.001.
Concerns about the deleterious effects of missing data may often determine which characters and taxa are included in phylogenetic analyses. For example, researchers may exclude taxa lacking data for some genes or exclude a gene lacking data in some taxa. Yet, there may be very little evidence to support these decisions. In this paper, I review the effects of missing data on phylogenetic analyses. Recent simulations suggest that highly incomplete taxa can be accurately placed in phylogenies, as long as many characters have been sampled overall. Furthermore, adding incomplete taxa can dramatically improve results in some cases by subdividing misleading long branches. Adding characters with missing data can also improve accuracy, although there is a risk of long-branch attraction in some cases. Consideration of how missing data does (or does not) affect phylogenetic analyses may allow researchers to design studies that can reconstruct large phylogenies quickly, economically, and accurately.