Wiens John J
Department of Ecology and Evolution, State University of New York, Stony Brook, New York 11794-5245, USA.
Syst Biol. 2003 Aug;52(4):528-38. doi: 10.1080/10635150390218330.
The problem of missing data is often considered to be the most important obstacle in reconstructing the phylogeny of fossil taxa and in combining data from diverse characters and taxa for phylogenetic analysis. Empirical and theoretical studies show that including highly incomplete taxa can lead to multiple equally parsimonious trees, poorly resolved consensus trees, and decreased phylogenetic accuracy. However, the mechanisms that cause incomplete taxa to be problematic have remained unclear. It has been widely assumed that incomplete taxa are problematic because of the proportion or amount of missing data that they bear. In this study, I use simulations to show that the reduced accuracy associated with including incomplete taxa is caused by these taxa bearing too few complete characters rather than too many missing data cells. This seemingly subtle distinction has a number of important implications. First, the so-called missing data problem for incomplete taxa is, paradoxically, not directly related to their amount or proportion of missing data. Thus, the level of completeness alone should not guide the exclusion of taxa (contrary to common practice), and these results may explain why empirical studies have sometimes found little relationship between the completeness of a taxon and its impact on an analysis. These results also (1) suggest a more effective strategy for dealing with incomplete taxa, (2) call into question a justification of the controversial phylogenetic supertree approach, and (3) show the potential for the accurate phylogenetic placement of highly incomplete taxa, both when combining diverse data sets and when analyzing relationships of fossil taxa.
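The distinction drawn above, between how many characters a taxon has scored and what proportion of its cells are missing, can be illustrated with a minimal sketch. Everything in it is hypothetical: the two taxa, their character states, the `?` missing-data symbol, and the `completeness` helper are assumptions for illustration and do not come from the study's simulations.

```python
from typing import Dict

MISSING = "?"  # assumed symbol for a missing data cell


def completeness(row: str) -> Dict[str, float]:
    """Summarize one taxon's row of a character matrix.

    Returns the number of scored characters, the number of missing
    cells, and the proportion of cells that are missing.
    """
    total = len(row)
    scored = sum(1 for state in row if state != MISSING)
    missing = total - scored
    return {
        "scored": scored,
        "missing": missing,
        "prop_missing": missing / total,
    }


# Two hypothetical taxa with the same 10 scored characters, placed in
# matrices of different total size (20 versus 100 characters).
taxon_small = "0110100110" + MISSING * 10
taxon_large = "0110100110" + MISSING * 90

stats_small = completeness(taxon_small)
stats_large = completeness(taxon_large)

print(stats_small)
print(stats_large)
```

The two rows differ greatly in their proportion of missing cells but carry the same number of scored characters. On the abstract's argument, it is the second quantity, not the first, that would be expected to limit how accurately such a taxon can be placed.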