Aagesen Lone, Petersen Gitte, Seberg Ole
Division of Invertebrate Zoology, American Museum of Natural History, Central Park West at 79th St., New York, NY 10024-5192, USA.
Instituto de Botánica Darwinion, CC 22, 1642 San Isidro, Argentina.
Cladistics. 2005 Feb;21(1):15-30. doi: 10.1111/j.1096-0031.2005.00053.x.
The behavior of two topological and four character-based congruence measures was explored under different indel treatments in three empirical data sets, each posing different alignment difficulties. The analyses used direct optimization within a sensitivity analysis framework in which the cost of indels was varied. Indels were treated either as a fifth character state, or strings of contiguous gaps were treated as single events by applying linear affine gap costs. Congruence consistently improved when indels were treated as single events, but no single congruence measure emerged as clearly preferable. However, when enough data were combined, all congruence measures tended to select the same alignment cost set as optimal. Disagreement among congruence measures was mostly caused by a dominant fragment or data partition that contained all or most of the length variation in the data set. Dominance was easily detected, because the character-based congruence measures approached their optimal value as indel costs were increased. The dominance of a fragment or data partition was overwhelmed when new length-variable fragments or data partitions were added.
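The difference between the two indel treatments can be illustrated with a minimal sketch. It only shows how gap positions are counted on a single, already aligned sequence row; it is not direct optimization, and it is not taken from the paper. The function names, the parameter names `gap_cost`, `gap_open`, and `gap_extend`, and the example row are all assumptions made for illustration.

```python
import re


def gap_runs(aligned_row):
    """Return the length of every run of contiguous '-' characters."""
    return [len(m.group()) for m in re.finditer(r"-+", aligned_row)]


def fifth_state_cost(aligned_row, gap_cost):
    """Fifth-state treatment: every gap position is charged independently."""
    return gap_cost * sum(gap_runs(aligned_row))


def affine_cost(aligned_row, gap_open, gap_extend):
    """Affine treatment: each run pays gap_open once, then gap_extend
    for every additional position in the same run (assumed cost scheme)."""
    return sum(gap_open + gap_extend * (n - 1) for n in gap_runs(aligned_row))


# Hypothetical aligned row with one 3-position gap and one 1-position gap.
row = "AC---GT-A"

cost_fifth = fifth_state_cost(row, gap_cost=2)
cost_single_event = affine_cost(row, gap_open=2, gap_extend=0)
cost_affine = affine_cost(row, gap_open=2, gap_extend=1)
```

With `gap_extend` set to 0, a run of any length costs the same as a single gap, which is the sense in which a string of contiguous gaps counts as one event. A sensitivity analysis would repeat the scoring over a grid of such cost settings.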