A reconsideration of inapplicable characters, and an approximation with step-matrix recoding.

Affiliations

Unidad Ejecutora Lillo, Consejo Nacional de Investigaciones Científicas y Técnicas, Fundación Miguel Lillo, Miguel Lillo 251, San Miguel de Tucumán, 4000, Argentina.

American Museum of Natural History, New York, NY, USA.

Publication information

Cladistics. 2021 Oct;37(5):596-629. doi: 10.1111/cla.12456. Epub 2021 Apr 10.

DOI:10.1111/cla.12456

PMID:34570932

Abstract

Evidence for phylogenetic analysis comes in the form of observed similarities, and trees are selected to minimize the number of similarities that cannot be accounted for by homology (homoplasies). Thus, the classical argument for parsimony directly links homoplasy with explanatory power. When characters are hierarchically related, a first character may represent a primary structure such as tail absence/presence and a secondary (subordinate) character may describe tail colour; this makes tail colour inapplicable when tail is absent. It has been proposed that such character hierarchies should be evaluated on the same logical basis as standard characters, maximizing the number of similarities accounted for by secondary homology, i.e. common ancestry. Previous evaluations of the homology of a given ancestral reconstruction contain the unintuitive quantity "subcharacters" (number of regions of applicability). Rather than counting subcharacters, this paper proposes an equivalent but more intuitive formulation, based on counting the number of changes into each separate state. In this formulation, x-transformations, the homoplasy for the reconstruction is simply the number of changes into the state beyond the first, summed over all states. There is thus no direct connection between homoplasy and number of steps, only between homoplasy and extra steps. The link between the two formulations is that, for any region of applicability of any character, a subcharacter can be interpreted as the change into the state that is plesiomorphic in that region. Although some authors have claimed that the equivalence between maximizing explanatory power and minimizing independent originations of similar features (i.e. the standard justification of parsimony) does not hold for inapplicable characters, evaluating homoplasy with x-transformations clearly connects the two sides of that equation. 
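
The counting rule described above (homoplasy as the number of changes into each state beyond the first, summed over all states) can be illustrated with a minimal sketch. This is not the authors' implementation. The function name, the tree encoding (a child-to-parent dictionary) and the toy data are hypothetical. The sketch also makes two assumptions that the abstract does not spell out: the state at the root is counted as the first origin of that state, and changes into the inapplicable token "-" are not scored for the secondary character.

```python
from collections import Counter


def x_transformation_homoplasy(parent, states, inapplicable="-"):
    """Count homoplasy for one character on a fully reconstructed tree.

    parent: dict mapping each node to its parent (the root maps to None).
    states: dict mapping each node (tips and internal nodes) to its state.
    """
    origins = Counter()
    for node, par in parent.items():
        state = states[node]
        if state == inapplicable:
            # Assumption: changes into "inapplicable" are not scored here.
            continue
        # Assumption: the root state counts as the first origin of that state.
        if par is None or states[par] != state:
            origins[state] += 1
    # Every origin of a state beyond the first is one unit of homoplasy.
    return sum(count - 1 for count in origins.values())


# Hypothetical tree ((A,B),(C,D)); tail colour is inapplicable at the root.
parent = {"root": None, "n1": "root", "n2": "root",
          "A": "n1", "B": "n1", "C": "n2", "D": "n2"}
states = {"root": "-", "n1": "red", "n2": "-",
          "A": "red", "B": "red", "C": "red", "D": "-"}

# "red" arises twice (on the branch to n1 and on the branch to C).
tail_colour_homoplasy = x_transformation_homoplasy(parent, states)
```

In this toy reconstruction red tails arise in two separate regions of applicability, so one instance of the similarity "red" is left unexplained by common ancestry and the count is 1.
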
Furthermore, as the evaluation with x-transformations provides a direct count and a straightforward interpretation of homoplasy, it extends naturally into implied weighting, and sheds light on problems with additive, step-matrix or continuous characters. It also allows deriving transformation costs for recoding hierarchies as step-matrix characters (where recoded states correspond to permissible combinations of states in primary and secondary characters), so that homology of the original observations is properly measured. Those transformation costs set the cost of gaining the primary structure to the maximum difference between "present" states plus cost of loss, and difference between "present" states to the sum of user-defined transformation costs between secondary features. With such recoding, invoking multiple independent derivations of the structure and similar features will cost as many extra "steps" as the instances of similarities (in both original characters) that are not being homologized. The step-matrix recoding also can take into account nested dependences. We present a simple convention for naming characters, which TNT can use to automatically convert the original data into a step-matrix form and set the proper transformation costs. Finally, the basic elements for handling inapplicable characters in the context of maximum-likelihood inference are outlined, and some quantitative comparisons between different approaches to inapplicables are provided.
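
The transformation costs stated above can be sketched as follows: gaining the primary structure costs the maximum difference between "present" states plus the cost of loss, and the difference between two "present" states is the sum of the user-defined costs between their secondary features. This is an illustration only, not the TNT implementation. The function name, the state labels and the default loss cost of 1 are assumptions, and each secondary cost matrix is assumed to have a zero diagonal.

```python
from itertools import product

ABSENT = "absent"  # hypothetical label for the recoded "structure absent" state


def build_step_matrix(secondary_costs, loss_cost=1):
    """Build recoded states and transformation costs for one hierarchy.

    secondary_costs: one square cost matrix (list of lists) per secondary
    character, giving user-defined costs between its states.
    """
    # Each "present" state is one combination of secondary-character states.
    present = list(product(*(range(len(m)) for m in secondary_costs)))

    def diff(a, b):
        # Sum of user-defined costs over all secondary characters.
        return sum(m[i][j] for m, i, j in zip(secondary_costs, a, b))

    max_diff = max(diff(a, b) for a in present for b in present)
    gain_cost = max_diff + loss_cost

    states = [ABSENT] + present
    cost = {}
    for a in states:
        for b in states:
            if a == b:
                cost[a, b] = 0
            elif a == ABSENT:
                cost[a, b] = gain_cost   # gain of the primary structure
            elif b == ABSENT:
                cost[a, b] = loss_cost   # loss of the primary structure
            else:
                cost[a, b] = diff(a, b)  # change among "present" states
    return states, cost


# One secondary character (tail colour: 0 = red, 1 = blue) with unit costs.
colour_costs = [[0, 1], [1, 0]]
states, cost = build_step_matrix([colour_costs])
```

With these inputs a gain costs 2 and a loss costs 1, so two independent gains of a red tail cost 4 against 2 for a single gain. The 2 extra steps correspond to the two similarities left unhomologized (tail presence and red colour), in line with the behaviour described in the abstract.
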
