Division of Paleontology, American Museum of Natural History, New York, NY, USA
Division of Invertebrate Zoology, American Museum of Natural History, New York, NY, USA.
Proc Biol Sci. 2018 Nov 28;285(1892):20181784. doi: 10.1098/rspb.2018.1784.
The use of discrete character data for disparity analyses has become more popular, partially due to the recognition that character data describe variation at large taxonomic scales, as well as the increasing availability of both character matrices co-opted from phylogenetic analysis and software tools. As taxonomic scope increases, the need to describe variation leads to some characters that may describe traits not found across all the taxa. In such situations, it is common practice to treat inapplicable characters as missing data when calculating dissimilarity matrices for disparity studies. For commonly used dissimilarity metrics like Wills's GED and Gower's coefficient, this can lead to the reranking of pairwise dissimilarities, resulting in taxa that share more primary character states being assigned larger dissimilarity values than taxa that share fewer. We introduce a family of metrics that proportionally weight primary characters according to the secondary characters that describe them, effectively eliminating this problem, and compare their performance to common dissimilarity metrics and previously proposed weighting schemes. When applied to empirical datasets, we confirm that choice of dissimilarity metric frequently affects the rank order of pairwise distances, differentially influencing downstream macroevolutionary inferences.
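The reranking described above can be illustrated with a toy example. The sketch below is not the authors' implementation: the three taxa, the character layout (one primary "tail" character, three secondary characters that are inapplicable when the tail is absent, and two unrelated characters), the function names, and the `alpha` parameter are all assumptions made for illustration. It computes a Gower-style dissimilarity in which inapplicable states are treated as missing, then a simple hierarchical alternative in which the primary character and its secondaries together count as one unit.

```python
from fractions import Fraction

NA = None  # inapplicable state, treated as missing data

# Hypothetical character layout (an assumption for this sketch):
# index 0      primary character (tail: 0 = absent, 1 = present)
# index 1..3   secondary characters describing the tail
# index 4..5   unrelated characters
PRIMARY = 0
SECONDARY = [1, 2, 3]
OTHER = [4, 5]

taxa = {
    "A": [1, 0, 0, 0, 0, 0],
    "B": [1, 1, 1, 1, 0, 0],     # has a tail, but a very different one
    "C": [0, NA, NA, NA, 0, 0],  # no tail, so tail characters are inapplicable
}

def gower(x, y):
    """Mismatches divided by the number of characters coded in both taxa."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not NA and b is not NA]
    return Fraction(sum(a != b for a, b in pairs), len(pairs))

def hierarchical(x, y, alpha=Fraction(1, 2)):
    """Illustrative weighting: primary plus secondaries count as one character.

    If the primary states differ, the unit contributes 1. If they match,
    it contributes alpha times the mean mismatch among comparable secondaries.
    """
    if x[PRIMARY] != y[PRIMARY]:
        unit = Fraction(1)
    else:
        pairs = [(x[i], y[i]) for i in SECONDARY
                 if x[i] is not NA and y[i] is not NA]
        mismatch = Fraction(sum(a != b for a, b in pairs), len(pairs)) if pairs else Fraction(0)
        unit = alpha * mismatch
    other = sum(x[i] != y[i] for i in OTHER)
    return (unit + other) / (1 + len(OTHER))

for name, metric in [("gower", gower), ("hierarchical", hierarchical)]:
    d_ab = metric(taxa["A"], taxa["B"])
    d_ac = metric(taxa["A"], taxa["C"])
    print(f"{name}: d(A,B) = {d_ab}, d(A,C) = {d_ac}")
```

In this toy matrix, A and B share the primary state while A and C do not, yet the Gower-style value ranks A and B as the more dissimilar pair (1/2 versus 1/3). Under the illustrative hierarchical weighting the order reverses (1/6 versus 1/3), which is the kind of rank change the abstract attributes to the choice of metric.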