POLYMORPHIC TAXA, MISSING VALUES AND CLADISTIC ANALYSIS.

Author information

Nixon Kevin C, Davis Jerrold I

Affiliation

L. H. Bailey Hortorium, Cornell University, Ithaca, New York 14853, U.S.A.

Publication information

Cladistics. 1991 Sep;7(3):233-241. doi: 10.1111/j.1096-0031.1991.tb00036.x.

Abstract

Missing values have been used in cladistic analyses when data are unavailable, inapplicable or sometimes when character states are variable within terminal taxa. The practice of scoring taxa as having "missing values" for polymorphic characters introduces errors into the calculation of cladogram lengths and consistency indices because some character change is hidden within terminals. Because these hidden character steps are not counted, the set of most parsimonious cladograms may differ from those that would be found if polymorphic taxa had been broken into monomorphic subunits. In some cases, the trees found when polymorphisms are scored as missing values may not include any of the most parsimonious trees found when the data are scored properly. Additionally, in some cases, polymorphic taxa may be found to be polyphyletic when broken into monomorphic subunits; this is undetected when polymorphisms are treated as missing. Because of these problems, terminal units in cladistic analysis should be based on unique, fixed combinations of characters. Polymorphic taxa should be subdivided into subunits that are monomorphic for each character used in the analysis. Disregarding errors in topology, the additional hidden steps in a cladogram in which polymorphisms are scored as missing can be calculated by a simple formula, based on the observation that if it is assumed that polymorphic terminals include all combinations of character states, $2^p - 1$ additional steps are required for each taxon in which $p$ polymorphic binary characters are scored as missing values. Thus, when several polymorphisms are scored as missing in the same taxon, very large errors can be introduced into the calculation of tree length.
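The hidden-step count described in the abstract can be illustrated with a short Python sketch. It assumes, as the abstract does, that a polymorphic terminal contains every combination of its variable states, so a taxon with p polymorphic binary characters expands into 2^p monomorphic subunits and hides 2^p - 1 steps. The function names (`monomorphic_subunits`, `hidden_steps`, `total_hidden_steps`) and the set-based scoring format are hypothetical choices for illustration, not part of the original paper or of any cladistics package.

```python
from itertools import product


def monomorphic_subunits(scores):
    """Expand one terminal into all fixed combinations of its character states.

    `scores` is a list with one set of observed states per character
    (hypothetical format), e.g. [{0, 1}, {0}, {0, 1}].
    """
    return [tuple(combo) for combo in product(*(sorted(s) for s in scores))]


def hidden_steps(p):
    """Steps hidden in one taxon with p polymorphic binary characters
    scored as missing, under the all-combinations assumption: 2**p - 1."""
    if p < 0:
        raise ValueError("p must be non-negative")
    return 2 ** p - 1


def total_hidden_steps(polymorphic_counts):
    """Sum the hidden steps over several taxa, given p for each taxon."""
    return sum(hidden_steps(p) for p in polymorphic_counts)


if __name__ == "__main__":
    # A terminal polymorphic for characters 1 and 3, fixed for character 2.
    taxon = [{0, 1}, {0}, {0, 1}]
    print(monomorphic_subunits(taxon))  # four subunits, one per combination
    print(hidden_steps(2))              # 3
    # Three taxa with 1, 2 and 3 polymorphic characters: 1 + 3 + 7
    print(total_hidden_steps([1, 2, 3]))  # 11
```

Because the count grows exponentially in p, a single taxon with five polymorphic binary characters scored as missing already hides 31 steps under this assumption, which is the sense in which the abstract warns of very large errors in tree length.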
