Département de Sciences biologiques, Université de Montréal, Montréal, Québec, Canada.
Mol Phylogenet Evol. 2011 Feb;58(2):149-56. doi: 10.1016/j.ympev.2010.11.017. Epub 2010 Dec 4.
Supermatrices are often characterized by a large amount of missing data. One possible approach to minimize such missing data is to create composite taxa. These taxa are formed by sampling sequences from different species in order to obtain a composite sequence that includes a maximum number of genes. Although this approach is increasingly used, its accuracy has rarely been tested, and some authors prefer to analyze incomplete supermatrices by coding unavailable sequences as missing. To further validate the composite taxon approach, it was applied to complete mitochondrial matrices of 102 mammal species representing 93 families, with varying amounts of missing data. On average, missing-data and composite matrices showed similar congruence to model trees obtained from the complete sequence matrix. As expected, the level of congruence to model trees decreased as missing data increased under both approaches. We conclude that the composite taxon approach is worth considering in a phylogenomic context, since it performs well and reduces computing time compared with missing-data matrices.
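The composite taxon idea described in the abstract (sampling sequences from different species so that one composite sequence covers as many genes as possible) can be illustrated with a minimal sketch. The abstract does not specify how sequences were chosen, so everything below is an assumption made for illustration: the function names `build_composite` and `concatenate`, the input layout (species name mapped to a dictionary of gene name to aligned sequence), the use of `?` as the missing-data character, and the selection rule of preferring the candidate with the fewest missing characters.

```python
from typing import Dict, List

MISSING = "?"  # assumed missing-data symbol; not taken from the paper


def build_composite(
    species_seqs: Dict[str, Dict[str, str]],
    genes: List[str],
    gene_lengths: Dict[str, int],
) -> Dict[str, str]:
    """Build one composite taxon from several species of the same group.

    For each gene, take the sequence from whichever species has it with the
    fewest missing characters (an assumed rule). Genes that no species
    provides are coded entirely as missing.
    """
    composite: Dict[str, str] = {}
    for gene in genes:
        # Iterate species in sorted order so ties are resolved deterministically.
        candidates = [
            seqs[gene]
            for _, seqs in sorted(species_seqs.items())
            if gene in seqs
        ]
        if candidates:
            composite[gene] = min(candidates, key=lambda s: s.count(MISSING))
        else:
            composite[gene] = MISSING * gene_lengths[gene]
    return composite


def concatenate(composite: Dict[str, str], genes: List[str]) -> str:
    """Join the per-gene sequences into one supermatrix row."""
    return "".join(composite[g] for g in genes)


# Hypothetical toy data: two species from one family, three genes.
family = {
    "sp_A": {"cox1": "ACGT"},
    "sp_B": {"cox1": "AC??", "cytb": "GGCC"},
}
genes = ["cox1", "cytb", "nd1"]
lengths = {"cox1": 4, "cytb": 4, "nd1": 3}

row = concatenate(build_composite(family, genes, lengths), genes)
print(row)
```

In this toy case the composite row takes `cox1` from `sp_A` and `cytb` from `sp_B`, while `nd1` remains missing because neither species provides it. Coding unavailable sequences as missing, the alternative mentioned in the abstract, would instead keep `sp_A` and `sp_B` as separate rows, each with its own gaps.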