Department of Zoology, The Natural History Museum, Cromwell Road, London SW7 5BD, UK.
Syst Biol. 2009 Aug;58(4):425-38. doi: 10.1093/sysbio/syp043. Epub 2009 Aug 18.
In molecular phylogenetic studies, a major aspect of experimental design concerns the choice of markers and taxa. Although previous studies have investigated the phylogenetic performance of different genes and the effectiveness of increasing taxon sampling, their conclusions are partly contradictory. This is probably because those conclusions are highly context specific and depend on the group of organisms used in each study. Goldman introduced a method for experimental design in phylogenetics based on the expected information to be gained, but it has rarely been used in practice. Here we use this method to explore the phylogenetic utility of mitochondrial (mt) genes, mt genomes, and nuclear rag1 for studies of the systematics of caecilian amphibians. We also use it to explore the effect of taxon addition on the stabilization of a controversial branch of the tree. We calculate and compare four kinds of estimates: overall phylogenetic information per gene, specific information per branch of the tree, information for combined (mitogenomic) data sets, and information as a hypothetical new taxon is added to different parts of the caecilian tree. In general, the most informative data sets are those for mt transfer and ribosomal RNA genes. Our results also show the positions in the caecilian tree at which the addition of taxa has the greatest potential to increase phylogenetic information about the controversial relationships of Scolecomorphus, Boulengerula, and all other teresomatan caecilians. As intuitively expected, these positions are mostly (but not all) adjacent to the controversial branch. Generating whole mitogenomic and rag1 data for additional taxa joining the Scolecomorphus branch may be a more efficient strategy than sequencing a similar number of additional nucleotides spread across the current caecilian taxon sampling.
The methodology employed in this study allows an a priori evaluation and testable predictions of the appropriateness of particular experimental designs to solve specific questions at different levels of the caecilian phylogeny.
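Goldman's approach quantifies the expected (Fisher) information that a data set carries about model parameters such as branch lengths. The simplest case shows the idea: two sequences separated by a single branch of length t (expected substitutions per site) under the Jukes-Cantor (JC69) model. The following is a minimal sketch under those assumptions. It is not the procedure used in the study, which works on multi-taxon trees, and the function names are hypothetical. Under JC69 each site reduces to a "same vs. different" Bernoulli outcome with probability p(t), so the per-site information is (dp/dt)^2 / (p(1 - p)).

```python
import math


def jc69_p_diff(t: float) -> float:
    """Probability that two sequences differ at a site under JC69 (hypothetical helper)."""
    return 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))


def jc69_fisher_info(t: float, n_sites: int = 1) -> float:
    """Expected Fisher information about branch length t from n_sites independent sites.

    Assumes a two-taxon JC69 model; illustration only, not the study's implementation.
    """
    if t <= 0:
        raise ValueError("t must be positive")
    p = jc69_p_diff(t)
    dp_dt = math.exp(-4.0 * t / 3.0)  # derivative of p(t) with respect to t
    # Each site is a Bernoulli trial (same/different): I(t) = (dp/dt)^2 / (p (1 - p)).
    # Information from independent sites adds, hence the factor n_sites.
    return n_sites * dp_dt ** 2 / (p * (1.0 - p))


if __name__ == "__main__":
    t_half = 0.75 * math.log(2)  # branch length at which exp(-4t/3) = 0.5
    print(round(jc69_fisher_info(t_half), 4))  # prints 1.0667
    # Information per site shrinks as the branch lengthens (saturation).
    for t in (0.1, 0.5, 1.0, 2.0):
        print(t, jc69_fisher_info(t, n_sites=1000))
```

In this toy model, information is additive across independent sites and decreases as t grows. Its reciprocal gives the Cramér-Rao lower bound on the variance of an unbiased estimate of t. Comparing such quantities across markers or sampling designs before collecting data is the kind of a priori evaluation the abstract describes.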