Bolus Herbarium, Department of Biological Sciences, University of Cape Town, Cape Town, Western Cape, 7700, South Africa; and Department of Palaeobiology, Swedish Museum of Natural History, P.O. Box 50007, 104 05 Stockholm, Sweden.
Syst Biol. 2014 Jan 1;63(1):1-16. doi: 10.1093/sysbio/syt052. Epub 2013 Aug 11.
Nuclear DNA is widely used to estimate phylogenetic and phylogeographic relationships. Nuclear gene variants may be present in an individual's genome, and these result in Intra-Individual Site Polymorphisms (2ISPs; pronounced "twisp") in direct-PCR sequences or in individual-consensus sequences built from a sample of clones or from next-generation sequencing (NGS) fragment reads. 2ISPs can occur fairly often, especially, but not exclusively, within high-copy-number regions such as the widely used internal transcribed spacers of the nuclear ribosomal cistron. Dealing with 2ISPs has been problematic because the optimality criteria used in phylogeny reconstruction generally do not account for this variation. Here we test whether treating 2ISPs as additional (termed "informative") characters, rather than as ambiguities, improves support under three criteria commonly used for phylogenetic inference: Minimum Evolution (via Neighbour Joining), Maximum Parsimony, and Maximum Likelihood. We demonstrate significant improvements from the 2ISP-informative treatment with simulated, real-world, and case-study data sets. We envisage that this 2ISP-informative approach will greatly aid phylogenetic inference from any nuclear DNA region that contains polymorphisms within individuals (including consensus sequences generated from NGS), especially at the intrageneric or intraspecific level.
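To illustrate the difference between the two treatments, the sketch below computes a simple pairwise p-distance (the kind of distance that could feed a Neighbour Joining analysis) in two ways. It assumes that 2ISPs are recorded with two-base IUPAC ambiguity codes (e.g. Y = C/T) and that sequences contain no gaps or N. The function name `p_distance` and the "set overlap" rule used for the ambiguous treatment are hypothetical choices for illustration only; this is not necessarily the coding scheme used in the study.

```python
# Hypothetical sketch: "ambiguous" vs "informative" handling of 2ISPs
# in a simple p-distance. Assumes only A, C, G, T and two-base IUPAC
# ambiguity codes (no gaps, no N); other symbols raise KeyError.

IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"},
    "W": {"A", "T"}, "K": {"G", "T"}, "M": {"A", "C"},
}


def p_distance(seq1: str, seq2: str, informative: bool) -> float:
    """Proportion of differing sites between two aligned sequences.

    informative=False (ambiguous treatment): two sites differ only if their
    base sets share no nucleotide, so Y vs C counts as a match.
    informative=True (informative treatment): each IUPAC symbol is its own
    character state, so Y vs C counts as a difference and a shared Y matches.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    diffs = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if informative:
            diffs += a != b
        else:
            diffs += IUPAC[a].isdisjoint(IUPAC[b])
    return diffs / len(seq1)


# Individual 1 and individual 2 share a C/T polymorphism (Y);
# individual 3 carries only C at that site.
ind1, ind2, ind3 = "ACGTY", "ACGTY", "ACGTC"
for mode in (False, True):
    print(mode, p_distance(ind1, ind2, mode), p_distance(ind1, ind3, mode))
```

Under the ambiguous treatment, individual 1 is equally close (distance 0.0) to individuals 2 and 3, so the shared polymorphism carries no signal. Under the informative treatment it is 0.0 from individual 2 but 0.2 from individual 3, so the shared 2ISP groups the two polymorphic individuals together.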