Department of Biology, McMaster University, Hamilton, Ontario, Canada L8S 4K1.
Mol Phylogenet Evol. 2012 Nov;65(2):765-73. doi: 10.1016/j.ympev.2012.07.033. Epub 2012 Aug 11.
Barcoding is an initiative to define a standard DNA fragment that can be used to assign sequences of unknown origin to known species whose sequences are recorded in databases. This task is difficult when species are closely related and when individuals of a species may originate from more than one population. Using a previously introduced Bayesian, tree-less assignment algorithm based on segregating sites, we use simulations to examine how the algorithm performs when closely related species have hidden population subdivision. Not surprisingly, adding samples to the database from a greater proportion of a species' range consistently yields more accurate assignments. Without such samples, query sequences that originate outside the sampled range are easily misinterpreted as coming from other species. However, we show that adding even a single sample from a different subpopulation is sufficient to greatly increase the probability that unknown queries are placed in the correct species group. This study highlights the importance of broad sampling when creating a reference database, even when five reference samples per species are available.
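The abstract does not describe the Bayesian model itself, so the following is only a toy illustration of the sampling effect it reports, not the authors' algorithm. It swaps in a deliberately simplified, hypothetical rule: assign a query to the species whose reference set gains the fewest new segregating sites when the query is added. The species labels, the subpopulations "A1" and "A2", and all sequences are made up for illustration and are not data from the study.

```python
from typing import Dict, List, Tuple


def segregating_sites(seqs: List[str]) -> int:
    """Count aligned columns that carry more than one distinct base."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def added_sites(refs: List[str], query: str) -> int:
    """Number of new segregating sites created by adding the query to refs."""
    return segregating_sites(refs + [query]) - segregating_sites(refs)


def assign(database: Dict[str, List[str]], query: str) -> Tuple[str, Dict[str, int]]:
    """Hypothetical stand-in rule (not Bayesian): choose the species whose
    reference set gains the fewest segregating sites from the query."""
    scores = {species: added_sites(refs, query) for species, refs in database.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Toy, made-up aligned sequences (not data from the study).
# Species A has two subpopulations, A1 and A2; species B is closely related.
A1_REFS = ["AAAAAAAAAA", "AAAAAAAAAC"]
A2_REF = "GGGAAAACTA"
B_REFS = ["GGGGAAAAAA", "GGGGAAAAAT"]
QUERY = "GGGAAAACAA"  # drawn from the unsampled A2 subpopulation of species A

if __name__ == "__main__":
    narrow = {"A": A1_REFS, "B": B_REFS}             # only A1 sampled for species A
    broad = {"A": A1_REFS + [A2_REF], "B": B_REFS}   # one A2 sample added
    print(assign(narrow, QUERY))  # ('B', {'A': 4, 'B': 2}) -> misassigned to B
    print(assign(broad, QUERY))   # ('A', {'A': 0, 'B': 2}) -> correctly assigned to A
```

With only subpopulation A1 in the database, the query shares more derived states with species B and is misassigned; adding a single A2 reference reverses the outcome, which mirrors qualitatively the effect described in the abstract. A count of added segregating sites is used here only because it is easy to follow; it is not a substitute for a posterior probability of species membership.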