National Botanic Garden of Wales, Llanarthne, United Kingdom.
PLoS One. 2012;7(6):e37945. doi: 10.1371/journal.pone.0037945. Epub 2012 Jun 6.
We present the first national DNA barcode resource that covers the native flowering plants and conifers for the nation of Wales (1143 species). Using the plant DNA barcode markers rbcL and matK, we have assembled 97.7% coverage for rbcL, 90.2% for matK, and a dual-locus barcode for 89.7% of the native Welsh flora. We have sampled multiple individuals for each species, resulting in 3304 rbcL and 2419 matK sequences. The majority of our samples (85%) are from DNA extracted from herbarium specimens. Recoverability of DNA barcodes is lower using herbarium specimens, compared to freshly collected material, mostly due to lower amplification success, but this is balanced by the increased efficiency of sampling species that have already been collected, identified, and verified by taxonomic experts. The effectiveness of the DNA barcodes for identification (level of discrimination) is assessed using four approaches: the presence of a barcode gap (using pairwise and multiple alignments), formation of monophyletic groups using Neighbour-Joining trees, and sequence similarity in BLASTn searches. These approaches yield similar results, providing relative discrimination levels of 69.4 to 74.9% of all species and 98.6 to 99.8% of genera using both markers. Species discrimination can be further improved using spatially explicit sampling. Mean species discrimination using barcode gap analysis (with a multiple alignment) is 81.6% within 10×10 km squares and 93.3% for 2×2 km squares. Our database of DNA barcodes for Welsh native flowering plants and conifers represents the most complete coverage of any national flora, and offers a valuable platform for a wide range of applications that require accurate species identification.
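The barcode gap criterion mentioned above can be illustrated with a minimal sketch. It assumes the common definition, in which a species counts as discriminated when its smallest distance to any other species exceeds its largest within-species distance, and it uses uncorrected p-distances on pre-aligned sequences. The function names (`p_distance`, `barcode_gap`) and the toy sequences are hypothetical and are not taken from the study; the exact distance measure and alignment settings used by the authors are not given in this abstract.

```python
from itertools import combinations


def p_distance(a, b):
    """Uncorrected p-distance between two aligned sequences.

    Only positions where both sequences carry an unambiguous base
    (A, C, G, T) are compared; gaps and ambiguity codes are skipped.
    """
    valid = "ACGT"
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in valid and y in valid:
            compared += 1
            if x != y:
                diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return diffs / compared


def barcode_gap(records):
    """Return {species: True/False} for a list of (species, aligned_seq).

    A species is flagged True when its minimum interspecific distance is
    strictly greater than its maximum intraspecific distance. A species
    with a single sequence has an intraspecific maximum of 0 (assumption).
    """
    max_intra = {}
    min_inter = {}
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        d = p_distance(s1, s2)
        if sp1 == sp2:
            max_intra[sp1] = max(max_intra.get(sp1, 0.0), d)
        else:
            for sp in (sp1, sp2):
                min_inter[sp] = min(min_inter.get(sp, float("inf")), d)
    species = {sp for sp, _ in records}
    return {
        sp: min_inter.get(sp, float("inf")) > max_intra.get(sp, 0.0)
        for sp in species
    }


# Toy data (hypothetical): species C shares a haplotype with species A,
# so neither A nor C shows a barcode gap, while B does.
toy_records = [
    ("A", "ACGTACGT"),
    ("A", "ACGTACGA"),
    ("B", "TCGAACGG"),
    ("B", "TCGAACGG"),
    ("C", "ACGTACGT"),
]
result = barcode_gap(toy_records)
discrimination_rate = sum(result.values()) / len(result)
```

Restricting `records` to the species recorded in a single grid square before calling `barcode_gap` is one way to mimic the spatially explicit analysis: with fewer co-occurring close relatives in the comparison set, more species can pass the criterion.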