Museo Nacional de Ciencias Naturales, C/José Gutiérrez Abascal, 2, 28006, Madrid, Spain.
UMIB Research Unit of Biodiversity (UO, CSIC, PA), Oviedo University - Campus Mieres, C/Gonzalo Gutiérrez Quirós s/n, 33600, Mieres, Spain.
Exp Appl Acarol. 2022 Mar;86(3):371-384. doi: 10.1007/s10493-022-00703-0. Epub 2022 Feb 25.
Public molecular databases are fundamental tools for modern taxonomic studies, and their usefulness relies on the soundness of the data they contain. Here, using the DNA sequences of Hydrachnidia (Acari, Parasitengona) as a case study, we examine potential errors that can arise along the data pipeline, from sampling, specimen identification and molecular processing (digestion, amplification and sequencing) to the submission of sequences to these databases. Our results indicate that molecular information is available for only about 3% of the Hydrachnidia species known to date; yet, within this small percentage, errors are present in almost 5% of the species analyzed (0.5% of the sequences and almost 11% of the genera). This study underscores the scarcity of genetic data available for Hydrachnidia, but also shows that the proportion of errors in DNA sequences is relatively small. Even so, it highlights the danger of using DNA sequences from public databases, particularly for species identification, and reinforces the need for stronger quality control measures and/or protocols to avoid an intensification of errors in the (post-)genomics era. Finally, our study emphasizes that potential errors may also reveal cryptic diversity within a species.