Department of Genetics and Biosystematics, Faculty of Biology, University of Gdansk, Gdansk, Poland.
Research Centre of Quarantine, Invasive and Genetically Modified Organisms, Institute of Plant Protection - National Research Institute, Poznań, Poland.
PLoS One. 2018 Jun 22;13(6):e0199609. doi: 10.1371/journal.pone.0199609. eCollection 2018.
The cytochrome c oxidase subunit I (cox1) gene is the main mitochondrial molecular marker in phylogenetic research and a crucial barcode sequence. Folmer's "universal" primers, designed to amplify this gene in metazoan invertebrates, have allowed quick and easy barcoding and phylogenetic analysis. On the other hand, the growing number of barcoding studies has led to more frequent publication of incorrect sequences, owing to amplification of non-target taxa and insufficient analysis of the obtained sequences. Consequently, some sequences deposited in genetic databases are described as obtained from invertebrates when they are in fact bacterial. In our study, in which we used Folmer's primers to amplify COI sequences of the crustacean fairy shrimp Branchipus schaefferi (Fischer 1834), we also obtained COI sequences of microbial contaminants from Aeromonas sp. When we searched the GenBank database for sequences closely matching these contaminants, we found entries described as representatives of Gastrotricha and Mollusca. When these entries were compared with other sequences bearing the same names in the database, the genetic distance between the incorrect and correct sequences attributed to the same species was ca. 65%. Although the responsibility for correct molecular identification of species rests with researchers, errors in already published sequence data have not been re-evaluated so far. On the basis of a standard sampling technique, we estimated with 95% probability that the chance of finding incorrectly described metazoan sequences in GenBank depends on the systematic group and varies from less than 1% (Mollusca and Arthropoda) up to 6.9% (Gastrotricha). Consequently, the increasing popularity of DNA barcoding and metabarcoding analysis may lead to overestimation of species diversity.
Finally, the study also discusses the sources of the problems with amplification of non-target sequences.
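The comparison between a suspect database entry and other sequences carrying the same species name can be illustrated with a simple distance calculation. The sketch below computes an uncorrected p-distance (the proportion of differing sites) between two pre-aligned sequences. The abstract does not state which distance measure the authors used, so the choice of p-distance, the function name `p_distance`, and the toy sequences are assumptions made purely for illustration.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance between two aligned nucleotide sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    Assumes the sequences were already aligned to the same length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    unambiguous = set("ACGT")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Ignore gaps ('-') and ambiguity codes such as 'N'.
        if a not in unambiguous or b not in unambiguous:
            continue
        compared += 1
        if a != b:
            differing += 1

    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Hypothetical toy fragments, not real COI data.
reference = "ATGGCACTTA"
suspect = "ATGGCTCTTA"
print(f"p-distance: {p_distance(reference, suspect):.2f}")
```

A within-species distance far above what is typical for the marker, such as the ca. 65% reported here, is a signal that one of the entries may come from a non-target organism and should be checked against bacterial sequences.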