An update on DNA barcoding: low species coverage and numerous unidentified sequences.
Author information
Kwong Shiyang, Srivathsan Amrita, Meier Rudolf
Affiliations
Department of Biological Sciences.
University Scholars Programme, National University of Singapore, 14 Science Drive 4, Singapore 117543, Singapore.
Publication information
Cladistics. 2012 Dec;28(6):639-644. doi: 10.1111/j.1096-0031.2012.00408.x. Epub 2012 Jul 3.
DNA barcoding was proposed in 2003, the Consortium for the Barcode of Life was established in 2004, and the movement has since attracted more than $80 million in funding. Here we investigate how many species of multicellular animals have been barcoded. We compare the numbers in a public database (GenBank as of January 2012) with those in the Barcode of Life Database (BOLD) and find that GenBank contains COI (cytochrome c oxidase subunit 1) sequences for ca. 60 000 species while BOLD reports barcodes for ca. 150 000 species. The discrepancy is likely due to a large amount of unpublished data in BOLD. Overall, the species coverage remains sparse, growth rates are low, and the barcode accumulation curve for Metazoa is linear with only 4788 species having been added in 2011. In addition, the vast majority of species in the public database (73%) were barcoded by projects that are unlikely to be related to the DNA barcoding movement. Particularly surprising was the large number of DNA barcodes in GenBank that were not identified to species (Jan 2012: 74%), with insect barcodes often being identified only to order. Of these, several hundred thousand have since been suppressed by NCBI because they did not satisfy the iBOL/GenBank early release agreement. Species coverage is considerably better for target taxa of DNA barcoding campaigns (e.g. birds, fishes, Lepidoptera), although it also falls short of published campaign targets. © The Willi Hennig Society 2012.
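To put the quoted figures in proportion, the following is a minimal back-of-the-envelope sketch in Python. It uses only the approximate numbers given in the abstract. The function names are hypothetical, and the projection rests on two assumptions that the abstract does not make: that the 2011 figure of 4788 species describes the public database's yearly growth, and that this rate stays constant while the BOLD figure stays fixed.

```python
# Approximate ("ca.") figures quoted in the abstract.
GENBANK_SPECIES = 60_000   # species with COI sequences in GenBank (Jan 2012)
BOLD_SPECIES = 150_000     # species with barcodes reported by BOLD
ADDED_IN_2011 = 4_788      # metazoan species added in 2011


def coverage_ratio(public_species: int, reported_species: int) -> float:
    """Fraction of the reported species count that is publicly available."""
    return public_species / reported_species


def years_to_reach(current: int, target: int, per_year: int) -> float:
    """Years needed to grow from current to target at a constant yearly rate.

    Assumes strictly linear growth, which is an illustrative simplification.
    """
    if per_year <= 0:
        raise ValueError("per_year must be positive")
    return max(0, target - current) / per_year


ratio = coverage_ratio(GENBANK_SPECIES, BOLD_SPECIES)
years = years_to_reach(GENBANK_SPECIES, BOLD_SPECIES, ADDED_IN_2011)

print(f"GenBank holds about {ratio:.0%} of the species count reported by BOLD")
print(f"Closing the gap at the 2011 rate would take about {years:.1f} years")
```

Under these assumptions, the public database covers roughly 40% of the species count reported by BOLD, and closing the gap at the 2011 rate would take close to two decades. This is an illustration of scale only, not a result reported by the authors.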