

A comparison of algorithms for the identification of specimens using DNA barcodes: examples from gymnosperms.

Author information

Little Damon P, Stevenson Dennis Wm

Affiliation

Lewis B. and Dorothy Cullman Program for Molecular Systematic Studies, The New York Botanical Garden, Bronx, New York 10458-5126, USA.

Publication information

Cladistics. 2007 Feb;23(1):1-21. doi: 10.1111/j.1096-0031.2006.00126.x.

Abstract

To use DNA sequences for specimen identification (e.g., barcoding, fingerprinting), an algorithm to compare query sequences with a reference database is needed. Precision and accuracy of query sequence identification were estimated for hierarchical clustering (parsimony and neighbor joining), similarity methods (BLAST, BLAT and megaBLAST), combined clustering/similarity methods (BLAST/parsimony and BLAST/neighbor joining), diagnostic methods (DNA-BAR and DOME ID), and a new method (ATIM). We offer two novel alignment-free algorithmic solutions (DOME ID and ATIM) to identify query sequences for the purposes of DNA barcoding. Publicly available gymnosperm nrITS 2 and plastid matK sequences were used as test data sets. On the test data sets, almost all of the methods were able to accurately identify sequences to genus; however, no method was able to accurately identify query sequences to species at a frequency that would be considered useful for routine specimen identification (42-71% unambiguously correct). Clustering methods performed the worst (perhaps due to alignment issues). Similarity methods, ATIM, DNA-BAR, and DOME ID all performed at approximately the same level. Given the relative precision of the algorithms (median = 67% unambiguous), the low accuracy of species-level identification observed could be ascribed to the lack of correspondence between patterns of allelic similarity and species delimitations. Application of DNA barcoding to sequences of CITES-listed cycads (Cycadopsida) provides an example of the potential application of DNA barcoding to enforcement of conservation laws.
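The distinction the abstract draws between unambiguous and correct identifications can be illustrated with a minimal sketch. This is not ATIM, DOME ID, or any other method evaluated in the paper; it is a generic alignment-free comparison based on shared k-mers (Jaccard index), chosen only because it is short. The function names, the three outcome labels, the value of k, and the toy sequences and species names are all assumptions made for illustration.

```python
from collections import defaultdict


def kmers(seq, k=4):
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def identify(query, references, k=4):
    """Return (best_score, sorted list of species tied at that score).

    references: list of (species_name, sequence) pairs (hypothetical data).
    The score is the Jaccard index of the two k-mer sets.
    """
    q = kmers(query, k)
    best = defaultdict(float)
    for species, seq in references:
        r = kmers(seq, k)
        union = q | r
        score = len(q & r) / len(union) if union else 0.0
        best[species] = max(best[species], score)
    if not best:
        return 0.0, []
    top = max(best.values())
    return top, sorted(s for s, v in best.items() if v == top)


def classify(query, true_species, references, k=4):
    """Label one identification; the labels are illustrative, not the paper's."""
    _, hits = identify(query, references, k)
    if hits == [true_species]:
        return "unambiguously correct"
    if true_species in hits:
        return "ambiguous"
    return "incorrect"


# Toy reference database: sp_A and sp_C share an identical sequence,
# so a query matching that sequence cannot be resolved to one species.
refs = [
    ("sp_A", "ACGTACGTAC"),
    ("sp_B", "TTTTGGGGCC"),
    ("sp_C", "ACGTACGTAC"),
]
print(classify("ACGTACGTAC", "sp_A", refs))
print(classify("ACGTACGTAC", "sp_A", refs[:2]))
```

In this toy setup, two species that share an identical sequence cannot be told apart however good the comparison algorithm is. That is one simple way the pattern of sequence similarity can fail to correspond to species delimitations, as the abstract suggests.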

