Zhang A B, Sikes D S, Muster C, Li S Q
Institute of Zoology, Chinese Academy of Sciences, Beijing 100080, P. R. China.
Syst Biol. 2008 Apr;57(2):202-15. doi: 10.1080/10635150802032982.
DNA barcoding as a method for species identification is rapidly increasing in popularity. However, there are still relatively few rigorous methodological tests of DNA barcoding. Current distance-based methods are frequently criticized for treating the nearest neighbor as the closest relative via a raw similarity score, for lacking an objective set of criteria to delineate taxa, or for being incongruent with classical character-based taxonomy. Here, we propose an artificial intelligence-based approach, inferring species membership via DNA barcoding with back-propagation neural networks (named BP-based species identification), as a new addition to the spectrum of available methods. We demonstrate the value of this approach with simulated data sets representing different levels of sequence variation under coalescent simulations with various evolutionary models, as well as with two empirical data sets of COI sequences from East Asian ground beetles (Carabidae) and Costa Rican skipper butterflies. With a 630- to 690-bp fragment of the COI gene, we identified 97.50% of 80 unknown sequences of ground beetles, and 95.63%, 96.10%, and 100% of 275, 205, and 9 unknown sequences of the neotropical skipper butterfly, respectively, to their correct species. Our simulation studies indicate that the success rates of species identification depend on the divergence of sequences, the length of sequences, and the number of reference sequences. Particularly in cases involving incomplete lineage sorting, this new BP-based method appears to be superior to commonly used methods for DNA-based species identification.
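As a rough illustration of the general idea (classifying barcode sequences with a network trained by back-propagation), the following minimal sketch may help. It is not the authors' implementation: the abstract does not specify the sequence encoding, network architecture, or training settings, so the one-hot encoding, the single tanh hidden layer, the softmax output, the learning rate, and the toy 10-bp sequences and species names below are all assumptions made for illustration.

```python
import math
import random

BASES = "ACGT"

# Hypothetical toy reference library (not real COI data): (sequence, species).
TOY_REFERENCES = [
    ("ACGTACGTAC", "species_A"),
    ("ACGTACGTAT", "species_A"),
    ("TTGCATGCAA", "species_B"),
    ("TTGCATGCAG", "species_B"),
]


def one_hot(seq):
    """Encode a DNA string as a flat vector with four indicators per site."""
    vec = []
    for base in seq.upper():
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class BPClassifier:
    """One-hidden-layer network trained by plain back-propagation (assumed design)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        """Return hidden activations and class probabilities for one input."""
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        p = softmax([sum(w * hi for w, hi in zip(row, h)) + b
                     for row, b in zip(self.w2, self.b2)])
        return h, p

    def train_step(self, x, target, lr):
        """One stochastic gradient step; returns the cross-entropy loss before the update."""
        h, p = self.forward(x)
        # Output-layer error for softmax with cross-entropy loss.
        d_out = [pk - (1.0 if k == target else 0.0) for k, pk in enumerate(p)]
        # Propagate the error back to the hidden layer (tanh' = 1 - h^2).
        d_hid = [(1.0 - hj * hj) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                 for j, hj in enumerate(h)]
        for k, dk in enumerate(d_out):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * dk * hj
            self.b2[k] -= lr * dk
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj
        return -math.log(p[target])


def train(references, n_hidden=4, epochs=200, lr=0.1, seed=0):
    """Fit the network to labelled reference sequences; returns net, labels, loss history."""
    labels = sorted({species for _, species in references})
    data = [(one_hot(seq), labels.index(species)) for seq, species in references]
    net = BPClassifier(len(data[0][0]), n_hidden, len(labels), seed)
    history = []
    for _ in range(epochs):
        history.append(sum(net.train_step(x, t, lr) for x, t in data) / len(data))
    return net, labels, history


def identify(net, labels, seq):
    """Assign an unknown sequence to the species with the highest output probability."""
    _, p = net.forward(one_hot(seq))
    best = max(range(len(p)), key=p.__getitem__)
    return labels[best], p[best]


if __name__ == "__main__":
    net, labels, history = train(TOY_REFERENCES)
    print(identify(net, labels, "ACGTACGTAA"))
```

In this sketch an unknown sequence must be aligned to the same length as the references, and it is always assigned to one of the reference species. A practical pipeline would also need a rule for rejecting sequences from species absent from the reference library, which the sketch does not attempt.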