

A new efficient method for analyzing fungi species using correlations between nucleotides.

Author information

Department of Mathematical Sciences, Tsinghua University, Beijing 100084, People's Republic of China.

Publication information

BMC Evol Biol. 2018 Dec 27;18(1):200. doi: 10.1186/s12862-018-1330-y.

Abstract

BACKGROUND

In recent years, DNA barcoding has become an important tool for biologists to identify species and understand their natural biodiversity. The complexity of barcode data makes it difficult to analyze quickly and effectively, and manual classification cannot keep up with the rate at which new data become available.

RESULTS

In this study, we propose a new method for DNA barcode classification based on the distribution of nucleotides within the sequence. By adding the covariance of nucleotides to the original natural vector, we obtain an augmented 18-dimensional natural vector that makes good use of the available information in the DNA sequence. The accurate classification results we obtained demonstrate that this new 18-dimensional natural vector method, together with the random forest classification algorithm, can serve as a computationally efficient identification tool for DNA barcodes. We performed phylogenetic analysis on the genus Megacollybia to validate our method. We also studied how effectively our method determines the genetic distance within and between species in our barcoding dataset.
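To make the shape of such a feature vector concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes the standard 12-dimensional natural vector (for each of A, C, G, T: the count, the mean position, and the normalized second central moment of positions) and appends six pairwise terms. The abstract does not give the covariance formula, so the pairing rule used here (the i-th occurrence of one nucleotide matched with the i-th occurrence of the other, normalized by the number of pairs and the sequence length) is an illustrative assumption, and the function name `natural_vector_18` is hypothetical.

```python
from itertools import combinations

NUCLEOTIDES = "ACGT"


def natural_vector_18(seq):
    """Return an 18-number alignment-free feature vector for a DNA sequence.

    Layout: 4 counts, 4 mean positions, 4 normalized second central
    moments, then 6 pairwise covariance-like terms in the order
    AC, AG, AT, CG, CT, GT.
    """
    seq = seq.upper()
    n = len(seq)
    # 1-based positions at which each nucleotide occurs
    positions = {k: [i + 1 for i, c in enumerate(seq) if c == k]
                 for k in NUCLEOTIDES}

    counts, means, moments = {}, {}, {}
    for k in NUCLEOTIDES:
        pos = positions[k]
        n_k = len(pos)
        counts[k] = float(n_k)
        means[k] = sum(pos) / n_k if n_k else 0.0
        # Second central moment, normalized by n_k * n as in the
        # commonly used 12-dimensional natural vector
        moments[k] = (sum((p - means[k]) ** 2 for p in pos) / (n_k * n)
                      if n_k else 0.0)

    covs = []
    for k, l in combinations(NUCLEOTIDES, 2):
        # ASSUMPTION: pair the i-th occurrence of k with the i-th
        # occurrence of l; the paper's exact definition may differ.
        m = min(len(positions[k]), len(positions[l]))
        if m == 0:
            covs.append(0.0)
            continue
        total = sum((positions[k][i] - means[k]) * (positions[l][i] - means[l])
                    for i in range(m))
        covs.append(total / (m * n))

    return ([counts[k] for k in NUCLEOTIDES]
            + [means[k] for k in NUCLEOTIDES]
            + [moments[k] for k in NUCLEOTIDES]
            + covs)


vec = natural_vector_18("AACC")
print(len(vec))  # 18
```

Because every sequence maps to a fixed-length vector regardless of its length, vectors of this kind can be passed directly to a standard classifier such as a random forest, or compared by Euclidean distance, without aligning the sequences first.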

CONCLUSIONS

The classification performs well on the fungi barcode dataset with high and robust accuracy. The reasonable phylogenetic trees we obtained further validate our methods. This method is alignment-free and does not depend on any model assumption, and it will become a powerful tool for classification and evolutionary analysis.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3710/6307163/3af6b9e29998/12862_2018_1330_Fig1_HTML.jpg
