A k-mer-based barcode DNA classification methodology based on spectral representation and a neural gas network.

Author information

Fiannaca Antonino, La Rosa Massimo, Rizzo Riccardo, Urso Alfonso

Affiliations

Institute of High-Performance Computing and Networking, National Research Council of Italy, Viale delle Scienze, Ed. 11, 90128 Palermo, Italy.

Publication information

Artif Intell Med. 2015 Jul;64(3):173-84. doi: 10.1016/j.artmed.2015.06.002. Epub 2015 Jul 4.

DOI:10.1016/j.artmed.2015.06.002

PMID:26170017

Abstract

OBJECTIVES

In this paper, an alignment-free method for DNA barcode classification that is based on both a spectral representation and a neural gas network for unsupervised clustering is proposed.

METHODS

In the proposed methodology, distinctive words are identified from a spectral representation of DNA sequences. A taxonomic classification of the DNA sequence is then performed using the sequence signature, i.e., the smallest set of k-mers that can assign a DNA sequence to its proper taxonomic category. Experiments were then performed to compare our method with other supervised machine learning classification algorithms, such as support vector machine, random forest, RIPPER, naïve Bayes, Ridor, and classification tree. The comparison also covered short DNA sequence fragments of 200 and 300 base pairs (bp). The experimental tests were conducted over 10 real barcode datasets belonging to different animal species, which were provided by the on-line resource "Barcode of Life Database".
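As an illustration of what a spectral (k-mer frequency) representation of a DNA sequence can look like, the following is a minimal Python sketch. It is not the authors' implementation: the function name `kmer_spectrum`, the choice of k, the frequency normalization, and the handling of ambiguous bases are all assumptions made for this example, and the step of selecting distinctive words or a sequence signature is not shown.

```python
from collections import Counter
from itertools import product


def kmer_spectrum(sequence, k=3):
    """Return the relative frequency of every k-mer over the alphabet ACGT.

    Hypothetical helper for illustration only. Windows that contain
    characters other than A, C, G, T (e.g. N) are ignored.
    """
    sequence = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Enumerate all 4**k possible k-mers in lexicographic order.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


# Toy input, not a real barcode sequence.
spectrum = kmer_spectrum("ACGTACGTAC", k=2)
```

Each sequence is mapped to a fixed-length vector of 4^k values regardless of its length, which is what allows sequences to be compared without alignment.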

RESULTS

The experimental results showed that our k-mer-based approach is directly comparable, in terms of accuracy, recall and precision metrics, with the other classifiers when considering full-length sequences. In addition, we demonstrate the robustness of our method when a classification task is performed with a set of short DNA sequences that were randomly extracted from the original data. For example, the proposed method reaches an accuracy of 64.8% at the species level with 200-bp fragments. Under the same conditions, the best of the other classifiers (random forest) reaches an accuracy of 20.9%.
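The short-fragment setting can be pictured with a small Python sketch that draws one fixed-length fragment at a random position from a longer sequence. The abstract does not describe the exact sampling procedure, so the function `random_fragment`, the uniform choice of start position, the seed, and the placeholder sequence below are assumptions for illustration only.

```python
import random


def random_fragment(sequence, length, rng):
    """Return a contiguous fragment of `length` bases from a random position.

    Hypothetical helper: start positions are drawn uniformly. Sequences
    no longer than `length` are returned unchanged.
    """
    if len(sequence) <= length:
        return sequence
    start = rng.randrange(len(sequence) - length + 1)
    return sequence[start:start + length]


rng = random.Random(0)   # fixed seed so the sketch is reproducible
barcode = "ACGT" * 165   # made-up 660-bp placeholder, not real data
fragment = random_fragment(barcode, 200, rng)
```

A fragment produced this way would then be classified in the same manner as a full-length sequence.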

CONCLUSIONS

Our results indicate that we obtained a clear improvement over the other classifiers for the study of short DNA barcode sequence fragments.
