Back-propagation and counter-propagation neural networks for phylogenetic classification of ribosomal RNA sequences.

Authors

Wu C, Shivakumar S

Affiliation

Department of Epidemiology/Biomathematics, University of Texas Health Center at Tyler 75710.

Publication information

Nucleic Acids Res. 1994 Oct 11;22(20):4291-9. doi: 10.1093/nar/22.20.4291.

DOI:10.1093/nar/22.20.4291

PMID:7937158

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC331947/

Abstract

A neural network system has been developed for rapid and accurate classification of ribosomal RNA sequences according to phylogenetic relationship. The molecular sequences are encoded into neural input vectors using an n-gram hashing method. An SVD (singular value decomposition) method is used to compress the long, sparse n-gram input vectors. The neural networks are three-layered, feed-forward networks that employ supervised learning paradigms, including the back-propagation algorithm and a modified counter-propagation algorithm. A pedagogical pattern selection strategy is used to reduce the training time. After being trained with ribosomal RNA sequences from the RDP (Ribosomal Database Project) database, the system can classify query sequences into more than one hundred phylogenetic classes with 100% accuracy, in less than 0.3 CPU seconds per sequence on a workstation. Compared with other sequence similarity search methods, including Similarity Rank, BLAST and FASTA, the neural network method has a higher classification accuracy and runs about an order of magnitude faster. The software tool will be made available to the biology community, and the system may be extended into a gene identification system for classifying indiscriminately sequenced DNA fragments.
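To make the encoding step in the abstract concrete, here is a minimal sketch of turning a sequence into a fixed-length n-gram vector. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the n-gram size, the hash function, or the scaling, so the four-letter alphabet, the positional index used as the "hash", the frequency normalization, and the names `ngram_index` and `encode` are all hypothetical.

```python
from itertools import product

ALPHABET = "ACGU"  # assumption: only the four unambiguous RNA bases are counted


def ngram_index(n):
    """Map every possible n-gram over ALPHABET to a fixed vector position."""
    return {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=n))}


def encode(seq, n=2):
    """Return a length-4**n vector of n-gram frequencies for seq."""
    index = ngram_index(n)
    counts = [0] * len(index)
    for i in range(len(seq) - n + 1):
        pos = index.get(seq[i:i + n])
        if pos is not None:  # skip n-grams containing symbols such as N
            counts[pos] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(index)


# Bigrams of this sequence: AC, CG, GU, UA, AC, CG, GU (7 in total)
vector = encode("ACGUACGU", n=2)
```

The vector length grows as 4**n, so larger n-grams give long vectors in which most entries are zero. That is the situation in which a compression step such as the SVD mentioned in the abstract becomes useful before the vectors are passed to a network.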
