Protein sequence classification with improved extreme learning machine algorithms.

Author information

Cao Jiuwen, Xiong Lianglin

Affiliations

Institute of Information and Control, Hangzhou Dianzi University, Zhejiang 310018, China.

School of Mathematics and Computer Science, Yunnan University of Nationalities, Kunming 650500, China; School of Mathematics and Statistics, Yunnan University, Kunming 650091, China.

Publication information

Biomed Res Int. 2014;2014:103054. doi: 10.1155/2014/103054. Epub 2014 Mar 30.

DOI:10.1155/2014/103054

PMID:24795876

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3985160/

Abstract

Precisely classifying a protein sequence from a large biological protein sequence database plays an important role in developing competitive pharmacological products. Conventional methods compare the unseen sequence with all identified protein sequences and return the category index of the protein with the highest similarity score, which is usually time-consuming. Therefore, it is urgent and necessary to build an efficient protein sequence classification system. In this paper, we study the performance of protein sequence classification using single hidden layer feedforward neural networks (SLFNs). The recent efficient extreme learning machine (ELM) and its variants are utilized as the training algorithms. The optimal pruned ELM (OP-ELM) is employed for protein sequence classification for the first time in this paper. To further enhance the performance, an ensemble-based SLFN structure is constructed, in which multiple SLFNs with the same number of hidden nodes and the same activation function serve as the ensemble members. Each member is trained with the same algorithm. The final category index is derived by majority voting. Two approaches, namely, the basic ELM and the OP-ELM, are adopted for the ensemble-based SLFNs. The performance is analyzed and compared with several existing methods using datasets obtained from the Protein Information Resource center. The experimental results show the superiority of the proposed algorithms.
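The ensemble scheme described in the abstract can be illustrated with a minimal sketch. It assumes each protein sequence has already been converted to a fixed-length numeric feature vector (the abstract does not describe the feature extraction), and it uses synthetic clusters in place of real data. The names `BasicELM`, `ensemble_predict`, and `make_toy_data` are hypothetical, the sigmoid activation and the hidden-node count are arbitrary choices, and only the basic ELM is shown, not OP-ELM. It is not the authors' implementation.

```python
import numpy as np


class BasicELM:
    """Single hidden layer feedforward network trained with the basic ELM rule."""

    def __init__(self, n_hidden, rng):
        self.n_hidden = n_hidden
        self.rng = rng

    def _hidden(self, X):
        # Sigmoid activation of the randomly projected inputs.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y, n_classes):
        # Input weights and biases are drawn at random and never updated.
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        T = np.eye(n_classes)[y]  # one-hot targets
        # Output weights: least-squares solution via the Moore-Penrose pseudoinverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)


def ensemble_predict(models, X, n_classes):
    """Majority vote over the category indices predicted by each network."""
    n_samples = X.shape[0]
    counts = np.zeros((n_classes, n_samples), dtype=int)
    cols = np.arange(n_samples)
    for model in models:
        counts[model.predict(X), cols] += 1  # one vote per network per sample
    return np.argmax(counts, axis=0)  # ties go to the lowest category index


def make_toy_data(rng, n_per_class=50):
    """Hypothetical stand-in for sequence features: three separated clusters."""
    centers = np.array([[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]])
    X = np.vstack([c + 0.3 * rng.standard_normal((n_per_class, 3)) for c in centers])
    y = np.repeat(np.arange(3), n_per_class)
    return X, y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y = make_toy_data(rng)
    # Same hidden-node count, activation, and training rule for every member;
    # the members differ only in their random input weights.
    models = [BasicELM(n_hidden=20, rng=rng).fit(X, y, n_classes=3) for _ in range(7)]
    accuracy = np.mean(ensemble_predict(models, X, n_classes=3) == y)
    print(f"training accuracy of the 7-network ensemble: {accuracy:.3f}")
```

Because the input weights are random, individual networks can disagree on borderline samples, and voting across several of them is what smooths out that variance. An odd number of members reduces, but with more than two classes does not eliminate, the chance of tied votes.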

Similar articles

Protein sequence classification with improved extreme learning machine algorithms.

Biomed Res Int. 2014;2014:103054. doi: 10.1155/2014/103054. Epub 2014 Mar 30.

SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition.

BMC Bioinformatics. 2007 May 22;8 Suppl 4(Suppl 4):S2. doi: 10.1186/1471-2105-8-S4-S2.

An improvement of extreme learning machine for compact single-hidden-layer feedforward neural networks.

Int J Neural Syst. 2008 Oct;18(5):433-41. doi: 10.1142/S0129065708001695.

Automated protein classification using consensus decision.

Proc IEEE Comput Syst Bioinform Conf. 2004:224-35. doi: 10.1109/csb.2004.1332436.

Error minimized extreme learning machine with growth of hidden nodes and incremental learning.

IEEE Trans Neural Netw. 2009 Aug;20(8):1352-7. doi: 10.1109/TNN.2009.2024147. Epub 2009 Jul 10.

Classification of protein quaternary structure with support vector machine.

Bioinformatics. 2003 Dec 12;19(18):2390-6. doi: 10.1093/bioinformatics/btg331.

SCPRED: accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences.

BMC Bioinformatics. 2008 May 1;9:226. doi: 10.1186/1471-2105-9-226.

Improving Classification Performance through an Advanced Ensemble Based Heterogeneous Extreme Learning Machines.

Comput Intell Neurosci. 2017;2017:3405463. doi: 10.1155/2017/3405463. Epub 2017 May 4.

Multi-category classification using an Extreme Learning Machine for microarray gene expression cancer diagnosis.

IEEE/ACM Trans Comput Biol Bioinform. 2007 Jul-Sep;4(3):485-495. doi: 10.1109/tcbb.2007.1012.

Decision tree based information integration for automated protein classification.

J Bioinform Comput Biol. 2005 Jun;3(3):717-42. doi: 10.1142/s0219720005001259.

Cited by

ProtInteract: A deep learning framework for predicting protein-protein interactions.

Comput Struct Biotechnol J. 2023 Jan 25;21:1324-1348. doi: 10.1016/j.csbj.2023.01.028. eCollection 2023.

Computational Method for Classification of Avian Influenza A Virus Using DNA Sequence Information and Physicochemical Properties.

Front Genet. 2021 Jan 28;12:599321. doi: 10.3389/fgene.2021.599321. eCollection 2021.

DNC4mC-Deep: Identification and Analysis of DNA N4-Methylcytosine Sites Based on Different Encoding Schemes By Using Deep Learning.

Cells. 2020 Jul 22;9(8):1756. doi: 10.3390/cells9081756.

CirRNAPL: A web server for the identification of circRNA based on extreme learning machine.

Comput Struct Biotechnol J. 2020 Apr 2;18:834-842. doi: 10.1016/j.csbj.2020.03.028. eCollection 2020.

iPseU-CNN: Identifying RNA Pseudouridine Sites Using Convolutional Neural Networks.

Mol Ther Nucleic Acids. 2019 Jun 7;16:463-470. doi: 10.1016/j.omtn.2019.03.010. Epub 2019 Apr 11.

Virtual screening by a new Clustering-based Weighted Similarity Extreme Learning Machine approach.

PLoS One. 2018 Apr 13;13(4):e0195478. doi: 10.1371/journal.pone.0195478. eCollection 2018.

A Novel Computational Method for Detecting DNA Methylation Sites with DNA Sequence Information and Physicochemical Properties.

Int J Mol Sci. 2018 Feb 8;19(2):511. doi: 10.3390/ijms19020511.

A Novel Modeling in Mathematical Biology for Classification of Signal Peptides.

Sci Rep. 2018 Jan 18;8(1):1039. doi: 10.1038/s41598-018-19491-y.

Discriminating cirRNAs from other lncRNAs using a hierarchical extreme learning machine (H-ELM) algorithm with feature selection.

Mol Genet Genomics. 2018 Feb;293(1):137-149. doi: 10.1007/s00438-017-1372-7. Epub 2017 Sep 14.

Predicting protein-protein interactions via multivariate mutual information of protein sequences.

BMC Bioinformatics. 2016 Sep 27;17(1):398. doi: 10.1186/s12859-016-1253-9.

References

Analysis of structures, functions, and epitopes of cysteine protease from Spirometra erinaceieuropaei Spargana.

Biomed Res Int. 2013;2013:198250. doi: 10.1155/2013/198250. Epub 2013 Dec 12.

Structural and sequence similarities of hydra xeroderma pigmentosum A protein to human homolog suggest early evolution and conservation.

Biomed Res Int. 2013;2013:854745. doi: 10.1155/2013/854745. Epub 2013 Sep 5.

Analysis of structures and epitopes of surface antigen glycoproteins expressed in bradyzoites of Toxoplasma gondii.

Biomed Res Int. 2013;2013:165342. doi: 10.1155/2013/165342. Epub 2013 Mar 21.

Biochemical, pharmacological, and structural characterization of new basic PLA2 Bbil-TX from Bothriopsis bilineata snake venom.

Biomed Res Int. 2013;2013:612649. doi: 10.1155/2013/612649. Epub 2012 Dec 30.

Protein sequence classification using feature hashing.

Proteome Sci. 2012 Jun 21;10 Suppl 1(Suppl 1):S14. doi: 10.1186/1477-5956-10-S1-S14.

Extreme learning machine for regression and multiclass classification.

IEEE Trans Syst Man Cybern B Cybern. 2012 Apr;42(2):513-29. doi: 10.1109/TSMCB.2011.2168604. Epub 2011 Oct 6.

Protein Bioinformatics Infrastructure for the Integration and Analysis of Multiple High-Throughput "omics" Data.

Adv Bioinformatics. 2010;2010:423589. doi: 10.1155/2010/423589. Epub 2010 Mar 29.

OP-ELM: optimally pruned extreme learning machine.

IEEE Trans Neural Netw. 2010 Jan;21(1):158-62. doi: 10.1109/TNN.2009.2036259. Epub 2009 Dec 8.

A comparison of methods for multiclass support vector machines.

IEEE Trans Neural Netw. 2002;13(2):415-25. doi: 10.1109/72.991427.

Multi-category classification using an Extreme Learning Machine for microarray gene expression cancer diagnosis.

IEEE/ACM Trans Comput Biol Bioinform. 2007 Jul-Sep;4(3):485-495. doi: 10.1109/tcbb.2007.1012.
