Extracting features from protein sequences to improve deep extreme learning machine for protein fold recognition.

Author information

Ibrahim Wisam, Abadeh Mohammad Saniee

Affiliations

Faculty of Electrical and Computer Engineering, Tarbiat Modares University, Tehran, Iran.

Publication information

J Theor Biol. 2017 May 21;421:1-15. doi: 10.1016/j.jtbi.2017.03.023. Epub 2017 Mar 27.

DOI:10.1016/j.jtbi.2017.03.023

PMID:28351701

Abstract

Protein fold recognition is an important problem in bioinformatics: predicting the three-dimensional structure of a protein. One of the most challenging tasks in protein fold recognition is extracting effective features from amino-acid sequences to obtain better classifiers. In this paper, we propose six descriptors for extracting features from protein sequences. These descriptors are applied in the first stage of a three-stage framework, PCA-DELM-LDA, to extract feature vectors from the amino-acid sequences. Principal Component Analysis (PCA) is used to reduce the number of extracted features. The extracted feature vectors are used together with the original features to improve the performance of the Deep Extreme Learning Machine (DELM) in the second stage. Four new features are extracted in the second stage and used in the third stage by Linear Discriminant Analysis (LDA) to classify the instances into 27 folds. The proposed framework is applied to the independent and combined feature sets of the SCOP datasets. The experimental results show that the feature vectors extracted in the first stage can improve the performance of DELM in extracting new, useful features in the second stage.
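
The abstract does not say what the six descriptors are, so the following is only a minimal sketch of what a sequence descriptor can look like. It uses amino-acid composition, a common baseline descriptor that is swapped in here for illustration and is not claimed to be one of the paper's six. The names `AMINO_ACIDS` and `amino_acid_composition` are hypothetical, and the PCA, DELM, and LDA stages are not shown.

```python
from collections import Counter
from typing import List

# The 20 standard residues in one-letter code (assumed alphabet for this sketch).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> List[float]:
    """Return the fraction of each standard residue in `sequence`.

    Illustrative baseline descriptor only; not taken from the paper.
    The result always has 20 entries, ordered as in AMINO_ACIDS.
    """
    # Count only standard residues; anything else (e.g. 'X', gaps) is ignored.
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Usage: a made-up sequence mapped to a fixed-length feature vector.
features = amino_acid_composition("MKVLAAGIVALLLAAGCSS")
print(len(features))
```

A fixed-length vector like this is the kind of input that a dimensionality-reduction step such as PCA expects, since every sequence maps to the same number of features regardless of its length.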

Similar articles

Extracting features from protein sequences to improve deep extreme learning machine for protein fold recognition.

J Theor Biol. 2017 May 21;421:1-15. doi: 10.1016/j.jtbi.2017.03.023. Epub 2017 Mar 27.

Type-2 Fuzzy PCA Approach in Extracting Salient Features for Molecular Cancer Diagnostics and Prognostics.

IEEE Trans Nanobioscience. 2019 Jul;18(3):482-489. doi: 10.1109/TNB.2019.2917814. Epub 2019 May 20.

Protein fold recognition by alignment of amino acid residues using kernelized dynamic time warping.

J Theor Biol. 2014 Aug 7;354:137-45. doi: 10.1016/j.jtbi.2014.03.033. Epub 2014 Mar 31.

Probabilistic multi-class multi-kernel learning: on protein fold recognition and remote homology detection.

Bioinformatics. 2008 May 15;24(10):1264-70. doi: 10.1093/bioinformatics/btn112. Epub 2008 Mar 31.

Protein contact prediction by integrating deep multiple sequence alignments, coevolution and machine learning.

Proteins. 2018 Mar;86 Suppl 1(Suppl 1):84-96. doi: 10.1002/prot.25405. Epub 2017 Oct 31.

Prediction of protein folds: extraction of new features, dimensionality reduction, and fusion of heterogeneous classifiers.

IEEE Trans Nanobioscience. 2009 Mar;8(1):100-10. doi: 10.1109/TNB.2009.2016488. Epub 2009 Mar 10.

Improving Protein Fold Recognition by Deep Learning Networks.

Sci Rep. 2015 Dec 4;5:17573. doi: 10.1038/srep17573.

Recent Progress in Machine Learning-Based Methods for Protein Fold Recognition.

Int J Mol Sci. 2016 Dec 16;17(12):2118. doi: 10.3390/ijms17122118.

Descriptor-based protein remote homology identification.

Protein Sci. 2005 Feb;14(2):431-44. doi: 10.1110/ps.041035505. Epub 2005 Jan 4.

Using Weighted Extreme Learning Machine Combined With Scale-Invariant Feature Transform to Predict Protein-Protein Interactions From Protein Evolutionary Information.

IEEE/ACM Trans Comput Biol Bioinform. 2020 Sep-Oct;17(5):1546-1554. doi: 10.1109/TCBB.2020.2965919. Epub 2020 Jan 10.

Cited by

Profiles of Natural and Designed Protein-Like Sequences Effectively Bridge Protein Sequence Gaps: Implications in Distant Homology Detection.

Methods Mol Biol. 2022;2449:149-167. doi: 10.1007/978-1-0716-2095-3_5.

PrESOgenesis: A two-layer multi-label predictor for identifying fertility-related proteins using support vector machine and pseudo amino acid composition approach.

Sci Rep. 2018 Jun 13;8(1):9025. doi: 10.1038/s41598-018-27338-9.

Discriminating cirRNAs from other lncRNAs using a hierarchical extreme learning machine (H-ELM) algorithm with feature selection.

Mol Genet Genomics. 2018 Feb;293(1):137-149. doi: 10.1007/s00438-017-1372-7. Epub 2017 Sep 14.
