A novel fusion based on the evolutionary features for protein fold recognition using support vector machines.

Affiliations

Department of Electrical Engineering, Amirkabir University of Technology, Tehran, Iran.

Iranian Research Institute for Information Science and Technology (IranDoc), Tehran, Iran.

Publication information

Sci Rep. 2020 Sep 1;10(1):14368. doi: 10.1038/s41598-020-71172-x.

DOI:10.1038/s41598-020-71172-x

PMID:32873824

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7463267/

Abstract

Protein fold recognition plays a crucial role in discovering the three-dimensional structure and function of proteins. Several approaches have been employed for the prediction of protein folds. Some of these approaches extract features from protein sequences and pass them to a strong classifier. Feature extraction techniques generally draw on syntactical, evolutionary, and physicochemical information. In recent years, finding an efficient technique for integrating discriminative features has received increasing attention. In this study, we integrate the Auto-Cross-Covariance and Separated dimer evolutionary feature extraction methods. The resulting features are scored by information gain to select a subset of discriminative features. On three benchmark datasets, DD, RDD, and EDD, the support vector machine shows an improvement of more than 6% in accuracy.
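The pipeline summarized in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes the commonly used definitions of auto-cross-covariance and separated-dimer features over a position-specific scoring matrix (PSSM), and it scores each feature by the information gain of a single split at the feature's median. The function names, the lag and gap parameters, and the median threshold are all assumptions made for this sketch; the abstract does not specify them. The final support vector machine step is omitted.

```python
import math
from collections import Counter
from statistics import median
from typing import List, Sequence

Matrix = List[List[float]]  # rows = sequence positions, columns = amino acids


def acc_features(pssm: Matrix, max_lag: int) -> List[float]:
    """Auto-cross-covariance of every column pair for lags 1..max_lag."""
    n_rows, n_cols = len(pssm), len(pssm[0])
    if max_lag >= n_rows:
        raise ValueError("max_lag must be smaller than the sequence length")
    means = [sum(row[c] for row in pssm) / n_rows for c in range(n_cols)]
    feats = []
    for lag in range(1, max_lag + 1):
        span = n_rows - lag
        for a in range(n_cols):
            for b in range(n_cols):  # a == b gives the auto-covariance terms
                total = sum(
                    (pssm[j][a] - means[a]) * (pssm[j + lag][b] - means[b])
                    for j in range(span)
                )
                feats.append(total / span)
    return feats


def separated_dimer_features(pssm: Matrix, max_gap: int) -> List[float]:
    """Sum of products of column a at position j and column b at j + gap."""
    n_rows, n_cols = len(pssm), len(pssm[0])
    if max_gap >= n_rows:
        raise ValueError("max_gap must be smaller than the sequence length")
    feats = []
    for gap in range(1, max_gap + 1):
        for a in range(n_cols):
            for b in range(n_cols):
                feats.append(
                    sum(pssm[j][a] * pssm[j + gap][b] for j in range(n_rows - gap))
                )
    return feats


def fuse_features(pssm: Matrix, max_lag: int, max_gap: int) -> List[float]:
    """Fusion by simple concatenation (an assumption of this sketch)."""
    return acc_features(pssm, max_lag) + separated_dimer_features(pssm, max_gap)


def entropy(labels: Sequence[str]) -> float:
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values: Sequence[float], labels: Sequence[str],
                     threshold: float) -> float:
    """Information gain of the binary split `value <= threshold`."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    conditional = sum(len(p) / n * entropy(p) for p in (left, right) if p)
    return entropy(labels) - conditional


def select_top_k(samples: Matrix, labels: Sequence[str], k: int) -> List[int]:
    """Indices of the k features with the highest information gain."""
    scores = []
    for f in range(len(samples[0])):
        column = [s[f] for s in samples]
        scores.append(information_gain(column, labels, median(column)))
    return sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)[:k]
```

For a real PSSM with 20 columns, each lag or gap value contributes 400 features, so the fused vector grows quickly; that growth is what makes a selection step such as information gain useful before training a classifier on the selected columns.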

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b93e/7463267/4628d4537931/41598_2020_71172_Fig1_HTML.jpg

Similar articles

Protein Fold Recognition by Combining Support Vector Machines and Pairwise Sequence Similarity Scores.

IEEE/ACM Trans Comput Biol Bioinform. 2021 Sep-Oct;18(5):2008-2016. doi: 10.1109/TCBB.2020.2966450. Epub 2021 Oct 7.

A highly accurate protein structural class prediction approach using auto cross covariance transformation and recursive feature elimination.

Comput Biol Chem. 2015 Dec;59 Pt A:95-100. doi: 10.1016/j.compbiolchem.2015.08.012. Epub 2015 Sep 2.

A Segmentation-Based Method to Extract Structural and Evolutionary Features for Protein Fold Recognition.

IEEE/ACM Trans Comput Biol Bioinform. 2014 May-Jun;11(3):510-9. doi: 10.1109/TCBB.2013.2296317.

Improving protein fold recognition and structural class prediction accuracies using physicochemical properties of amino acids.

J Theor Biol. 2016 Aug 7;402:117-28. doi: 10.1016/j.jtbi.2016.05.002. Epub 2016 May 7.

A tri-gram based feature extraction technique using linear probabilities of position specific scoring matrix for protein fold recognition.

IEEE Trans Nanobioscience. 2014 Mar;13(1):44-50. doi: 10.1109/TNB.2013.2296050.

Protein fold recognition by alignment of amino acid residues using kernelized dynamic time warping.

J Theor Biol. 2014 Aug 7;354:137-45. doi: 10.1016/j.jtbi.2014.03.033. Epub 2014 Mar 31.

An ensemble approach to protein fold classification by integration of template-based assignment and support vector machine classifier.

Bioinformatics. 2017 Mar 15;33(6):863-870. doi: 10.1093/bioinformatics/btw768.

Enhanced Protein Structural Class Prediction Using Effective Feature Modeling and Ensemble of Classifiers.

IEEE/ACM Trans Comput Biol Bioinform. 2021 Nov-Dec;18(6):2409-2419. doi: 10.1109/TCBB.2020.2979430. Epub 2021 Dec 8.

Prediction of microRNA-binding residues in protein using a Laplacian support vector machine based on sequence information.

J Bioinform Comput Biol. 2018 Jun;16(3):1840009. doi: 10.1142/S0219720018400097. Epub 2018 Feb 4.

Cited by

Enhancing nucleotide sequence representations in genomic analysis with contrastive optimization.

Commun Biol. 2025 Mar 29;8(1):517. doi: 10.1038/s42003-025-07902-6.

NetAllergen, a random forest model integrating MHC-II presentation propensity for improved allergenicity prediction.

Bioinform Adv. 2023 Oct 16;3(1):vbad151. doi: 10.1093/bioadv/vbad151. eCollection 2023.

BioS2Net: Holistic Structural and Sequential Analysis of Biomolecules Using a Deep Neural Network.

Int J Mol Sci. 2022 Mar 9;23(6):2966. doi: 10.3390/ijms23062966.
