Enhanced Protein Structural Class Prediction Using Effective Feature Modeling and Ensemble of Classifiers.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2021 Nov-Dec;18(6):2409-2419. doi: 10.1109/TCBB.2020.2979430. Epub 2021 Dec 8.

DOI:10.1109/TCBB.2020.2979430
PMID:32149653
Abstract

Protein Secondary Structural Class (PSSC) information is important for downstream problems in protein sequence analysis, such as protein fold recognition, protein tertiary structure prediction, and analysis of protein functions for drug discovery. Identifying PSSC using biological methods is time-consuming and cost-intensive. Several computational models have been developed to predict the structural class; however, they generalize poorly. Hence, predicting PSSC from protein sequences remains a difficult task. In this article, we propose an effective, novel, and generalized prediction model consisting of feature modeling and an ensemble of classifiers. The proposed feature modeling extracts discriminating information (features) using three techniques: (i) Embedding: features are extracted on the basis of the spatial residue arrangements of the sequences using word embedding approaches; (ii) SkipXGram Bi-gram: various sets of skipped bi-gram features are extracted from the sequences; and (iii) General Statistical (GS): features are extracted that cover the global information of the structural sequences. The combined feature sets are used to train an ensemble of three classifiers, Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting Machines (GBM), which performs the classification. The proposed model, when assessed on five benchmark datasets (high and low sequence similarity), viz. z277, z498, 25PDB, 1189, and FC699, reported overall accuracies of 93.55, 97.58, 81.82, 81.11, and 93.93 percent, respectively. The proposed model is further validated on a large-scale updated low-similarity (≤ 25%) dataset, where it achieved an overall accuracy of 81.11 percent. The proposed generalized model is robust and consistently outperformed several state-of-the-art models on all five benchmark datasets.
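
To make the skipped bi-gram features and the classifier ensemble more concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not define the exact SkipXGram features or say how the three classifiers are combined. The function names `skip_bigram_features` and `majority_vote`, the three-letter secondary-structure alphabet (`H`, `E`, `C`), the frequency normalization, and the use of hard majority voting are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def skip_bigram_features(sequence, alphabet, skip):
    """Normalized frequencies of symbol pairs separated by `skip` positions.

    Hypothetical helper: skip=0 yields ordinary adjacent bi-grams,
    skip=1 pairs each symbol with the one two positions ahead, and so on.
    """
    pairs = ["".join(p) for p in product(alphabet, repeat=2)]
    gap = skip + 1
    counts = Counter(
        sequence[i] + sequence[i + gap] for i in range(len(sequence) - gap)
    )
    # Only pairs made of alphabet symbols contribute to the total.
    total = sum(counts[p] for p in pairs)
    return {p: (counts[p] / total if total else 0.0) for p in pairs}


def majority_vote(predictions):
    """Hard voting (an assumption): the label chosen by most classifiers.

    On a tie, the label that appears first in `predictions` wins.
    """
    return Counter(predictions).most_common(1)[0][0]


# Example: a short, made-up secondary-structure string.
adjacent = skip_bigram_features("HHEEC", "HEC", skip=0)
skipped = skip_bigram_features("HHEEC", "HEC", skip=1)
label = majority_vote(["alpha", "beta", "alpha"])
```

Each call to `skip_bigram_features` returns a fixed-length feature dictionary (nine entries for a three-letter alphabet), so the features for several skip values can be concatenated into one vector. In practice, the three votes passed to `majority_vote` might come from library classifiers such as scikit-learn's `SVC`, `RandomForestClassifier`, and `GradientBoostingClassifier`; that pairing is also an assumption, not something the abstract specifies.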
