Accurate prediction of protein structural classes using functional domains and predicted secondary structure sequences.

Author information

Department of Computer Science & Engineering, University of South Florida, Tampa, FL 33620, USA.

Publication information

J Biomol Struct Dyn. 2012;29(6):623-33. doi: 10.1080/07391102.2011.672626.

Abstract

Protein structural class prediction is a challenging problem in bioinformatics. Methods based directly on amino acid (AA) sequence similarity have been shown to be insufficient for low-similarity protein datasets. To improve prediction accuracy for such proteins, several recently proposed methods explore novel feature sets based on predicted secondary structure propensities. In this paper, we focus on protein structural class prediction using combinations of these novel features, including secondary structure propensities, together with functional domain (FD) features extracted from the InterPro signature database. Comprehensive experiments on several benchmark datasets show that integrating the new FD features substantially improves structural class prediction accuracy for low-similarity proteins, because these features capture meaningful relationships among AA residues that are far apart in the protein sequence. The proposed method has also been tested on partially disordered proteins, where it achieves reasonable prediction accuracy; this is a more difficult problem than structural class prediction on commonly used benchmark datasets and, to the best of our knowledge, has not been attempted before. In addition, to avoid overfitting with a large number of features, feature selection is applied to retain the discriminating features that contribute to high prediction accuracy. The selected features achieve stable prediction performance across different benchmark datasets.
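As a rough illustration of the kind of combined feature vector the abstract describes, the sketch below joins content-based features from a predicted secondary structure string with binary functional-domain indicators. The three-state alphabet (H, E, C), the function names, the particular five secondary structure features, and the placeholder signature identifiers are all assumptions made for illustration; the abstract does not give the paper's actual feature definitions or its feature selection procedure.

```python
from collections import Counter
from itertools import groupby


def ss_content_features(ss):
    """Assumed features: fraction of helix (H), strand (E) and coil (C)
    states, plus the longest helix and strand runs divided by length."""
    if not ss:
        raise ValueError("empty secondary structure string")
    n = len(ss)
    counts = Counter(ss)
    longest = {"H": 0, "E": 0}
    # groupby yields maximal runs of identical consecutive states
    for state, run in groupby(ss):
        if state in longest:
            longest[state] = max(longest[state], len(list(run)))
    return [
        counts["H"] / n,
        counts["E"] / n,
        counts["C"] / n,
        longest["H"] / n,
        longest["E"] / n,
    ]


def fd_features(hits, vocabulary):
    """One binary indicator per signature in a fixed, ordered vocabulary."""
    return [1.0 if signature in hits else 0.0 for signature in vocabulary]


def build_feature_vector(ss, hits, vocabulary):
    """Concatenate secondary structure and functional-domain features."""
    return ss_content_features(ss) + fd_features(hits, vocabulary)


# Hypothetical signature identifiers, not real InterPro entries.
VOCABULARY = ["SIG_EXAMPLE_1", "SIG_EXAMPLE_2", "SIG_EXAMPLE_3"]

if __name__ == "__main__":
    vector = build_feature_vector("CHHHHCCEEC", {"SIG_EXAMPLE_2"}, VOCABULARY)
    print(vector)
```

In a setup like this, the functional-domain part of the vector depends only on which signatures match anywhere in the protein, which is one way such features could reflect relationships between residues that are far apart in the sequence. A feature selection step and a classifier would follow, but both are left out here because the abstract does not specify them.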

Similar articles

Accurate prediction of protein structural classes using functional domains and predicted secondary structure sequences.

J Biomol Struct Dyn. 2012;29(6):623-33. doi: 10.1080/07391102.2011.672626.

High-accuracy prediction of protein structural class for low-similarity sequences based on predicted secondary structure.

Biochimie. 2011 Apr;93(4):710-4. doi: 10.1016/j.biochi.2011.01.001. Epub 2011 Jan 13.

Prediction of protein structural class using novel evolutionary collocation-based sequence representation.

J Comput Chem. 2008 Jul 30;29(10):1596-604. doi: 10.1002/jcc.20918.

Improving protein structural class prediction using novel combined sequence information and predicted secondary structural features.

J Comput Chem. 2011 Dec;32(16):3393-8. doi: 10.1002/jcc.21918. Epub 2011 Sep 21.

Prediction of protein structural class for the twilight zone sequences.

Biochem Biophys Res Commun. 2007 Jun 1;357(2):453-60. doi: 10.1016/j.bbrc.2007.03.164. Epub 2007 Apr 5.

Using principal component analysis and support vector machine to predict protein structural class for low-similarity sequences via PSSM.

J Biomol Struct Dyn. 2012;29(6):634-42. doi: 10.1080/07391102.2011.672627.

A high-accuracy protein structural class prediction algorithm using predicted secondary structural information.

J Theor Biol. 2010 Dec 7;267(3):272-5. doi: 10.1016/j.jtbi.2010.09.007. Epub 2010 Sep 8.

The prediction of protein structural class using averaged chemical shifts.

J Biomol Struct Dyn. 2012;29(6):643-9. doi: 10.1080/07391102.2011.672628.

HYPROSP II--a knowledge-based hybrid method for protein secondary structure prediction based on local prediction confidence.

Bioinformatics. 2005 Aug 1;21(15):3227-33. doi: 10.1093/bioinformatics/bti524. Epub 2005 Jun 2.

Prediction of protein structural class using a complexity-based distance measure.

Amino Acids. 2010 Mar;38(3):721-8. doi: 10.1007/s00726-009-0276-1. Epub 2009 Mar 28.

Cited by

MHTAPred-SS: A Highly Targeted Autoencoder-Driven Deep Multi-Task Learning Framework for Accurate Protein Secondary Structure Prediction.

Int J Mol Sci. 2024 Dec 15;25(24):13444. doi: 10.3390/ijms252413444.

Briefing in application of machine learning methods in ion channel prediction.

ScientificWorldJournal. 2015;2015:945927. doi: 10.1155/2015/945927. Epub 2015 Apr 16.

Characteristics of protein residue-residue contacts and their application in contact prediction.

J Mol Model. 2014 Nov;20(11):2497. doi: 10.1007/s00894-014-2497-9. Epub 2014 Nov 6.

The structure and dynamics of BmR1 protein from Brugia malayi: in silico approaches.

Int J Mol Sci. 2014 Jun 19;15(6):11082-99. doi: 10.3390/ijms150611082.

Comparison study on statistical features of predicted secondary structures for protein structural class prediction: From content to position.

BMC Bioinformatics. 2013 May 4;14:152. doi: 10.1186/1471-2105-14-152.
