A sequence-based hybrid predictor for identifying conformationally ambivalent regions in proteins.

Affiliation

Institute of Biomedical Engineering, National Taiwan University, Taipei, Taiwan, Republic of China.

Publication information

BMC Genomics. 2009 Dec 3;10 Suppl 3(Suppl 3):S22. doi: 10.1186/1471-2164-10-S3-S22.

DOI: 10.1186/1471-2164-10-S3-S22
PMID: 19958486
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2788375/
Abstract

BACKGROUND

Proteins are dynamic macromolecules that may undergo conformational transitions when their environment changes. Because laboratory observations have shown that protein flexibility is correlated with essential biological functions, researchers have designed various types of predictors for identifying structurally flexible regions in proteins. These predictors fall into two major categories. One category attempts to identify conformationally flexible regions by analyzing protein tertiary structures. The other relies entirely on analysis of polypeptide sequences. Because the availability of protein tertiary structures is generally limited, predictors that work from sequence information alone are crucial for advancing molecular biology research.

RESULTS

In this article, we propose a novel approach to designing a sequence-based predictor for identifying conformationally ambivalent regions in proteins. The novelty of the design stems from incorporating two classifiers, based on two distinct supervised learning algorithms, that provide complementary predictive power. Experimental results show that the overall performance of the proposed hybrid predictor is superior to that of existing predictors. Furthermore, the case study presented in this article demonstrates that the hybrid predictor can provide biologists with valuable clues about the functional sites in a protein chain. The hybrid predictor offers users two optional modes: a high-sensitivity mode and a high-specificity mode. On an independent testing data set, it delivers a sensitivity of 0.710 and a specificity of 0.608 in the high-sensitivity mode, and a sensitivity of 0.451 and a specificity of 0.787 in the high-specificity mode.
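The trade-off between the two modes can be illustrated with a minimal sketch. The abstract does not state how the two classifiers are combined, so the rule used here is purely an assumption for illustration: a residue is flagged if either classifier flags it (high-sensitivity) or only if both do (high-specificity). The function names, mode labels, and toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def combine_predictions(a: Sequence[int], b: Sequence[int], mode: str) -> List[int]:
    """Merge per-residue binary calls (1 = ambivalent) from two classifiers.

    ASSUMPTION: union for high sensitivity, intersection for high specificity.
    The source abstract does not describe the actual combination rule.
    """
    if len(a) != len(b):
        raise ValueError("prediction vectors must have the same length")
    if mode == "high_sensitivity":
        return [int(x == 1 or y == 1) for x, y in zip(a, b)]
    if mode == "high_specificity":
        return [int(x == 1 and y == 1) for x, y in zip(a, b)]
    raise ValueError(f"unknown mode: {mode!r}")


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Toy data (hypothetical): 1 marks a conformationally ambivalent residue.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
clf_a = [1, 0, 1, 1, 0, 0, 0, 0]  # calls from a first, hypothetical classifier
clf_b = [0, 1, 1, 0, 0, 1, 0, 0]  # calls from a second, hypothetical classifier

if __name__ == "__main__":
    for mode in ("high_sensitivity", "high_specificity"):
        merged = combine_predictions(clf_a, clf_b, mode)
        sens, spec = sensitivity_specificity(labels, merged)
        print(f"{mode}: sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Under this assumed rule, the union can only add positive calls and the intersection can only remove them, which is one simple way a pair of complementary classifiers could expose a sensitivity-oriented mode and a specificity-oriented mode.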

CONCLUSION

Although experimental results show that the hybrid approach, designed to exploit the complementary predictive power of distinct supervised learning algorithms, works more effectively than conventional approaches, there is considerable room for improvement in the achieved performance. It is therefore of interest to investigate the effects of exploiting additional physicochemical properties related to conformational ambivalence, and of incorporating recently developed machine learning approaches, e.g. the random forest design and the multi-stage design. Because conformational transitions play a key role in several essential types of biological functions, the design of more advanced predictors for identifying conformationally ambivalent regions in proteins deserves continued attention.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5951/2788375/c3dc7eead144/1471-2164-10-S3-S22-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5951/2788375/b1ec426d1d09/1471-2164-10-S3-S22-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5951/2788375/fd8a97f6b62b/1471-2164-10-S3-S22-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5951/2788375/c13545dfaa4a/1471-2164-10-S3-S22-4.jpg