Identification of hot regions in protein-protein interactions by sequential pattern mining.

Authors

Hsu Chen-Ming, Chen Chien-Yu, Liu Baw-Jhiune, Huang Chih-Chang, Laio Min-Hung, Lin Chien-Chieh, Wu Tzung-Lin

Affiliation

Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, Taiwan, ROC.

Publication

BMC Bioinformatics. 2007 May 24;8 Suppl 5(Suppl 5):S8. doi: 10.1186/1471-2105-8-S5-S8.

DOI:10.1186/1471-2105-8-S5-S8
PMID:17570867
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1892096/
Abstract

BACKGROUND

Identification of protein interacting sites is an important task in computational molecular biology. As more and more protein sequences are deposited without available structural information, it is highly desirable to predict protein binding regions from sequence alone. This paper presents a pattern mining approach to this problem. A functional region of a protein structure usually consists of several peptide segments separated by large wildcard regions. The proposed mining technique therefore allows large irregular gaps when growing patterns, in order to find residues that are simultaneously conserved but widely separated on the sequence. A derived pattern is called a cluster-like pattern because the conserved residues it contains are always grouped into several blocks, each of which corresponds to a locally conserved region on the protein sequence.
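The block structure of a cluster-like pattern can be illustrated with a minimal sketch. It is not the authors' mining algorithm: it only shows how a set of conserved residue positions might be split into blocks wherever a large gap occurs. The function name group_into_blocks, the max_intra_gap threshold, and the toy positions are all assumptions made for illustration.

```python
from typing import List


def group_into_blocks(positions: List[int], max_intra_gap: int = 3) -> List[List[int]]:
    """Split conserved residue positions into blocks.

    Two consecutive positions stay in the same block when they are at most
    max_intra_gap residues apart (a hypothetical threshold); a larger gap
    is treated as a wildcard region that starts a new block.
    """
    blocks: List[List[int]] = []
    for pos in sorted(set(positions)):
        if blocks and pos - blocks[-1][-1] <= max_intra_gap:
            blocks[-1].append(pos)   # small gap: extend the current block
        else:
            blocks.append([pos])     # large gap: open a new block
    return blocks


# Toy positions, invented for illustration (not data from the paper).
toy_positions = [12, 13, 15, 58, 59, 61, 62, 140, 142]
for block in group_into_blocks(toy_positions):
    print(f"block {block[0]}-{block[-1]}: {block}")
```

With these toy positions the sketch yields three blocks separated by long wildcard regions, which is the shape the abstract describes for a cluster-like pattern.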

RESULTS

The experiments conducted in this work demonstrate that the derived long patterns automatically discover the important residues that form one or several hot regions of protein-protein interactions. The methodology is evaluated on the web server MAGIIC-PRO using a well-known benchmark containing 220 protein chains from 72 distinct complexes. Among the 218 proteins tested, 900 sequential blocks were discovered, an average of 4.25 blocks per protein chain. About 92% of the derived blocks are clustered in space with at least one of the other blocks, and about 66% of the blocks lie near the interface of protein-protein interactions. Overall, for about 83% of the tested proteins, at least two interacting blocks can be discovered by this approach.
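The notion of a block being "clustered in space with at least one of the other blocks" can be sketched as follows. The abstract does not state the distance criterion used, so the 8.0 cutoff, the function names, and the toy coordinates below are assumptions for illustration only.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def min_distance(a: List[Coord], b: List[Coord]) -> float:
    """Smallest Euclidean distance between any residue in a and any in b."""
    return min(math.dist(p, q) for p in a for q in b)


def fraction_clustered(blocks: Dict[str, List[Coord]], cutoff: float = 8.0) -> float:
    """Fraction of blocks lying within cutoff of at least one other block.

    cutoff is a hypothetical threshold, not the criterion used in the paper.
    """
    names = list(blocks)
    if not names:
        return 0.0
    clustered = 0
    for name in names:
        others = (n for n in names if n != name)
        if any(min_distance(blocks[name], blocks[o]) <= cutoff for o in others):
            clustered += 1
    return clustered / len(names)


# Toy coordinates, invented for illustration (not data from the paper).
toy_blocks = {
    "A": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "B": [(5.0, 0.0, 0.0)],
    "C": [(40.0, 40.0, 40.0)],
}
print(fraction_clustered(toy_blocks))
```

In this toy case blocks A and B sit close together while C is isolated, so two of the three blocks count as spatially clustered.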

CONCLUSION

This work aims to demonstrate that the important residues associated with the interface of protein-protein interactions may be automatically discovered by sequential pattern mining. The detected regions are highly conserved and are thus considered computational hot regions. This information would be useful for characterizing protein sequences, predicting protein function, finding potential partners, and facilitating protein docking for drug discovery.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ff10/1892096/942418199ad9/1471-2105-8-S5-S8-6.jpg
