Predicting Protein-Protein Interaction Sites Using Sequence Descriptors and Site Propensity of Neighboring Amino Acids.

Authors

Kuo Tzu-Hao, Li Kuo-Bin

Affiliations

Institute of Biomedical Informatics, National Yang-Ming University, Taipei 112, Taiwan.

Office of Information Management, National Yang-Ming University Hospital, Yilan 260, Taiwan.

Publication information

Int J Mol Sci. 2016 Oct 26;17(11):1788. doi: 10.3390/ijms17111788.

DOI:10.3390/ijms17111788
PMID:27792167
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5133789/
Abstract

Information about the interface sites of protein-protein interactions (PPIs) is useful for many areas of biological research. However, despite advances in experimental techniques, identifying PPI sites remains a challenging task. Using a statistical learning technique, we propose a computational tool for predicting PPI sites. As an alternative to similar approaches that require structural information, the proposed method takes all of its input from protein sequences. In addition to typical sequence features, our method takes into account that interaction sites are not randomly distributed over the protein sequence. We characterized this positional preference using protein complexes with known structures, proposed a numerical index to estimate the propensity, and then incorporated the index into a learning system. The resulting predictor, without using structural information, yields an area under the ROC curve (AUC) of 0.675, recall of 0.597, precision of 0.311 and accuracy of 0.583 in a ten-fold cross-validation experiment. This performance is comparable to that of the previous approach in which structural information was used. Upon introducing B-factor data into our predictor, we demonstrated that the AUC can be further improved to 0.750. The tool is accessible at http://bsaltools.ym.edu.tw/predppis.
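
The abstract describes a numerical index for the interface propensity of neighboring amino acids but does not give its formula. The sketch below is therefore only an illustrative assumption, not the authors' index. It uses a common frequency-ratio propensity and averages it over the residues flanking each position. The function names, the "0"/"1" label encoding, the window size and the toy data are all hypothetical.

```python
from collections import Counter

def residue_propensity(sequences, labels):
    """Ratio of each amino acid's frequency among interface residues
    to its frequency among all residues (1.0 means no preference).
    ASSUMPTION: a generic propensity definition, not the paper's index."""
    all_counts, site_counts = Counter(), Counter()
    for seq, lab in zip(sequences, labels):
        for aa, is_site in zip(seq, lab):
            all_counts[aa] += 1
            if is_site == "1":
                site_counts[aa] += 1
    n_all = sum(all_counts.values())
    n_site = sum(site_counts.values())
    if n_site == 0:
        raise ValueError("no interface residues in the training labels")
    return {
        aa: (site_counts[aa] / n_site) / (all_counts[aa] / n_all)
        for aa in all_counts
    }

def neighbor_propensity(seq, propensity, half_window=2):
    """Mean propensity of the residues flanking each position.
    The position itself is excluded; windows are clipped at the termini.
    Residue types unseen in training fall back to the neutral value 1.0."""
    scores = []
    for i in range(len(seq)):
        lo = max(0, i - half_window)
        hi = min(len(seq), i + half_window + 1)
        neighbors = [propensity.get(seq[j], 1.0) for j in range(lo, hi) if j != i]
        scores.append(sum(neighbors) / len(neighbors) if neighbors else 1.0)
    return scores

# Toy data, made up for illustration: "1" marks an interface residue.
train_seqs = ["AAKK", "AKAK"]
train_labels = ["0011", "0101"]
prop = residue_propensity(train_seqs, train_labels)
features = neighbor_propensity("AKA", prop, half_window=1)
print(prop)      # {'A': 0.0, 'K': 2.0}
print(features)  # [2.0, 0.0, 2.0]
```

If a feature like this were fed into a cross-validated classifier, the propensity table would presumably need to be estimated from the training folds only, so that labels from the held-out fold do not leak into its features.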
