Protein-protein interaction site predictions with minimum covariance determinant and Mahalanobis distance.

Authors

Qiu Zhijun, Zhou Bo, Yuan Jiangfeng

Affiliation

College of Food and Bioengineering, Henan University of Science and Technology, 263 Kai-Yuan Road, Luoyang, 471023, China.

Publication information

J Theor Biol. 2017 Nov 21;433:57-63. doi: 10.1016/j.jtbi.2017.08.026. Epub 2017 Sep 1.

DOI: 10.1016/j.jtbi.2017.08.026
PMID: 28867223

Abstract

Protein-protein interaction site (PPIS) prediction must cope with the diversity of interaction sites, which limits prediction accuracy. In addition, proteins with unknown or unidentified interactions can contribute missing interfaces, and such data errors are often carried into the training dataset. To address both problems, we used the minimum covariance determinant (MCD) method, exploiting its ability to remove outliers, to refine the training data and build a better-performing predictor. To predict test data in practice, we devised a method based on Mahalanobis distance that selects suitable test data as input for the predictor. In leave-one-out validation and an independent test, our method achieved a higher Matthews correlation coefficient (MCC) after Mahalanobis distance screening, although only part of the test data could be predicted. These results indicate that data refinement is an effective way to improve PPIS prediction. With further optimization, the method may yield predictors with better performance and a wider range of application.
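
The two steps named in the abstract, MCD-based refinement of the training data and Mahalanobis distance screening of the test data, can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names (`sq_mahalanobis`, `mcd_support`, `screen`), the synthetic two-dimensional features, the 75% retention fraction, and the acceptance cutoff are all assumptions made for illustration, and the MCD search is a simplified version of the concentration-step idea rather than a full FastMCD.

```python
import numpy as np


def sq_mahalanobis(X, mean, cov):
    """Squared Mahalanobis distance of each row of X from (mean, cov)."""
    diff = X - mean
    z = np.linalg.solve(cov, diff.T)  # avoids forming the explicit inverse
    return np.einsum("ij,ji->i", diff, z)


def mcd_support(X, h, n_starts=20, max_iter=50, seed=0):
    """Approximate MCD via concentration steps (hypothetical helper).

    Returns the mean, covariance, and sorted row indices of the h-point
    subset with the smallest covariance determinant found.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    ridge = 1e-9 * np.eye(d)  # keeps tiny starting subsets invertible
    best = None
    for _start in range(n_starts):
        idx = rng.choice(n, size=d + 1, replace=False)
        for _step in range(max_iter):
            mean = X[idx].mean(axis=0)
            cov = np.cov(X[idx], rowvar=False) + ridge
            # Concentration step: keep the h points closest to the current fit.
            new_idx = np.sort(np.argsort(sq_mahalanobis(X, mean, cov))[:h])
            if np.array_equal(new_idx, idx):
                break
            idx = new_idx
        mean = X[idx].mean(axis=0)
        cov = np.cov(X[idx], rowvar=False)
        det = np.linalg.det(cov)
        if best is None or det < best[0]:
            best = (det, mean, cov, idx)
    return best[1], best[2], best[3]


def screen(X_new, mean, cov, cutoff):
    """True for rows of X_new that fall within the refined training region."""
    return sq_mahalanobis(X_new, mean, cov) <= cutoff


# Synthetic stand-in for residue feature vectors: 200 typical points plus
# 10 far-away points playing the role of mislabeled or atypical samples.
rng = np.random.default_rng(42)
X_train = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
                     rng.normal(10.0, 0.3, size=(10, 2))])

h = int(0.75 * len(X_train))  # fraction retained is an assumed setting
mean, cov, support = mcd_support(X_train, h)
X_refined = X_train[support]  # refined training set for a downstream predictor

# Assumed acceptance rule: no farther than the farthest retained training point.
cutoff = sq_mahalanobis(X_refined, mean, cov).max()
X_test = np.array([[0.1, -0.2], [10.0, 10.0]])
accepted = screen(X_test, mean, cov, cutoff)
print(len(X_refined), accepted)
```

In this toy setting the far-away training points are excluded from the retained subset, the query near the bulk of the data is accepted, and the distant query is rejected. This mirrors the trade-off noted in the abstract: screening can improve performance on the accepted inputs, but only part of the test data receives a prediction.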

Similar articles

1. Protein-protein interaction site predictions with minimum covariance determinant and Mahalanobis distance.
   J Theor Biol. 2017 Nov 21;433:57-63. doi: 10.1016/j.jtbi.2017.08.026. Epub 2017 Sep 1.
2. Accurate in silico identification of protein succinylation sites using an iterative semi-supervised learning technique.
   J Theor Biol. 2015 Jun 7;374:60-5. doi: 10.1016/j.jtbi.2015.03.029. Epub 2015 Apr 2.
3. Protein-protein interaction site prediction using random forest proximity distance.
   J Bioinform Comput Biol. 2021 Feb;19(1):2050042. doi: 10.1142/S0219720020500420. Epub 2020 Nov 19.
4. Transient protein-protein interface prediction: datasets, features, algorithms, and the RAD-T predictor.
   BMC Bioinformatics. 2014 Mar 24;15:82. doi: 10.1186/1471-2105-15-82.
5. Detection of outlier residues for improving interface prediction in protein heterocomplexes.
   IEEE/ACM Trans Comput Biol Bioinform. 2012 Jul-Aug;9(4):1155-65. doi: 10.1109/TCBB.2012.58.
6. Residue-level prediction of DNA-binding sites and its application on DNA-binding protein predictions.
   FEBS Lett. 2007 Mar 6;581(5):1058-66. doi: 10.1016/j.febslet.2007.01.086. Epub 2007 Feb 7.
7. Mahalanobis distances for ecological niche modelling and outlier detection: implications of sample size, error, and bias for selecting and parameterising a multivariate location and scatter method.
   PeerJ. 2021 May 11;9:e11436. doi: 10.7717/peerj.11436. eCollection 2021.
8. Prediction-based fingerprints of protein-protein interactions.
   Proteins. 2007 Feb 15;66(3):630-45. doi: 10.1002/prot.21248.
9. Predicting protein-binding regions in RNA using nucleotide profiles and compositions.
   BMC Syst Biol. 2017 Mar 14;11(Suppl 2):16. doi: 10.1186/s12918-017-0386-4.
10. Predicting protein-binding RNA nucleotides with consideration of binding partners.
   Comput Methods Programs Biomed. 2015 Jun;120(1):3-15. doi: 10.1016/j.cmpb.2015.03.010. Epub 2015 Apr 8.

Cited by

1. Predicting subcellular localization of multisite proteins using differently weighted multi-label k-nearest neighbors sets.
   Technol Health Care. 2019;27(S1):185-193. doi: 10.3233/THC-199018.
2. Protein-protein interaction sites prediction by ensemble random forests with synthetic minority oversampling technique.
   Bioinformatics. 2019 Jul 15;35(14):2395-2402. doi: 10.1093/bioinformatics/bty995.