Detection of outlier residues for improving interface prediction in protein heterocomplexes.

Affiliation

Institute of Intelligent Machines, Chinese Academy of Sciences, PO Box 1130, Hefei 230031, China.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2012 Jul-Aug;9(4):1155-65. doi: 10.1109/TCBB.2012.58.

Abstract

Sequence-based understanding and identification of protein binding interfaces is a challenging research topic because of the complexity of protein systems and the imbalanced distribution of interface and noninterface residues. This paper presents an outlier detection idea to address the redundancy problem in protein interaction data. The cleaned training data are then used to improve prediction performance. We use three novel measures to describe the extent to which a residue is considered an outlier in comparison to the other residues: the distance of a residue instance from the center instance of all residue instances of the same class label (Dist), the probability of the class label of the residue instance (PCL), and the importance of within-class and between-class (IWB) residue instances. Outlier scores are computed by integrating the three factors; instances with a sufficiently large score are treated as outliers and removed. The data sets without outliers are taken as input for a support vector machine (SVM) ensemble. The proposed SVM ensemble trained on input data without outliers performs better than the same ensemble trained on data with outliers. Our method is also more accurate than many literature methods on benchmark data sets. In our empirical studies, we found that some outlier interface residues are in fact close to noninterface regions, and some outlier noninterface residues are close to interface regions.
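The abstract does not give formulas for the three measures, so the following is only a minimal sketch of the first one (Dist): the distance of each residue instance from the center of all instances sharing its class label. The function names, the Euclidean metric, and the cutoff rule (class mean plus `z_cutoff` standard deviations) are assumptions made for illustration, not the authors' definitions; PCL and IWB are omitted because the abstract does not specify them.

```python
import math
from collections import defaultdict


def class_centroids(features, labels):
    """Return the mean feature vector of each class label."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        for i, v in enumerate(x):
            sums[y][i] += v
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def dist_scores(features, labels):
    """Euclidean distance of each instance from its own class centroid."""
    centroids = class_centroids(features, labels)
    return [math.dist(x, centroids[y]) for x, y in zip(features, labels)]


def remove_outliers(features, labels, z_cutoff=2.0):
    """Drop instances whose Dist score exceeds mean + z_cutoff * std
    of the scores within their own class (an assumed threshold rule)."""
    scores = dist_scores(features, labels)
    by_class = defaultdict(list)
    for s, y in zip(scores, labels):
        by_class[y].append(s)
    limits = {}
    for y, vals in by_class.items():
        mean = sum(vals) / len(vals)
        std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
        limits[y] = mean + z_cutoff * std
    keep = [i for i, (s, y) in enumerate(zip(scores, labels)) if s <= limits[y]]
    return [features[i] for i in keep], [labels[i] for i in keep], keep
```

In the workflow the abstract describes, the retained instances would then be passed to the SVM ensemble for training. A fuller version would combine Dist with the other two measures into a single outlier score before thresholding.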

Similar articles

Exploring the potential of 3D Zernike descriptors and SVM for protein-protein interface prediction.

BMC Bioinformatics. 2018 Feb 6;19(1):35. doi: 10.1186/s12859-018-2043-3.

Prediction of microRNA-binding residues in protein using a Laplacian support vector machine based on sequence information.

J Bioinform Comput Biol. 2018 Jun;16(3):1840009. doi: 10.1142/S0219720018400097. Epub 2018 Feb 4.

EL_PSSM-RT: DNA-binding residue prediction by integrating ensemble learning with PSSM Relation Transformation.

BMC Bioinformatics. 2017 Aug 29;18(1):379. doi: 10.1186/s12859-017-1792-8.

Prediction of protein-protein interaction sites using support vector machines.

Protein Eng Des Sel. 2004 Feb;17(2):165-73. doi: 10.1093/protein/gzh020. Epub 2004 Jan 20.

Sequence-based prediction of microRNA-binding residues in proteins using cost-sensitive Laplacian support vector machines.

IEEE/ACM Trans Comput Biol Bioinform. 2013 May-Jun;10(3):752-9. doi: 10.1109/TCBB.2013.75.

Sequence-based prediction of protein interaction sites with an integrative method.

Bioinformatics. 2009 Mar 1;25(5):585-91. doi: 10.1093/bioinformatics/btp039. Epub 2009 Jan 19.

Signal peptide discrimination and cleavage site identification using SVM and NN.

Comput Biol Med. 2014 Feb;45:98-110. doi: 10.1016/j.compbiomed.2013.11.017. Epub 2013 Dec 1.

MOWGLI: prediction of protein-MannOse interacting residues With ensemble classifiers usinG evoLutionary Information.

J Biomol Struct Dyn. 2016 Oct;34(10):2069-83. doi: 10.1080/07391102.2015.1106978. Epub 2015 Nov 27.

Predicting protein-ligand binding site using support vector machine with protein properties.

IEEE/ACM Trans Comput Biol Bioinform. 2013 Nov-Dec;10(6):1517-29. doi: 10.1109/TCBB.2013.126.

Cited by

Protein-protein interaction site prediction by model ensembling with hybrid feature and self-attention.

BMC Bioinformatics. 2023 Dec 5;24(1):456. doi: 10.1186/s12859-023-05592-7.

Prediction of Protein Hotspots from Whole Protein Sequences by a Random Projection Ensemble System.

Int J Mol Sci. 2017 Jul 18;18(7):1543. doi: 10.3390/ijms18071543.

Progress and challenges in predicting protein interfaces.

Brief Bioinform. 2016 Jan;17(1):117-31. doi: 10.1093/bib/bbv027. Epub 2015 May 13.

LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone.

BMC Bioinformatics. 2014;15 Suppl 15(Suppl 15):S4. doi: 10.1186/1471-2105-15-S15-S4. Epub 2014 Dec 3.
