Center of Medical Biotechnology, University of Duisburg-Essen, Duisburg, Germany.
Institute of Molecular Virology, Ulm University Medical Center, Ulm, Germany.
J Comput Chem. 2019 Apr 30;40(11):1233-1242. doi: 10.1002/jcc.25780. Epub 2019 Feb 15.
The prediction of peptide-protein or protein-protein interactions (PPI) is a challenging task, especially when amino acid sequences are the only information available. Machine learning methods allow us to exploit the information content of PPI datasets. However, the numerical codification of these datasets often influences the performance of data mining approaches. Here, we introduce a procedure for the general-purpose numerical codification of polypeptides. This procedure transforms pairs of amino acid sequences into a machine learning-friendly vector whose elements represent numerical descriptors of residues in proteins. We used this encoding procedure to develop a support vector machine model (PPI-Detect) that predicts whether two proteins will interact. PPI-Detect (https://ppi-detect.zmb.uni-due.de/) outperforms state-of-the-art sequence-based PPI predictors. We employed PPI-Detect to analyze derivatives of EPI-X4, an endogenous peptide inhibitor of the G-protein-coupled receptor CXCR4. In this analysis, we identified with high accuracy the peptides that bind the receptor better than EPI-X4. Using PPI-Detect, we also designed a novel peptide and then experimentally established its anti-CXCR4 activity. © 2019 Wiley Periodicals, Inc.
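The abstract does not specify which residue descriptors PPI-Detect uses or how residue-level values are summarized into a fixed-length vector. Purely as an illustration of the general idea (turning a sequence pair into one numerical vector), the sketch below assumes a single descriptor, the Kyte-Doolittle hydropathy scale, and a swapped-in summary: the mean value plus lagged autocorrelations. The names `encode_sequence`, `encode_pair`, and `max_lag` are hypothetical, and this is not the authors' encoding.

```python
from typing import List

# Kyte-Doolittle hydropathy scale: an ASSUMED example descriptor,
# not necessarily one of the descriptors used by PPI-Detect.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_sequence(seq: str, max_lag: int = 3) -> List[float]:
    """Hypothetical encoder: map a sequence of any length to a vector of
    length max_lag + 1 (mean descriptor value, then lag-1..max_lag
    autocorrelations). Raises KeyError on non-standard residues."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    n = len(values)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    features = [sum(values) / n]
    for lag in range(1, max_lag + 1):
        # Average product of descriptor values `lag` residues apart.
        total = sum(values[i] * values[i + lag] for i in range(n - lag))
        features.append(total / (n - lag))
    return features


def encode_pair(seq_a: str, seq_b: str, max_lag: int = 3) -> List[float]:
    """Concatenate the encodings of both partners into one fixed-length
    vector that a classifier such as an SVM could take as input."""
    return encode_sequence(seq_a, max_lag) + encode_sequence(seq_b, max_lag)
```

For example, `encode_pair("ACDEFG", "KLMNPQRS")` yields an 8-element vector (4 features per partner) even though the two arbitrary sequences differ in length; fixed-length output is what makes such vectors usable as SVM input.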