Sitani Divya, Giorgetti Alejandro, Alfonso-Prieto Mercedes, Carloni Paolo
JARA-Institute: Molecular Neuroscience and Neuroimaging, Institute for Neuroscience and Medicine INM-11/JARA-BRAIN Institute JBI-2, Forschungszentrum Jülich GmbH, Jülich, Germany.
Department of Biology, RWTH Aachen University, Aachen, Germany.
Proteins. 2021 Jun;89(6):639-647. doi: 10.1002/prot.26047. Epub 2021 Feb 2.
Proteins often exert their function by binding to other cellular partners. Hot spots are the key residues for protein-protein binding. Their identification may shed light on the impact of disease-associated mutations on protein complexes and help design protein-protein interaction inhibitors for therapy. Unfortunately, current machine learning methods for predicting hot spots suffer from limitations caused by gross errors in the data matrices. Here, we present a novel data pre-processing pipeline that overcomes this problem by using Robust Principal Component Analysis to recover a low-rank matrix with reduced noise. Application to existing databases shows the predictive power of the method.
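The low-rank recovery step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not state which Robust PCA solver or parameters were used, so the code below assumes one common formulation, Principal Component Pursuit solved with an augmented Lagrangian scheme, which splits a data matrix M into a low-rank part L and a sparse part S that absorbs gross errors. The function names (`shrink`, `svd_threshold`, `robust_pca`), the default values of `lam` and `mu`, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def shrink(X, tau):
    """Elementwise soft-thresholding: pull every entry toward zero by tau."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svd_threshold(X, tau):
    """Soft-threshold the singular values of X (proximal step for the nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt


def robust_pca(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Split M into low-rank L and sparse S so that M is approximately L + S.

    Hypothetical helper: solves min ||L||_* + lam * ||S||_1  s.t.  L + S = M
    by alternating updates of L, S, and the multiplier Y.
    """
    M = np.asarray(M, dtype=float)
    n1, n2 = M.shape
    abs_sum = np.abs(M).sum()
    if abs_sum == 0.0:
        raise ValueError("M must contain at least one nonzero entry")
    if lam is None:
        lam = 1.0 / np.sqrt(max(n1, n2))   # commonly used default weight (assumption)
    if mu is None:
        mu = n1 * n2 / (4.0 * abs_sum)     # commonly used step-size heuristic (assumption)

    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                   # Lagrange multiplier for L + S = M
    norm_M = np.linalg.norm(M)

    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)   # low-rank update
        S = shrink(M - L + Y / mu, lam / mu)          # sparse (gross error) update
        residual = M - L - S
        Y = Y + mu * residual                         # multiplier update
        if np.linalg.norm(residual) <= tol * norm_M:
            break
    return L, S


# Synthetic stand-in for a feature matrix (rows: residues, columns: features).
rng = np.random.default_rng(0)
L_true = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 60))  # rank 2
S_true = np.zeros((60, 60))
corrupted = rng.random((60, 60)) < 0.05                               # ~5% gross errors
S_true[corrupted] = 10.0 * rng.choice([-1.0, 1.0], size=corrupted.sum())
M = L_true + S_true

L_hat, S_hat = robust_pca(M)
rel_err = np.linalg.norm(L_hat - L_true) / np.linalg.norm(L_true)
print("relative error of recovered low-rank part:", rel_err)
```

In a pre-processing setting like the one described, the recovered low-rank matrix would replace the raw feature matrix as input to a downstream classifier. Which features and classifier to use is outside the scope of this sketch.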