Schmid Ernst W, Walter Johannes C
Department of Biological Chemistry & Molecular Pharmacology, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA.
Howard Hughes Medical Institute, Boston, MA 02115, USA.
bioRxiv. 2024 Apr 12:2024.04.09.588596. doi: 10.1101/2024.04.09.588596.
Protein-protein interactions (PPIs) are ubiquitous in biology, yet a comprehensive structural characterization of the PPIs underlying biochemical processes is lacking. Although AlphaFold-Multimer (AF-M) has the potential to fill this knowledge gap, standard AF-M confidence metrics do not reliably separate relevant PPIs from an abundance of false-positive predictions. To address this limitation, we used machine learning on well-curated datasets to train a Structure Prediction and Omics informed Classifier called SPOC that shows excellent performance in separating true and false PPIs, including in proteome-wide screens. We applied SPOC to an all-by-all matrix of nearly 300 human genome maintenance proteins, generating ~40,000 predictions that can be viewed at predictomes.org, where users can also score their own predictions with SPOC. High-confidence PPIs discovered using our approach suggest novel hypotheses in genome maintenance. Our results provide a framework for interpreting large-scale AF-M screens and help lay the foundation for a proteome-wide structural interactome.