Claussen Erin R, Woodcock-Girard Miles D, Fischer Samantha N, Drew Kevin
bioRxiv. 2025 Jul 28:2025.07.17.665435. doi: 10.1101/2025.07.17.665435.
Cellular function is driven by the activity of proteins in stable complexes. Protein complex assembly depends on the direct physical association of component proteins. Advances in macromolecular structure prediction with tools such as AlphaFold and RoseTTAFold have greatly improved our ability to model these interactions, but an all-by-all analysis of the ∼200M possible pairs in the human proteome remains computationally intractable. A comprehensive cellular map of direct protein interactions would therefore be an invaluable resource for directing screening efforts. Here, we present , a machine learning model that distinguishes direct from indirect protein interactions using features derived from over 25,000 mass spectrometry experiments. Applied to ∼26 million human protein pairs, our model outperforms previous resources in identifying direct physical interactions and enriches for accurate structural models, including ∼2,500 new AlphaFold3 models. Our framework enables structural modeling of disease-relevant complexes (e.g., the orofacial digital syndrome (OFDS) complex), offers insights into the molecular consequences of pathogenic mutations (in OFD1), and, more broadly, establishes a highly accurate protein wiring diagram of the cell.
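The ∼200M figure follows from counting unordered protein pairs, n(n−1)/2. The sketch below reproduces that arithmetic under an assumption not stated in the abstract: roughly 20,000 human proteins (one per protein-coding gene, isoforms and self-pairs ignored). It also puts the ∼26 million scored pairs in proportion to the full all-by-all space. The constant N_PROTEINS and the helper unordered_pairs are illustrative names, not part of the authors' work.

```python
from math import comb

# Assumption: ~20,000 human proteins (one per protein-coding gene).
# This round number is illustrative and is not taken from the paper.
N_PROTEINS = 20_000

# Number of pairs scored by the model, as stated in the abstract (~26 million).
SCORED_PAIRS = 26_000_000


def unordered_pairs(n: int) -> int:
    """Distinct unordered pairs of n proteins, excluding self-pairs: n*(n-1)/2."""
    return comb(n, 2)


all_pairs = unordered_pairs(N_PROTEINS)
fraction_scored = SCORED_PAIRS / all_pairs

print(f"all-by-all pairs: {all_pairs:,}")        # all-by-all pairs: 199,990,000
print(f"fraction scored:  {fraction_scored:.1%}")  # fraction scored:  13.0%
```

Under this assumption, the scored set covers roughly one eighth of all possible pairs. Including self-pairs (homodimers) would add only n more pairs, which does not change the ∼200M order of magnitude.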