Drew Kevin, Müller Christian L, Bonneau Richard, Marcotte Edward M
Center for Systems and Synthetic Biology, Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, United States of America.
Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY, United States of America.
PLoS Comput Biol. 2017 Oct 12;13(10):e1005625. doi: 10.1371/journal.pcbi.1005625. eCollection 2017 Oct.
Determining the three-dimensional arrangement of proteins in a complex helps uncover mechanistic function and interpret genetic variation in the genes that encode the complex's subunits. Several methods can determine co-complex interactions between proteins, among them co-fractionation / mass spectrometry (CF-MS), but it remains difficult to identify directly contacting subunits within a multi-protein complex. Correlation analysis of CF-MS profiles shows promise in detecting protein complexes as a whole but is limited in its ability to infer direct physical contacts among proteins in sub-complexes. To identify direct protein-protein contacts within human protein complexes, we learn a sparse conditional dependency graph from approximately 3,000 CF-MS experiments on human cell lines. On a benchmark of large protein complexes with solved three-dimensional structures, we show substantial performance gains over correlation analysis in estimating direct interactions. We demonstrate the method's value in determining the three-dimensional arrangement of proteins in two ways: by making predictions for complexes without known structure (the exocyst and the tRNA multi-synthetase complex), and by establishing evidence for the structural position of GON7/C14ORF142, a recently discovered component of the core human EKC/KEOPS complex, which yields a more complete 3D model of that complex. Direct contact prediction provides easily calculable additional structural information for large-scale protein complex mapping studies and should be broadly applicable across organisms as more CF-MS datasets become available.
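The contrast between correlation and conditional dependence can be illustrated with a minimal sketch. It is not the method used in the study: it computes unregularized partial correlations by inverting a correlation matrix, whereas the abstract describes learning a sparse conditional dependency graph. The three simulated "elution profiles" (A, B, C), the noise levels, and all function names are hypothetical and chosen only for illustration. Subunit B contacts both A and C, while A and C never touch directly.

```python
import random

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[x - m for x in c] for c, m in zip(columns, means)]
    norms = [sum(x * x for x in c) ** 0.5 for c in centered]
    p = len(columns)
    return [[sum(u * v for u, v in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(p)] for i in range(p)]

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def partial_correlations(corr):
    """Partial correlation of each pair given all other variables,
    read off the inverse (precision) of the correlation matrix."""
    prec = invert(corr)
    p = len(prec)
    return [[1.0 if i == j else -prec[i][j] / (prec[i][i] * prec[j][j]) ** 0.5
             for j in range(p)] for i in range(p)]

# Hypothetical toy data: B drives both A and C, so A and C co-vary
# only through B (a chain A - B - C with no direct A - C contact).
rng = random.Random(0)
n_fractions = 2000
b = [rng.gauss(0, 1) for _ in range(n_fractions)]
a = [x + rng.gauss(0, 0.5) for x in b]
c = [x + rng.gauss(0, 0.5) for x in b]

corr = correlation_matrix([a, b, c])
pcorr = partial_correlations(corr)

print("corr(A, C)         =", round(corr[0][2], 3))
print("pcorr(A, C | B)    =", round(pcorr[0][2], 3))
print("pcorr(A, B | C)    =", round(pcorr[0][1], 3))
```

In this toy setting the plain correlation between A and C is high even though they share no direct link, while their partial correlation given B is close to zero and the A-B partial correlation stays clearly positive. With thousands of proteins and a limited number of fractions, direct matrix inversion becomes unstable, which is one reason a sparse (regularized) estimator would be preferred in practice.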