Goldfarb Dennis, Hast Bridgid E, Wang Wei, Major Michael B
Department of Computer Science, University of North Carolina at Chapel Hill, Box #3175, Chapel Hill, North Carolina 27599, United States.
J Proteome Res. 2014 Dec 5;13(12):5944-55. doi: 10.1021/pr5008416. Epub 2014 Oct 20.
Protein-protein interactions defined by affinity purification and mass spectrometry (APMS) suffer from high false discovery rates. Consequently, lists of potential interactions must be pruned of contaminants before network construction and interpretation, historically an expensive, time-intensive, and error-prone task. In recent years, numerous computational methods have been developed to identify genuine interactions among the hundreds of candidates. Here, comparative analysis of three popular algorithms, HGSCore, CompPASS, and SAINT, revealed complementarity in their classification accuracies, which is consistent with their divergent scoring strategies. We improved each algorithm, raising the area under the receiver operating characteristic curve by an average of 16%, by integrating a variety of indirect data known to correlate with established protein-protein interactions, including mRNA coexpression, gene ontologies, domain-domain binding affinities, and homologous protein interactions. Each APMS scoring approach was incorporated into a separate logistic regression model along with the indirect features; the resulting three classifiers demonstrate improved performance on five diverse APMS data sets. To facilitate APMS data scoring within the scientific community, we created Spotlite, a user-friendly and fast web application. Within Spotlite, data can be scored with the augmented classifiers, annotated, and visualized (http://cancer.unc.edu/majorlab/software.php). The utility of the Spotlite platform to reveal physical, functional, and disease-relevant characteristics within APMS data is established through a focused analysis of the KEAP1 E3 ubiquitin ligase.
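The idea of augmenting an APMS score with indirect features in a logistic regression model can be illustrated with a minimal sketch. This is not Spotlite's implementation: the feature names (`apms`, `coexpr`, `go_sim`), the synthetic data generator, the gradient-descent fitting routine, and all parameter values are assumptions made purely for illustration, and the toy data carry no biological meaning. The sketch fits one model on the APMS score alone and one on the APMS score plus two indirect features, then compares the area under the ROC curve on held-out pairs.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights (bias first) by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict(w, X):
    """Return the predicted interaction probability for each feature row."""
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]


def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def make_toy_data(n=400, seed=0):
    """Synthetic bait-prey pairs (hypothetical): true interactions score higher on average."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.3 else 0
        apms = rng.gauss(1.0 * label, 1.0)    # stand-in for an APMS score
        coexpr = rng.gauss(0.8 * label, 1.0)  # stand-in for mRNA coexpression
        go_sim = rng.gauss(0.8 * label, 1.0)  # stand-in for gene ontology similarity
        X.append([apms, coexpr, go_sim])
        y.append(label)
    return X, y


X, y = make_toy_data()
train_X, test_X = X[:300], X[300:]
train_y, test_y = y[:300], y[300:]

# Model 1: APMS score only. Model 2: APMS score plus indirect features.
w_apms = fit_logistic([[row[0]] for row in train_X], train_y)
w_full = fit_logistic(train_X, train_y)

auc_apms = auc(predict(w_apms, [[row[0]] for row in test_X]), test_y)
auc_full = auc(predict(w_full, test_X), test_y)
print(f"AUC, APMS score only:        {auc_apms:.3f}")
print(f"AUC, APMS + indirect features: {auc_full:.3f}")
```

In practice a library implementation (for example, scikit-learn's `LogisticRegression` and `roc_auc_score`) would likely replace the hand-written fitting and AUC routines; they are written out here only to keep the sketch dependency-free.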