Identifying protein complexes in high-throughput protein interaction screens using an infinite latent feature model.

Author information

Chu Wei, Ghahramani Zoubin, Krause Roland, Wild David L

Affiliation

Gatsby Computational Neuroscience Unit, University College London, London, WC1N 3AR, UK.

Publication information

Pac Symp Biocomput. 2006:231-42.

PMID:17094242

Abstract

We propose a Bayesian approach to identify protein complexes and their constituents from high-throughput protein-protein interaction screens. An infinite latent feature model that allows for multi-complex membership by individual proteins is coupled with a graph diffusion kernel that evaluates the likelihood of two proteins belonging to the same complex. Gibbs sampling is then used to infer a catalog of protein complexes from the interaction screen data. An advantage of this model is that it places no prior constraints on the number of complexes and automatically infers the number of significant complexes from the data. Validation results using affinity purification/mass spectrometry experimental data from yeast RNA-processing complexes indicate that our method is capable of partitioning the data in a biologically meaningful way. A supplementary web site containing larger versions of the figures is available at http://public.kgi.edu/wild/PSBO6/index.html.
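To make the graph diffusion kernel concrete, here is a minimal sketch. The abstract does not give the kernel's exact form or how it enters the likelihood, so this assumes the standard diffusion kernel K = exp(-beta * L), with L = D - A the graph Laplacian. The function name `diffusion_kernel`, the parameter `beta`, and the five-protein toy graph are hypothetical, chosen only for illustration.

```python
import numpy as np


def diffusion_kernel(adjacency, beta=1.0):
    """Return K = exp(-beta * L), where L = D - A is the graph Laplacian.

    Assumed form (standard graph diffusion kernel); the paper's exact
    kernel and its diffusion parameter are not stated in the abstract.
    """
    A = np.asarray(adjacency, dtype=float)
    if A.ndim != 2 or A.shape[0] != A.shape[1] or not np.allclose(A, A.T):
        raise ValueError("adjacency must be a symmetric square matrix")
    L = np.diag(A.sum(axis=1)) - A
    # L is symmetric, so the matrix exponential can be computed from its
    # eigendecomposition: exp(-beta * L) = V diag(exp(-beta * w)) V^T.
    w, V = np.linalg.eigh(L)
    return (V * np.exp(-beta * w)) @ V.T


# Hypothetical toy screen: proteins 0-2 interact pairwise, proteins 3-4
# interact with each other, and a single interaction links 2 and 3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (2, 3)]
A = np.zeros((5, 5))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

K = diffusion_kernel(A, beta=0.5)
print(np.round(K, 3))
```

Under these assumptions, the kernel value for two proteins in the densely connected triangle (for example `K[0, 1]`) is larger than for a distant pair (`K[0, 4]`), which is the kind of pairwise similarity that could inform whether two proteins belong to the same complex.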
