Shen Changyu, Li Lang, Chen Jake Yue
Division of Biostatistics, Department of Medicine, Indiana University School of Medicine, Indianapolis, Indiana 46202, USA.
Proteins. 2006 Aug 1;64(2):436-43. doi: 10.1002/prot.20994.
Experimental processes to collect and process proteomics data are increasingly complex, and the computational methods to assess the quality and significance of these data remain unsophisticated. These challenges have led to many biological oversights and computational misconceptions. We developed an empirical Bayes model to analyze multiprotein complex (MPC) proteomics data derived from peptide mass spectrometry detection in pull-down experiments of purified protein complexes. Using our model and two yeast proteomics data sets, we estimated that there should be an average of about 20 true associations per MPC, almost 10 times the previous estimate. For data sets generated to mimic a real proteome, our model achieved on average 80% sensitivity in detecting true associations, compared with 3% sensitivity in previous work, while maintaining a comparable false discovery rate of 0.3%. Cross-examination of our results against protein complexes confirmed by various experimental techniques demonstrates that our method identifies many true associations that previous approaches could not.
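The abstract reports detection performance as sensitivity and false discovery rate (FDR). The following is a minimal sketch of how these two metrics are conventionally computed when the true associations are known, as they are for simulated data sets. It does not reproduce the authors' empirical Bayes model. It assumes that each association is an unordered pair of protein names. The function name `detection_metrics` and the toy protein names are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Pair = Tuple[str, str]


def _normalize(pairs: Iterable[Pair]) -> Set[FrozenSet[str]]:
    # Assumption: an association is unordered, so (A, B) and (B, A) are the same pair.
    return {frozenset(p) for p in pairs}


def detection_metrics(predicted: Iterable[Pair], truth: Iterable[Pair]) -> Tuple[float, float]:
    """Return (sensitivity, FDR) of predicted associations against known true ones.

    sensitivity = TP / (TP + FN)   fraction of true associations that were detected
    FDR         = FP / (TP + FP)   fraction of reported associations that are false
    """
    pred = _normalize(predicted)
    true = _normalize(truth)
    tp = len(pred & true)  # reported associations that are real
    fp = len(pred - true)  # reported associations that are not real
    fn = len(true - pred)  # real associations that were missed
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    fdr = fp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, fdr


if __name__ == "__main__":
    # Hypothetical toy data: one bait "A" with four true partners.
    truth = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E")]
    predicted = [("B", "A"), ("A", "C"), ("A", "D"), ("A", "X")]
    print(detection_metrics(predicted, truth))  # (0.75, 0.25)
```

In this toy case three of the four true partners are recovered, so sensitivity is 0.75. One of the four reported pairs is spurious, so FDR is 0.25. A method can keep FDR low while missing most true associations. That is why the abstract reports both numbers together.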