Santra Tapesh, Delatola Eleni Ioanna
Systems Biology Ireland, University College Dublin, Belfield, Dublin-4, Ireland.
Sci Rep. 2016 Jul 22;6:30159. doi: 10.1038/srep30159.
The presence of considerable noise and missing data points makes the analysis of mass-spectrometry (MS) based proteomic data a challenging task. Missing values in MS data arise because MS machines cannot reliably detect proteins whose abundances fall below the detection limit. We developed a Bayesian algorithm that exploits this knowledge and uses missing data points as a source of information complementary to the observed protein intensities in order to find differentially expressed proteins in MS-based proteomic data. We compared its accuracy with that of many other methods on several simulated datasets, and it consistently outperformed them. We then used it to analyse proteomic screens of a breast cancer (BC) patient cohort. It revealed large differences between the proteomic landscapes of triple-negative and Luminal A BC, which are the most and least aggressive types of BC, respectively. Unexpectedly, the majority of these differences could be attributed to the direct transcriptional activity of only seven transcription factors, some of which are known to be inactive in triple-negative BC. We also identified two new proteins that correlated significantly with the survival of BC patients and may therefore have potential diagnostic or prognostic value.
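The idea that a missing value carries information, because it signals an abundance below the detection limit, can be illustrated with a left-censored likelihood. The sketch below is not the authors' Bayesian algorithm. It is a minimal illustration under assumptions that the abstract does not state: log-intensities are Gaussian, the standard deviation `sigma` and detection limit `limit` are known, and the mean is estimated by a simple grid search. All function names and numbers are hypothetical.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log density of a Gaussian at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))

def normal_logcdf(x, mu, sigma):
    """Log of P(X < x) for a Gaussian; floored to avoid log(0)."""
    z = (x - mu) / (sigma * math.sqrt(2.0))
    return math.log(max(0.5 * math.erfc(-z), 1e-300))

def censored_loglik(observed, n_missing, mu, sigma, limit):
    """Observed values contribute their density; each missing value
    contributes the probability of falling below the detection limit
    (an assumed censoring model, not taken from the paper)."""
    ll = sum(normal_logpdf(x, mu, sigma) for x in observed)
    ll += n_missing * normal_logcdf(limit, mu, sigma)
    return ll

def fit_mean(observed, n_missing, sigma, limit, grid):
    """Grid-search maximum-likelihood estimate of the mean."""
    return max(grid, key=lambda mu: censored_loglik(observed, n_missing, mu, sigma, limit))

# Hypothetical log-intensities for one protein in one patient group.
observed = [20.1, 20.5, 19.8]
grid = [15.0 + 0.01 * i for i in range(1001)]  # candidate means, 15.00 to 25.00

mean_ignoring_missing = fit_mean(observed, 0, sigma=1.0, limit=19.0, grid=grid)
mean_using_missing = fit_mean(observed, 3, sigma=1.0, limit=19.0, grid=grid)
print(mean_ignoring_missing, mean_using_missing)
```

With three missing measurements treated as censored, the estimated mean is pulled below the estimate that ignores them, which is the sense in which missingness complements the observed intensities.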