Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China.
Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, Michigan 48109-2029, USA.
Biomed Res Int. 2019 Jan 6;2019:3907195. doi: 10.1155/2019/3907195. eCollection 2019.
Proteomics, the large-scale analysis of proteins, contributes greatly to understanding gene function in the postgenomic era. However, disease protein ranking using shotgun proteomics data has not been fully evaluated. In this study, we prioritized disease-related proteins by integrating the protein-protein interaction (PPI) network with protein differential expression profiles from colon and rectal cancer (CRC) or breast cancer (BC) proteomics. We applied Local Ranking (LR) and Global Ranking (GR) methods on the network with three kinds of protein sets as a priori knowledge: known disease proteins (KDPs) collected from the Online Mendelian Inheritance in Man (OMIM) database, differentially expressed proteins (DEPs), and the set of KDPs together with their differentially expressed direct neighbors (eKDPs). Cross-validation showed that the GR method outperformed the LR method, and that using eKDPs as the initial training set gave significantly higher accuracy than using either of the other two a priori sets. We then validated the top-ranked proteins using RNAi-based loss-of-function screens in the DepMap database. The results showed that 75% of the top 20 proteins in CRC are necessary for tumor survival. In summary, network-based Global Ranking combined with protein differential expression can efficiently prioritize cancer-related proteins and discover new candidate cancer genes or proteins.
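The abstract does not give the scoring formulas, so the following is only a minimal sketch of how the three ingredients (eKDP seed expansion, a local score, a global score) might fit together. It assumes that the local score is the fraction of a protein's direct neighbors that are seeds, and that the global score is a random walk with restart, a common choice for network propagation that may differ from the method actually used in the paper. The toy network, the protein names, and the `restart` value are all hypothetical.

```python
# Toy undirected PPI network as adjacency lists (hypothetical proteins).
ppi = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["B", "E"],
    "E": ["D"],
}


def expand_seeds(ppi, kdps, deps):
    """eKDP-style seeds: KDPs plus their direct neighbors that are DEPs."""
    seeds = set(kdps)
    for protein in kdps:
        seeds.update(n for n in ppi.get(protein, []) if n in deps)
    return seeds


def local_ranking(ppi, seeds):
    """Assumed local score: fraction of direct neighbors that are seeds."""
    return {
        p: sum(n in seeds for n in nbrs) / len(nbrs)
        for p, nbrs in ppi.items()
        if p not in seeds
    }


def global_ranking(ppi, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Assumed global score: random walk with restart from the seed set."""
    p0 = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in ppi}
    score = dict(p0)
    for _ in range(max_iter):
        new = {}
        for p in ppi:
            # Each neighbor passes on its score split evenly over its edges.
            flow = sum(score[n] / len(ppi[n]) for n in ppi[p])
            new[p] = (1 - restart) * flow + restart * p0[p]
        delta = sum(abs(new[p] - score[p]) for p in ppi)
        score = new
        if delta < tol:
            break
    return score


# Hypothetical inputs: one known disease protein, two differentially expressed.
seeds = expand_seeds(ppi, kdps={"A"}, deps={"B", "E"})
lr_scores = local_ranking(ppi, seeds)
gr_scores = global_ranking(ppi, seeds)
```

In this toy case, protein E has no seed among its direct neighbors, so the local score gives it zero, while the global score still assigns it a small positive value through D. That difference in reach is one plausible reason a global method could rank more distant candidates that a local method cannot see.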