Goh Wilson Wen Bin, Wong Limsoon
School of Pharmaceutical Science and Technology, Tianjin University, 92 Weijin Road, Nankai District, Tianjin 300072, China.
Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417.
J Proteome Res. 2016 Sep 2;15(9):3167-79. doi: 10.1021/acs.jproteome.6b00402. Epub 2016 Aug 10.
Despite advances in proteomic technologies, idiosyncratic data issues, for example, incomplete coverage and inconsistency, resulting in large data holes, persist. Moreover, because of naïve reliance on statistical testing and its accompanying p values, differential protein signatures identified from such proteomics data have little diagnostic power. Thus, deploying conventional analytics on proteomics data is insufficient for identifying novel drug targets or precise yet sensitive biomarkers. Complex-based analysis is a new analytical approach that has potential to resolve these issues but requires formalization. We categorize complex-based analysis into five method classes or paradigms and propose an even-handed yet comprehensive evaluation rubric based on both simulated and real data. The first four paradigms are well represented in the literature. The fifth and newest paradigm, the network-paired (NP) paradigm, represented by a method called Extremely Small SubNET (ESSNET), dominates in precision-recall and reproducibility, maintains strong performance in small sample sizes, and sensitively detects low-abundance complexes. In contrast, the commonly used over-representation analysis (ORA) and direct-group (DG) test paradigms maintain good overall precision but have severe reproducibility issues. The other two paradigms considered here are the hit-rate and rank-based network analysis paradigms; both of these have good precision-recall and reproducibility, but they do not consider low-abundance complexes. Therefore, given its strong performance, NP/ESSNET may prove to be a useful approach for improving the analytical resolution of proteomics data. Additionally, given its stability, it may also be a powerful new approach toward functional enrichment tests, much like its ORA and DG counterparts.
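For readers unfamiliar with the over-representation analysis (ORA) paradigm mentioned above, the following is a minimal sketch of the standard one-sided hypergeometric test that ORA is commonly built on. It is not taken from the paper and is not the authors' implementation; the function name `ora_p_value` and its parameter names are hypothetical, chosen only for illustration.

```python
from math import comb


def ora_p_value(universe_size, complex_size, n_differential, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    Hypothetical helper for illustration only.
    universe_size  -- number of proteins observed in the experiment
    complex_size   -- number of those proteins that belong to the complex
    n_differential -- number of proteins called differential
    overlap        -- differential proteins that fall inside the complex
    """
    upper = min(complex_size, n_differential)
    # Count the ways to draw k complex members and (n_differential - k)
    # non-members, for every k at least as large as the observed overlap.
    favourable = sum(
        comb(complex_size, k)
        * comb(universe_size - complex_size, n_differential - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(universe_size, n_differential)


if __name__ == "__main__":
    # Toy numbers, not data from the paper: 10 proteins, a 5-member complex,
    # 5 differential proteins, all 5 of them inside the complex.
    print(ora_p_value(10, 5, 5, 5))
```

Because the test depends on a fixed list of differential proteins, small changes in which proteins pass the significance cutoff can change the overlap count, which is one plausible reading of why the abstract reports reproducibility issues for this paradigm.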