Goh Wilson Wen Bin, Lee Yie Hou, Ramdzan Zubaidah M, Chung Maxey C M, Wong Limsoon, Sergot Marek J
Department of Computing, Imperial College London, UK.
Int J Bioinform Res Appl. 2012;8(3-4):155-70. doi: 10.1504/IJBRA.2012.048967.
Hepatocellular Carcinoma (HCC) ranks among the deadliest cancers and has a complex etiology. Proteomics analysis using iTRAQ provides a direct way to analyse perturbations in protein expression during HCC progression from early to late stage, but it suffers from consistency and coverage issues. Appropriate use of network-based analytical methods can help to overcome these issues. We built an integrated and comprehensive Protein-Protein Interaction Network (PPIN) by merging several major databases, and then filtered the network to retain only Gene Ontology (GO)-coherent edges. Significantly differentially expressed genes (seeds) were selected from the iTRAQ data and mapped onto this network. Undetected proteins linked to seeds (linked proteins) were identified and functionally characterised. Network cleaning yields a higher-quality list of linked proteins, which are highly enriched for similar GO biological process terms. Linked proteins are also enriched for known cancer genes and are associated with many well-established cancer processes such as apoptosis and immune response. Known cancer genes are more likely to be found among highly linked proteins. Three highly linked proteins were identified that may play an important role in driving HCC progression: the G-protein coupled receptor signalling proteins ARRB1/2 and the structural protein beta-actin, ACTB. Interestingly, both ARRB proteins evaded detection in the iTRAQ screen. ACTB was not detected in the original dataset derived from Mascot but was strongly supported when we re-ran the analysis using another protein detection database (Paragon). Identification of linked proteins helps to partially overcome the coverage issue in shotgun proteomics analysis. The set of linked proteins is enriched for cancer-specific processes, and more so the more highly linked the proteins are. Additionally, a higher-quality linked set is obtained if network cleaning is performed beforehand. This form of network-based analysis complements the cluster-based approach and can provide a larger list of proteins on which to perform functional analysis, as well as for biomarker identification.
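The pipeline summarised above (merge interaction databases, keep GO-coherent edges, map seeds, collect undetected neighbours) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the toy protein identifiers and the coherence criterion used here (both endpoints share at least one GO term) are assumptions made for illustration only; the abstract does not state the exact criterion.

```python
from collections import defaultdict


def merge_networks(*edge_lists):
    """Union of undirected edges from several sources (duplicates and self-loops dropped)."""
    merged = set()
    for edges in edge_lists:
        for a, b in edges:
            if a != b:
                merged.add(frozenset((a, b)))
    return merged


def go_coherent(edges, go_terms):
    """Keep edges whose endpoints share at least one GO term (assumed criterion)."""
    kept = set()
    for edge in edges:
        a, b = tuple(edge)
        if go_terms.get(a, set()) & go_terms.get(b, set()):
            kept.add(edge)
    return kept


def linked_proteins(edges, seeds, detected):
    """Map each undetected protein to the set of seeds it interacts with."""
    observed = set(detected) | set(seeds)  # seeds are detected by definition
    links = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        for seed, other in ((a, b), (b, a)):
            if seed in seeds and other not in observed:
                links[other].add(seed)
    return dict(links)


def rank_by_links(links):
    """Order linked proteins by the number of seeds they touch, highest first."""
    return sorted(links, key=lambda p: (-len(links[p]), p))


# Toy data: every identifier below is hypothetical.
db_a = [("S1", "X1"), ("S2", "X1"), ("S1", "D1")]
db_b = [("X1", "S2"), ("S2", "X2"), ("X2", "X2")]
go_terms = {
    "S1": {"term_a"},
    "S2": {"term_a", "term_b"},
    "X1": {"term_a"},
    "X2": {"term_c"},
    "D1": {"term_a"},
}
seeds = {"S1", "S2"}
detected = {"S1", "S2", "D1"}

merged = merge_networks(db_a, db_b)
clean = go_coherent(merged, go_terms)
linked_raw = linked_proteins(merged, seeds, detected)
linked_clean = linked_proteins(clean, seeds, detected)
ranked = rank_by_links(linked_clean)
```

In this toy network the hypothetical protein X2 is a linked protein before cleaning but is dropped afterwards, because its only edge to a seed joins two proteins with no GO term in common. X1 remains and ranks first because it is linked to both seeds.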