Arici M Kaan, Tuncbag Nurcan
Graduate School of Informatics, Middle East Technical University, Ankara, Turkey.
Foot and Mouth Diseases Institute, Ministry of Agriculture and Forestry, Ankara, Turkey.
Front Mol Biosci. 2021 Oct 5;8:666705. doi: 10.3389/fmolb.2021.666705. eCollection 2021.
Beyond producing lists of molecules, there is a need to consider multiple sets of omic data collectively and to reconstruct the connections between the molecules. Pathway reconstruction in particular is crucial to understanding disease biology because abnormal cellular signaling may be pathological. The main challenge is how to integrate the data accurately. In this study, we comparatively analyze the performance of a set of network reconstruction algorithms on multiple reference interactomes. We first explored several human protein interactomes: PathwayCommons, OmniPath, HIPPIE, iRefWeb, STRING, and ConsensusPathDB. The comparison is based on each interactome's coverage of cancer driver proteins, the structural information available for its protein interactions, and its bias toward well-studied proteins. We next used these interactomes to evaluate the performance of network reconstruction algorithms, including all-pair shortest path, heat diffusion with flux, personalized PageRank with flux, and prize-collecting Steiner forest (PCSF) approaches. Each approach has its own merits and weaknesses. Among them, PCSF had the most balanced performance in terms of precision and recall scores when 28 pathways from NetPath were reconstructed using the listed algorithms. Additionally, the choice of reference interactome affects the performance of the network reconstruction approaches. The coverage and disease- or tissue-specificity of each interactome may vary, which may result in differences in the reconstructed networks.
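To make the evaluation idea concrete, the following is a minimal sketch of one of the listed strategies, all-pair shortest path reconstruction, scored by precision and recall against a reference pathway. Everything in it is an illustrative assumption rather than the authors' implementation: the toy edge list, the seed and reference node sets, the helper names (`build_adjacency`, `shortest_path`, `reconstruct`, `precision_recall`), the choice to keep one shortest path per seed pair, and the choice to score at the node level.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy interactome (undirected edges); not taken from any real database.
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "E"),
         ("E", "D"), ("D", "F"), ("A", "G")]


def build_adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def shortest_path(adj, source, target):
    """Return one shortest path from source to target via BFS, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        # Sorted iteration makes tie-breaking between equal-length paths deterministic.
        for neighbor in sorted(adj.get(node, ())):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def reconstruct(adj, seeds):
    """Union of the nodes on one shortest path for every pair of seed nodes."""
    nodes = set()
    for s, t in combinations(sorted(seeds), 2):
        path = shortest_path(adj, s, t)
        if path:
            nodes.update(path)
    return nodes


def precision_recall(predicted, reference):
    """Node-level precision and recall of a predicted set against a reference set."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


adj = build_adjacency(EDGES)
predicted = reconstruct(adj, seeds={"A", "D", "F"})
precision, recall = precision_recall(predicted, reference={"A", "B", "E", "D", "F"})
print(sorted(predicted), precision, recall)
```

In this toy graph two equally short routes connect B and D (through C or through E), and the sketch keeps only one of them, so the reference node E is missed while C is included. The same kind of sensitivity to the underlying network is what makes the choice of reference interactome matter for the reconstructed result.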