Chowdhury Salim A, Koyutürk Mehmet
Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH, USA.
Pac Symp Biocomput. 2010:133-44. doi: 10.1142/9789814295291_0016.
In the study of complex phenotypes, single gene markers can provide only limited insight into how a phenotype manifests. Protein-protein interaction (PPI) networks have therefore proven useful for identifying multiple interacting markers. Recent studies show that many proteins connected via physical and functional interactions, when considered together, exhibit significant differential expression with respect to various complex phenotypes, including cancers. Compared with single gene markers, these "coordinately dysregulated subnetworks" significantly improve cancer diagnosis and prognosis and offer novel insights into the network dynamics of phenotype. However, identifying coordinately dysregulated subnetworks poses significant algorithmic challenges. Existing approaches use heuristics that greedily maximize information-theoretic measures of class separability; however, by the very definition of "coordinate" dysregulation, such greedy algorithms are not well suited to this problem. In this paper, we formulate coordinate dysregulation in the context of the well-known set-cover problem, with a view to capturing the coordination among multiple genes at sample-specific resolution. Based on this formulation, we adapt state-of-the-art approximation algorithms for set cover to the identification of coordinately dysregulated subnetworks. Comprehensive experimental results on human colorectal cancer (CRC) show that, compared with existing algorithms, the proposed algorithm, NETCOVER, significantly improves cancer diagnosis and prediction of metastasis. Our results also demonstrate that subnetworks in the neighborhood of known CRC driver genes exhibit significant coordinate dysregulation, indicating that the notion of coordinate dysregulation may indeed be useful for understanding the network dynamics of complex phenotypes.
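To make the set-cover view more concrete, here is a minimal illustrative sketch under stated assumptions. It assumes that each gene is represented by the set of samples in which it is dysregulated (how per-sample dysregulation is called is not specified here), and that a subnetwork must remain connected in the PPI network. It grows a subnetwork from a seed gene with the classic greedy set-cover heuristic, restricted to PPI neighbors. This is a stand-in technique chosen for illustration and is not the NETCOVER algorithm. The names `coverage`, `grow_subnetwork`, `ppi`, and `dysregulated` are hypothetical.

```python
from typing import Dict, List, Optional, Set


def coverage(subnetwork: Set[str], dysregulated: Dict[str, Set[int]]) -> Set[int]:
    """Samples in which at least one gene of the subnetwork is dysregulated.

    Assumption (illustrative only): each gene "covers" the samples in which
    it is dysregulated, as in a set-cover instance.
    """
    covered: Set[int] = set()
    for gene in subnetwork:
        covered |= dysregulated.get(gene, set())
    return covered


def grow_subnetwork(
    seed: str,
    ppi: Dict[str, Set[str]],
    dysregulated: Dict[str, Set[int]],
    max_size: int,
) -> List[str]:
    """Greedy set-cover expansion restricted to PPI neighbors (hypothetical sketch).

    At each step, add the neighboring gene that covers the largest number of
    samples not yet covered. Stop when no neighbor covers a new sample or the
    size limit is reached. Genes are returned in the order they were added.
    """
    chosen: List[str] = [seed]
    members: Set[str] = {seed}
    covered: Set[int] = set(dysregulated.get(seed, set()))
    while len(chosen) < max_size:
        # Candidate genes: PPI neighbors of the current subnetwork (keeps it connected).
        frontier = {n for g in members for n in ppi.get(g, set())} - members
        best_gene: Optional[str] = None
        best_gain = 0
        for gene in sorted(frontier):  # sorted for deterministic tie-breaking
            gain = len(dysregulated.get(gene, set()) - covered)
            if gain > best_gain:
                best_gene, best_gain = gene, gain
        if best_gene is None:  # no neighbor covers any new sample
            break
        chosen.append(best_gene)
        members.add(best_gene)
        covered |= dysregulated[best_gene]
    return chosen
```

On a toy network where `ppi = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A"}, "D": {"B"}}` and `dysregulated = {"A": {0, 1}, "B": {1, 2}, "C": {3}, "D": {4, 5, 6}}`, calling `grow_subnetwork("A", ppi, dysregulated, max_size=4)` adds `B` first. It then picks `D` over `C` because `D` covers three new samples, and the resulting subnetwork covers all seven samples. The greedy rule is used here only because it is the textbook approximation for set cover. The paper states that it adapts state-of-the-art set-cover approximation algorithms, and its actual objective (for example, how samples of different phenotype classes are weighted) may differ.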