Department of Biomedical Informatics, Emory University, Atlanta, Georgia, USA.
Department of Computer Science, Princeton University, Princeton, New Jersey, USA.
J Comput Biol. 2021 May;28(5):469-484. doi: 10.1089/cmb.2020.0435. Epub 2021 Jan 5.
A classic problem in computational biology is the identification of altered subnetworks: subnetworks of an interaction network that contain genes/proteins that are differentially expressed, highly mutated, or otherwise aberrant compared with other genes/proteins. Numerous methods have been developed to solve this problem under various assumptions, but the statistical properties of these methods are often unknown. For example, some widely used methods are reported to output very large subnetworks that are difficult to interpret biologically. In this work, we formulate the identification of altered subnetworks as the problem of estimating the parameters of a class of probability distributions that we call the Altered Subset Distribution (ASD). We derive a connection between a popular method, jActiveModules, and the maximum likelihood estimator (MLE) of the ASD. We show that the MLE is biased, explaining the large subnetworks output by jActiveModules. Based on these insights, we introduce NetMix, an algorithm that uses Gaussian mixture models to obtain less biased estimates of the parameters of the ASD. We demonstrate that NetMix outperforms existing methods in identifying altered subnetworks on both simulated and real data, including the identification of differentially expressed genes from both microarray and RNA-seq experiments and the identification of cancer driver genes in somatic mutation data.
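The idea of using a Gaussian mixture model to estimate how many genes are altered can be illustrated with a minimal sketch. This is not the NetMix implementation: it ignores the interaction network entirely, and it assumes (for illustration only) that gene scores are z-scores drawn from a two-component mixture, (1 - alpha) * N(0, 1) for unaltered genes and alpha * N(mu, 1) for altered genes. The function names, the simulated data, and the starting values are all hypothetical.

```python
import math
import random


def normal_pdf(x, mean):
    """Density of a unit-variance normal distribution at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def fit_two_component_gmm(scores, n_iter=200):
    """EM for the mixture (1 - alpha) * N(0, 1) + alpha * N(mu, 1).

    The null component is held fixed at N(0, 1); only the altered
    fraction alpha and the altered mean mu are estimated.
    """
    alpha, mu = 0.1, max(scores)  # crude starting values (an assumption)
    for _ in range(n_iter):
        # E-step: posterior probability that each score came from
        # the altered component.
        resp = []
        for x in scores:
            altered = alpha * normal_pdf(x, mu)
            null = (1.0 - alpha) * normal_pdf(x, 0.0)
            resp.append(altered / (altered + null))
        # M-step: update the mixing weight and the altered mean.
        total = sum(resp)
        alpha = total / len(scores)
        mu = sum(r * x for r, x in zip(resp, scores)) / total
    return alpha, mu


def simulate_scores(n=2000, n_altered=100, mu=3.0, seed=0):
    """Simulated z-scores: n_altered from N(mu, 1), the rest from N(0, 1)."""
    rng = random.Random(seed)
    altered = [rng.gauss(mu, 1.0) for _ in range(n_altered)]
    null = [rng.gauss(0.0, 1.0) for _ in range(n - n_altered)]
    return altered + null


scores = simulate_scores()
alpha_hat, mu_hat = fit_two_component_gmm(scores)
# Estimated number of altered genes implied by the fitted mixing weight.
estimated_size = round(alpha_hat * len(scores))
print(f"alpha_hat={alpha_hat:.3f}, mu_hat={mu_hat:.2f}, size={estimated_size}")
```

In this sketch, the fitted mixing weight gives a size estimate for the altered set before any subnetwork is chosen. A network-based method would then still need to select a connected subnetwork of roughly that size, a step not shown here.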