Weighill Deborah, Guebila Marouen Ben, Lopes-Ramos Camila, Glass Kimberly, Quackenbush John, Platig John, Burkholz Rebekka
Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115.
Channing Division of Network Medicine, Brigham and Women's Hospital.
Proc AAAI Conf Artif Intell. 2021 Feb;35(11):10263-10272. Epub 2021 May 18.
Bipartite network inference is a ubiquitous problem across disciplines. One important example in the field of molecular biology is gene regulatory network inference. Gene regulatory networks are an instrumental tool in discovering the molecular mechanisms that drive diverse diseases, including cancer. However, typically only noisy observations of the projections of these regulatory networks are assayed. To better estimate regulatory networks from their noisy projections, we formulate a non-convex but analytically tractable optimization problem called OTTER. This problem can be interpreted as relaxed graph matching between the two projections of the bipartite network. OTTER's solutions can be derived explicitly and inspire a spectral algorithm, for which we provide network recovery guarantees. We also provide an alternative approach based on gradient descent that is more robust to noise than the spectral algorithm. Interestingly, this gradient descent approach resembles the message passing equations of an established gene regulatory network inference method, PANDA. Using three cancer-related data sets, we show that OTTER outperforms state-of-the-art inference methods in predicting transcription factor binding to gene regulatory regions. To encourage new graph matching applications to this problem, we have made all networks and validation data publicly available.