Department of Computer Science, Tufts University, Medford, Massachusetts, USA.
J Comput Biol. 2024 Nov;31(11):1087-1103. doi: 10.1089/cmb.2024.0607. Epub 2024 Oct 10.
Understanding gene regulatory networks (GRNs) is crucial for elucidating cellular mechanisms and advancing therapeutic interventions. Earlier methods for GRN inference from bulk expression data often struggled with the high dimensionality and inherent noise in the data. Here we introduce RegDiffusion, a new class of Denoising Diffusion Probabilistic Models focusing on the regulatory effects among feature variables. RegDiffusion adds Gaussian noise to the input data following a diffusion schedule and uses a neural network with a parameterized adjacency matrix to predict the added noise. We show that, using this process, GRNs can be learned effectively with a surprisingly simple model architecture. In our benchmark experiments, RegDiffusion outperforms several baseline methods on multiple datasets. We also demonstrate that RegDiffusion can infer biologically meaningful regulatory networks from real-world single-cell datasets with over 15,000 genes in under 5 minutes. This work not only introduces a fresh perspective on GRN inference but also highlights the promising capacity of diffusion-based models in the area of single-cell analysis. The RegDiffusion software package and experiment data are available at https://github.com/TuftsBCB/RegDiffusion.
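The noise-then-predict idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the RegDiffusion package API or the authors' architecture: the function names, the linear beta schedule, and the purely linear noise predictor are all assumptions chosen for illustration, with the neural network replaced by a single learnable gene-by-gene matrix whose diagonal is masked out.

```python
import numpy as np

def linear_beta_schedule(n_steps, beta_start=1e-4, beta_end=0.02):
    # Hypothetical schedule; the paper's actual schedule may differ.
    return np.linspace(beta_start, beta_end, n_steps)

def add_noise(x0, t, alpha_bar, rng):
    """Standard DDPM forward step: x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar[t][:, None]  # one cumulative alpha per cell (row)
    xt = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    return xt, eps

def predict_noise(xt, adj):
    """Toy linear stand-in for the network: mix genes through an
    adjacency matrix with self-loops masked out (assumption)."""
    mask = 1.0 - np.eye(adj.shape[0])
    return xt @ (adj * mask)

def denoising_loss(x0, t, adj, alpha_bar, rng):
    # Mean squared error between the true and the predicted noise.
    xt, eps = add_noise(x0, t, alpha_bar, rng)
    return float(np.mean((predict_noise(xt, adj) - eps) ** 2))

rng = np.random.default_rng(0)
n_cells, n_genes, n_steps = 8, 5, 100
x0 = rng.standard_normal((n_cells, n_genes))      # stand-in expression matrix
alpha_bar = np.cumprod(1.0 - linear_beta_schedule(n_steps))
t = rng.integers(0, n_steps, size=n_cells)        # random time step per cell
adj = 0.1 * rng.standard_normal((n_genes, n_genes))
loss = denoising_loss(x0, t, adj, alpha_bar, rng)
```

In a sketch like this, minimizing the loss with respect to `adj` would be the training step, and the off-diagonal entries of the fitted matrix would be read as candidate regulatory edge weights.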