A data-driven method to learn a jump diffusion process from aggregate biological gene expression data.

Author information

Gao Jia-Xing, Wang Zhen-Yi, Zhang Michael Q, Qian Min-Ping, Jiang Da-Quan

Affiliations

LMAM, School of Mathematical Sciences, Peking University, Beijing 100871, China.

MOE Key Laboratory of Bioinformatics; Bioinformatics Division and Center for Synthetic and Systems Biology, BNRist; Department of Automation, Tsinghua University, Beijing 100084, China.

Publication information

J Theor Biol. 2022 Jan 7;532:110923. doi: 10.1016/j.jtbi.2021.110923. Epub 2021 Oct 1.

Abstract

Dynamic models of gene expression are urgently needed. In this paper, we describe the time evolution of gene expression by learning a jump diffusion process that models the biological process directly. Our algorithm takes aggregate gene expression data as input and outputs the parameters of the jump diffusion process. The learned process can predict population distributions of gene expression at any developmental stage, generate long-time trajectories for individual cells, and offer a novel approach to computing RNA velocity. Moreover, it allows biological systems to be studied from a stochastic dynamic perspective. Gene expression data at a single time point, which is a snapshot of a cellular process, is treated as an empirical marginal distribution of a stochastic process. The dynamics are learned by minimizing the Wasserstein distance between the empirical distribution and the distribution predicted by the jump diffusion process. For the learned jump diffusion process, the trajectories correspond to the developmental process of cells, the stochasticity determines the heterogeneity of cells, the instantaneous rate of state change can be taken as "RNA velocity", and changes in the scales and orientations of clusters can also be observed. We demonstrate that, on both synthetic and real-world datasets, our method recovers the underlying nonlinear dynamics better than previous parametric models and diffusion processes driven by Brownian motion. Our method is also robust to perturbations of the data because the computation involves only population expectations.
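To make the core idea concrete (simulate a jump diffusion forward, then compare its predicted marginal with an observed snapshot using a Wasserstein distance), here is a minimal one-dimensional sketch. It is not the authors' algorithm: the Euler-Maruyama discretization, the Gaussian compound-Poisson jumps, the sorted-sample formula for the 1D Wasserstein-1 distance, and every function and parameter name (`simulate_jump_diffusion`, `wasserstein_1d`, `jump_rate`, `jump_scale`, and so on) are assumptions chosen for illustration only.

```python
import numpy as np


def simulate_jump_diffusion(x0, drift, sigma, jump_rate, jump_scale,
                            t_end, n_steps, rng):
    """Euler-Maruyama sketch of dX = drift(X) dt + sigma dW + dJ.

    J is assumed to be a compound Poisson process with intensity
    `jump_rate` and zero-mean Gaussian jump sizes of std `jump_scale`.
    """
    dt = t_end / n_steps
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        brownian = sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
        n_jumps = rng.poisson(jump_rate * dt, size=x.shape)
        # A sum of n independent N(0, s^2) jumps is N(0, n * s^2).
        jumps = jump_scale * np.sqrt(n_jumps) * rng.standard_normal(x.shape)
        x = x + drift(x) * dt + brownian + jumps
    return x


def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-sized 1D samples.

    For equal sample sizes, the optimal coupling matches sorted values.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("samples must have the same size")
    return float(np.mean(np.abs(a - b)))


rng = np.random.default_rng(0)
cells_t0 = rng.normal(0.0, 0.1, size=2000)  # hypothetical initial snapshot


def pull_to_one(x):
    # Hypothetical drift that relaxes expression toward the value 1.
    return 1.0 - x


# "Observed" snapshot at t = 1, generated here from known parameters.
observed = simulate_jump_diffusion(cells_t0, pull_to_one, 0.3, 2.0, 0.5,
                                   1.0, 50, rng)
# Prediction from a candidate model that has no jump component.
predicted = simulate_jump_diffusion(cells_t0, pull_to_one, 0.3, 0.0, 0.0,
                                    1.0, 50, rng)
loss = wasserstein_1d(observed, predicted)
print(f"Wasserstein-1 loss for the jump-free candidate: {loss:.3f}")
```

In a learning loop, `loss` would be minimized over the candidate model's parameters. Only population-level snapshots enter the loss, so no cell needs to be tracked across time points.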
