Data-driven model discovery and model selection for noisy biological systems.

Authors

Wu Xiaojun, McDermott MeiLu, MacLean Adam L

Affiliation

Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California, United States of America.

Publication information

PLoS Comput Biol. 2025 Jan 21;21(1):e1012762. doi: 10.1371/journal.pcbi.1012762. eCollection 2025 Jan.

DOI:10.1371/journal.pcbi.1012762

PMID:39836686

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11753677/

Abstract

Biological systems exhibit complex dynamics that differential equations can often adeptly represent. Ordinary differential equation models are widespread; until recently their construction has required extensive prior knowledge of the system. Machine learning methods offer alternative means of model construction: differential equation models can be learnt from data via model discovery using sparse identification of nonlinear dynamics (SINDy). However, SINDy struggles with realistic levels of biological noise and is limited in its ability to incorporate prior knowledge of the system. We propose a data-driven framework for model discovery and model selection using hybrid dynamical systems: partial models containing missing terms. Neural networks are used to approximate the unknown dynamics of a system, enabling the denoising of the data while simultaneously learning the latent dynamics. Simulations from the fitted neural network are then used to infer models using sparse regression. We show, via model selection, that model discovery using hybrid dynamical systems outperforms alternative approaches. We find it possible to infer models correctly up to high levels of biological noise of different types. We demonstrate the potential to learn models from sparse, noisy data in application to a canonical cell state transition using data derived from single-cell transcriptomics. Overall, this approach provides a practical framework for model discovery in biology in cases where data are noisy and sparse, of particular utility when the underlying biological mechanisms are partially but incompletely known.
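The final step described in the abstract, inferring a sparse model by regression, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses sequentially thresholded least squares, the regression commonly associated with SINDy, on a toy two-variable Lotka-Volterra system. The library of candidate terms, the threshold, the coefficient values, and the function names `stlsq` and `library` are all assumptions chosen for illustration. The derivatives are computed exactly, standing in for the denoised output that a fitted neural network would supply in the framework described above.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares.

    Solves theta @ xi ~= dxdt, repeatedly zeroing coefficients whose
    magnitude is below `threshold` and refitting the remaining ones.
    """
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for k in range(dxdt.shape[1]):
            keep = ~small[:, k]
            if keep.any():
                xi[keep, k] = np.linalg.lstsq(
                    theta[:, keep], dxdt[:, k], rcond=None
                )[0]
    return xi

def library(states):
    """Candidate terms (an assumed choice): 1, x, y, x^2, x*y, y^2."""
    x, y = states[:, 0], states[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])

# Toy data: states sampled at random, derivatives from a known model
#   dx/dt =  1.0*x - 0.5*x*y
#   dy/dt = -1.0*y + 0.5*x*y
# (hypothetical coefficients, noise-free stand-in for denoised data).
rng = np.random.default_rng(0)
states = rng.uniform(0.5, 3.0, size=(200, 2))
x, y = states[:, 0], states[:, 1]
derivs = np.column_stack([1.0 * x - 0.5 * x * y, -1.0 * y + 0.5 * x * y])

# Rows follow the library order; columns are the equations for x and y.
coeffs = stlsq(library(states), derivs)
print(np.round(coeffs, 3))
```

Each column of `coeffs` holds the recovered equation for one variable; the nonzero rows identify which candidate terms were selected. With noisy derivatives the choice of threshold matters considerably more, which is the difficulty the abstract attributes to SINDy on realistic biological data.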

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d1ba/11753677/7e918ce9cae9/pcbi.1012762.g001.jpg
