Data-driven model discovery and model selection for noisy biological systems.

Author information

Wu Xiaojun, McDermott MeiLu, MacLean Adam L

Affiliations

Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California, United States of America.

Publication information

PLoS Comput Biol. 2025 Jan 21;21(1):e1012762. doi: 10.1371/journal.pcbi.1012762. eCollection 2025 Jan.

Abstract

Biological systems exhibit complex dynamics that differential equations can often adeptly represent. Ordinary differential equation models are widespread; until recently their construction has required extensive prior knowledge of the system. Machine learning methods offer alternative means of model construction: differential equation models can be learnt from data via model discovery using sparse identification of nonlinear dynamics (SINDy). However, SINDy struggles with realistic levels of biological noise and is limited in its ability to incorporate prior knowledge of the system. We propose a data-driven framework for model discovery and model selection using hybrid dynamical systems: partial models containing missing terms. Neural networks are used to approximate the unknown dynamics of a system, enabling the denoising of the data while simultaneously learning the latent dynamics. Simulations from the fitted neural network are then used to infer models using sparse regression. We show, via model selection, that model discovery using hybrid dynamical systems outperforms alternative approaches. We find that models can be inferred correctly even at high levels of biological noise of different types. We demonstrate the potential to learn models from sparse, noisy data by applying the approach to a canonical cell state transition, using data derived from single-cell transcriptomics. Overall, this approach provides a practical framework for model discovery in biology in cases where data are noisy and sparse, and it is of particular utility when the underlying biological mechanisms are partially but incompletely known.
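The sparse-regression step referred to above is SINDy. As a minimal sketch (NumPy only, not the authors' code), the block below applies sequentially thresholded least squares (STLSQ), the sparse regression used in the original SINDy formulation, to a toy Lotka-Volterra system. The system, its parameters, the quadratic candidate library, the step size, and the threshold of 0.1 are all illustrative assumptions. Derivatives are taken by finite differences of noise-free data, so this shows only the baseline SINDy step, not the neural-network denoising of the hybrid framework.

```python
import numpy as np


def lotka_volterra(state, alpha=1.0, beta=0.5, gamma=1.0, delta=0.5):
    """Toy right-hand side (assumed example system, not from the paper)."""
    x, y = state
    return np.array([alpha * x - beta * x * y, -gamma * y + delta * x * y])


def simulate_rk4(rhs, x0, dt, n_steps):
    """Integrate dx/dt = rhs(x) with fixed-step 4th-order Runge-Kutta."""
    traj = np.empty((n_steps + 1, len(x0)))
    traj[0] = x0
    for i in range(n_steps):
        s = traj[i]
        k1 = rhs(s)
        k2 = rhs(s + 0.5 * dt * k1)
        k3 = rhs(s + 0.5 * dt * k2)
        k4 = rhs(s + dt * k3)
        traj[i + 1] = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return traj


def polynomial_library(X):
    """Candidate terms [1, x, y, x^2, xy, y^2] evaluated at every sample."""
    x, y = X[:, 0], X[:, 1]
    names = ["1", "x", "y", "x^2", "xy", "y^2"]
    theta = np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])
    return theta, names


def stlsq(theta, dXdt, threshold=0.1, max_iter=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    xi = np.linalg.lstsq(theta, dXdt, rcond=None)[0]
    for _ in range(max_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        # Refit each equation using only its surviving library terms.
        for k in range(dXdt.shape[1]):
            keep = ~small[:, k]
            if keep.any():
                xi[keep, k] = np.linalg.lstsq(theta[:, keep], dXdt[:, k], rcond=None)[0]
    return xi


dt = 0.005
X = simulate_rk4(lotka_volterra, np.array([1.0, 1.0]), dt, 4000)
# Finite-difference derivatives; with noisy data this step degrades badly,
# which is the weakness the hybrid (neural network) approach targets.
dXdt = np.gradient(X, dt, axis=0, edge_order=2)
theta, names = polynomial_library(X)
xi = stlsq(theta, dXdt)

for k, var in enumerate(["x", "y"]):
    terms = [f"{xi[j, k]:+.3f} {names[j]}" for j in range(len(names)) if xi[j, k] != 0]
    print(f"d{var}/dt =", " ".join(terms))
```

With noise-free data the thresholding should keep only the terms of the assumed true system, dx/dt = x - 0.5xy and dy/dt = -y + 0.5xy. Adding measurement noise to `X` before differentiating tends to corrupt the finite-difference derivatives, which illustrates why plain SINDy struggles with noisy biological data.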

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d1ba/11753677/7e918ce9cae9/pcbi.1012762.g001.jpg