Practical sampling of constraint-based models: Optimized thinning boosts CHRR performance.

Affiliations

Institute of Bio- and Geosciences, IBG-1: Biotechnology, Forschungszentrum Jülich, Jülich, Germany.

Computational Systems Biotechnology (AVT.CSB), RWTH Aachen University, Aachen, Germany.

Publication information

PLoS Comput Biol. 2023 Aug 11;19(8):e1011378. doi: 10.1371/journal.pcbi.1011378. eCollection 2023 Aug.

Abstract

Thinning is a sub-sampling technique to reduce the memory footprint of Markov chain Monte Carlo. Despite being commonly used, thinning is rarely considered efficient. For sampling constraint-based models, a highly relevant use case in systems biology, we here demonstrate that thinning boosts the computational and, thereby, sampling efficiency of the widely used Coordinate Hit-and-Run with Rounding (CHRR) algorithm. By benchmarking thinned CHRR on simplices and genome-scale metabolic networks of up to thousands of dimensions, we find a substantial increase in computational efficiency compared to unthinned CHRR, in our examples by orders of magnitude, as measured by the effective sample size per time (ESS/t), with performance gains growing with polytope (effective network) dimension. Using a set of benchmark models, we derive a ready-to-apply guideline for tuning thinning to make efficient and effective use of compute resources without requiring additional coding effort. Our guideline is validated on three (out-of-sample) large-scale networks, and we show that it allows sampling convex polytopes uniformly to convergence in a fraction of the time, thereby unlocking the rigorous investigation of hitherto intractable models. The derivation of our guideline is explained in detail, allowing future researchers to update it as new model classes and more training data become available. CHRR with deliberate use of thinning thereby paves the way for sampling to keep pace with the growing model sizes derived with the constraint-based reconstruction and analysis (COBRA) tool set. Sampling and evaluation pipelines are available at https://jugit.fz-juelich.de/IBG-1/ModSim/fluxomics/chrrt.
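
The abstract does not spell out the algorithm, so the following is only a rough illustration of where a thinning parameter enters a coordinate hit-and-run sampler and how ESS/t can be measured. It is a minimal pure-Python sketch on a polytope {x : Ax ≤ b}, not the authors' CHRR implementation: the rounding step is omitted, the helper names (chr_step, sample_chr, ess, simplex) are hypothetical, and the ESS estimator (autocorrelation sum truncated at the first non-positive lag) is a deliberately simple choice, not necessarily the one used in the paper.

```python
import math
import random
import time


def chr_step(x, A, b, rng):
    """One coordinate hit-and-run move inside the polytope {x : A x <= b}.

    Assumes x is strictly feasible and the polytope is bounded. Slacks are
    recomputed from scratch for clarity; a tuned sampler would typically
    update them incrementally instead. No rounding transform is applied.
    """
    i = rng.randrange(len(x))  # pick a random coordinate direction e_i
    t_lo, t_hi = -math.inf, math.inf
    for row, bound in zip(A, b):
        slack = bound - sum(a * xj for a, xj in zip(row, x))
        a_i = row[i]
        # constraint row . (x + t e_i) <= bound  <=>  a_i * t <= slack
        if a_i > 0:
            t_hi = min(t_hi, slack / a_i)
        elif a_i < 0:
            t_lo = max(t_lo, slack / a_i)
    x = list(x)
    x[i] += rng.uniform(t_lo, t_hi)  # uniform point on the feasible chord
    return x


def sample_chr(x0, A, b, n_samples, thinning, seed=0):
    """Run the chain and store only every `thinning`-th state.

    Memory grows with n_samples only; the number of steps taken is
    n_samples * thinning.
    """
    rng = random.Random(seed)
    x = list(x0)
    samples = []
    for _ in range(n_samples):
        for _ in range(thinning):
            x = chr_step(x, A, b, rng)
        samples.append(x)
    return samples


def ess(chain):
    """Univariate ESS: n / (1 + 2 * sum of autocorrelations), truncated at
    the first non-positive autocorrelation."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((v - mean) ** 2 for v in chain) / n
    if var == 0:
        return float(n)
    tau = 1.0  # integrated autocorrelation time estimate
    for lag in range(1, n):
        acf = sum((chain[t] - mean) * (chain[t + lag] - mean)
                  for t in range(n - lag)) / (n * var)
        if acf <= 0:
            break
        tau += 2.0 * acf
    return n / tau


def simplex(d):
    """Unit simplex {x : x_i >= 0, sum(x) <= 1} written as A x <= b."""
    A = [[-1.0 if j == i else 0.0 for j in range(d)] for i in range(d)]
    A.append([1.0] * d)
    b = [0.0] * d + [1.0]
    return A, b


if __name__ == "__main__":
    d = 10
    A, b = simplex(d)
    x0 = [0.5 / d] * d  # interior starting point
    for thinning in (1, d):
        t0 = time.perf_counter()
        samples = sample_chr(x0, A, b, n_samples=2000, thinning=thinning)
        elapsed = time.perf_counter() - t0
        e = ess([s[0] for s in samples])
        print(f"thinning={thinning:3d}  ESS={e:7.1f}  ESS/t={e / elapsed:9.1f}/s")
```

Because every step in this toy costs roughly the same and storing a sample is cheap, the ESS/t figures it prints need not reproduce the gains reported in the paper. It only shows that thinning keeps memory fixed at n_samples stored states while spending n_samples × thinning steps, which lowers the autocorrelation between stored samples.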
