KeySDL: Sparse Dictionary Learning for Keystone Microbe Identification.

Author information

Gordon Max, Akyol Turgut Yigit, Amos B, Andersen Stig U, Williams Cranos

Affiliations

Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Drive, Raleigh, 27607, North Carolina, USA.

Department of Molecular Biology and Genetics, Aarhus University, Universitetsbyen 81, Aarhus C, 8000, Denmark.

Publication information

bioRxiv. 2025 Aug 11:2025.08.07.669165. doi: 10.1101/2025.08.07.669165.

DOI:10.1101/2025.08.07.669165

PMID:40832167

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12363843/

Abstract

Identification of microbes with large impacts on their microbial communities, known as keystone microbes, is a topic of long-standing interest in microbiome research. However, many approaches to identify keystone microbes are limited by the inherent nonlinearity and state-dependence of microbial dynamics. Machine learning approaches have been applied to address these shortcomings but often require more data than is available for a given microbial system. We propose a keystone identification approach called KeySDL which reduces the amount of data required by incorporating assumptions about the type of microbial dynamics present in the experimental system. The data are modeled as originating from a Generalized Lotka-Volterra (GLV) model, an architecture commonly used to simulate microbial systems. The parameters of this model are then estimated using Sparse Dictionary Learning (SDL). Compared to existing methods, this approach allows accurate prediction of keystone microbes from small numbers of samples and provides an output interpretable as reconstructed system dynamics. We also propose a self-consistency score to help evaluate whether the assumption of GLV dynamics is reasonable for a given dataset, either through the application of KeySDL or other analysis tools validated using GLV simulation.
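
For readers unfamiliar with the GLV model named in the abstract, the following is a minimal sketch of its standard form, dx_i/dt = x_i (r_i + sum_j A_ij x_j), where r holds intrinsic growth rates and A holds pairwise interaction strengths. It is not the KeySDL implementation and does not perform sparse dictionary learning. The function names (`glv_rhs`, `simulate_glv`), the three-taxon parameter values, and the choice of a fixed-step Runge-Kutta integrator are all assumptions made for illustration.

```python
import numpy as np

def glv_rhs(x, r, A):
    """GLV right-hand side: dx_i/dt = x_i * (r_i + sum_j A_ij * x_j)."""
    return x * (r + A @ x)

def simulate_glv(x0, r, A, dt=0.01, steps=10000):
    """Integrate the GLV model with a fixed-step fourth-order Runge-Kutta scheme."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        k1 = glv_rhs(x, r, A)
        k2 = glv_rhs(x + 0.5 * dt * k1, r, A)
        k3 = glv_rhs(x + 0.5 * dt * k2, r, A)
        k4 = glv_rhs(x + dt * k3, r, A)
        x = x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

# Hypothetical three-taxon community; values are illustrative only.
r = np.array([1.0, 0.8, 0.6])          # intrinsic growth rates
A = np.array([                          # pairwise interaction matrix
    [-1.0, -0.2,  0.1],
    [-0.1, -1.0, -0.3],
    [ 0.2, -0.1, -1.0],
])

# Abundances after integrating from a small initial inoculum.
x_final = simulate_glv([0.1, 0.1, 0.1], r, A)

# When every taxon is present, the steady state satisfies r + A x* = 0.
x_star = np.linalg.solve(A, -r)

print("simulated steady state: ", x_final)
print("analytical steady state:", x_star)
```

In this sketch the simulated abundances settle onto the analytical steady state. The steady-state condition r + A x* = 0 is linear in r and A for the taxa that are present, which gives some intuition for why assuming GLV dynamics can reduce the amount of data needed to constrain the model.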