van der Ploeg G R, Westerhuis J A, Heintz-Buschart A, Smilde A K
Biosystems Data Analysis, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, the Netherlands.
mSystems. 2025 Jun 17;10(6):e0047225. doi: 10.1128/msystems.00472-25. Epub 2025 May 21.
Studies investigating microbial temporal dynamics are increasingly common, leveraging longitudinal designs that collect microbial abundance data across multiple time points from the same subjects. Traditional exploratory approaches like principal component analysis fail to fully utilize this structure. By organizing data as a three-way array (subjects as rows, microbial abundances as columns, and time points as the third dimension), multi-way methods such as parallel factor analysis (PARAFAC) can better capture temporal and structural patterns. This study demonstrates PARAFAC as a method to explore longitudinal microbiome data using three exemplary studies. In the first example, a long time series of microbiomes, PARAFAC identifies primary time-resolved variations. The second example, a longitudinal infant gut microbiome study, shows that PARAFAC can distinguish subject groups and enhance comparative analysis, even with moderate missing data. In the third example, a gingivitis intervention study of the oral microbiome, PARAFAC enables the identification of microbial subcommunities of interest through post-hoc clustering. These examples highlight PARAFAC's broad applicability for analyzing longitudinal microbiome data across diverse environments. The approach is implemented in the R package parafac4microbiome, available on the Comprehensive R Archive Network (CRAN), providing researchers with accessible tools for similar analyses.
IMPORTANCE
Understanding how microbiomes change over time can give us valuable insights into their role in health and disease. Many traditional methods like principal component analysis miss important patterns in data collected over time, but parallel factor analysis (PARAFAC) helps uncover these trends in a much clearer way. Using this approach, we were able to identify key changes in microbiomes across different settings, like lab experiments, the infant gut, and the mouth. PARAFAC also works well even when some data is missing, which is a common issue. To make this tool accessible, we have included it in a user-friendly R package, enabling other researchers to analyze microbiome dynamics in their own studies and explore how these changes might influence health and treatments.
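To make the three-way arrangement concrete, the sketch below builds a small synthetic cube (subjects x microbial features x time points) and fits a PARAFAC model, in which each entry is approximated as x_ijk ≈ sum over r of a_ir * b_jr * c_kr, using plain alternating least squares in NumPy. This is an illustration under stated assumptions, not the parafac4microbiome implementation: the function names (khatri_rao, parafac_als), the cube dimensions, and the noise-free rank-2 data are all hypothetical, and real abundance data would first need preprocessing and handling of missing values, which the sketch omits.

```python
import numpy as np


def khatri_rao(U, V):
    """Column-wise Kronecker product; row (u, v) lands at index u * V.shape[0] + v."""
    rank = U.shape[1]
    return (U[:, None, :] * V[None, :, :]).reshape(-1, rank)


def parafac_als(X, rank, n_iter=1000, tol=1e-12, seed=0):
    """Fit X[i, j, k] ~ sum_r A[i, r] * B[j, r] * C[k, r] by alternating least squares.

    Hypothetical helper for illustration only; assumes a complete cube (no missing values).
    Returns the three loading matrices and the loss (residual sum of squares) per iteration.
    """
    rng = np.random.default_rng(seed)
    n_subjects, n_features, n_times = X.shape
    A = rng.standard_normal((n_subjects, rank))
    B = rng.standard_normal((n_features, rank))
    C = rng.standard_normal((n_times, rank))

    # Unfold the cube along each mode; column ordering matches khatri_rao above.
    X1 = X.reshape(n_subjects, -1)                     # columns indexed by (feature, time)
    X2 = X.transpose(1, 0, 2).reshape(n_features, -1)  # columns indexed by (subject, time)
    X3 = X.transpose(2, 0, 1).reshape(n_times, -1)     # columns indexed by (subject, feature)

    losses = []
    for _ in range(n_iter):
        # Each update is an exact least-squares solve with the other two modes held fixed.
        A = np.linalg.lstsq(khatri_rao(B, C), X1.T, rcond=None)[0].T
        B = np.linalg.lstsq(khatri_rao(A, C), X2.T, rcond=None)[0].T
        C = np.linalg.lstsq(khatri_rao(A, B), X3.T, rcond=None)[0].T

        residual = X - np.einsum("ir,jr,kr->ijk", A, B, C)
        loss = float(np.sum(residual ** 2))
        losses.append(loss)
        # Stop once the relative improvement in the loss becomes negligible.
        if len(losses) > 1 and losses[-2] - loss <= tol * losses[-2]:
            break
    return A, B, C, losses


# Synthetic, noise-free rank-2 cube: 10 subjects, 6 features, 5 time points (made-up sizes).
rng = np.random.default_rng(42)
true_A = rng.standard_normal((10, 2))
true_B = rng.standard_normal((6, 2))
true_C = rng.standard_normal((5, 2))
X = np.einsum("ir,jr,kr->ijk", true_A, true_B, true_C)

A, B, C, losses = parafac_als(X, rank=2)

# Fraction of the cube's sum of squares reproduced by the fitted model.
fit = 1.0 - losses[-1] / float(np.sum(X ** 2))
print(f"variation explained: {fit:.4f}")
```

In this sketch the rows of A play the role of subject scores, the rows of B are feature (microbial) loadings, and the rows of C are time loadings, so each component pairs a group of features with a time profile and a per-subject weight. Components are only determined up to scaling and ordering, so the recovered loadings need not match true_A, true_B, and true_C column by column.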