Department of Biological and Environmental Science, Faculty of Mathematics and Science, University of Jyväskylä, Jyväskylä, Finland.
Organismal and Evolutionary Biology Research Programme, Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki, Finland.
PLoS Comput Biol. 2024 Sep 3;20(9):e1011914. doi: 10.1371/journal.pcbi.1011914. eCollection 2024 Sep.
Joint species distribution modelling (JSDM) is a widely used statistical method that analyzes combined patterns of all species in a community, linking empirical data to ecological theory and enhancing community-wide prediction tasks. However, fitting JSDMs to large datasets is often computationally demanding and time-consuming. Recent studies have introduced new statistical and machine learning techniques to provide more scalable fitting algorithms, but extending these to complex JSDM structures that account for spatial dependencies or multi-level sampling designs remains challenging. In this study, we aim to improve JSDM scalability by adapting an existing fitting method to high-performance computing (HPC) resources. Our work focuses on the Hmsc R-package, a widely used JSDM framework that supports the integration of various dataset types into a single comprehensive model. We developed a GPU-compatible implementation of its model-fitting algorithm using Python and the TensorFlow library. Despite these changes, our enhanced framework retains the original user interface of the Hmsc R-package. We evaluated the performance of the proposed implementation across various model configurations and dataset sizes. Our results show a significant increase in model-fitting speed for most models compared to the baseline Hmsc R-package. For the largest datasets, we achieved speed-ups of over 1000 times, demonstrating the substantial potential of GPU porting for previously CPU-bound JSDM software. This advancement opens promising opportunities for better utilizing the rapidly accumulating new biodiversity data resources for inference and prediction.