Dang Tung, Fuji Yushiro, Kumaishi Kie, Usui Erika, Kobori Shungo, Sato Takumi, Narukawa Megumi, Toda Yusuke, Sakurai Kengo, Yamasaki Yuji, Tsujimoto Hisashi, Hirai Masami Yokota, Ichihashi Yasunori, Iwata Hiroyoshi
Laboratory for Medical Science Mathematics, Department of Biological Sciences, School of Science, 4F, Faculty of Science Building 3, The University of Tokyo, 2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan.
Graduate School of Agricultural and Life Sciences, Building 1 #327, Department of Agriculture, The University of Tokyo, 1-1-1, Yayoi, Bunkyo, Tokyo 113-8657, Japan.
Brief Bioinform. 2025 May 1;26(3). doi: 10.1093/bib/bbaf132.
High-dimensional multi-omics microbiome data play an important role in elucidating how microbial communities interact with their hosts and environments in critical diseases and ecological changes. Although Bayesian clustering methods have recently been used for the integrated analysis of multi-omics data, none has been designed specifically for multi-omics microbiome data. In this study, we propose a novel framework called integrative stochastic variational variable selection (I-SVVS), an extension of stochastic variational variable selection for high-dimensional microbiome data. I-SVVS uses a specific Bayesian mixture model for each type of omics data, such as an infinite Dirichlet multinomial mixture model for microbiome data and an infinite Gaussian mixture model for metabolomic data. This approach is expected to reduce the computational time of the clustering process and improve the accuracy of the clustering results. Additionally, I-SVVS identifies a critical set of representative variables in multi-omics microbiome data. Three datasets from soybean, mice, and humans, each integrating microbiome and metabolome data, were used to demonstrate the potential of I-SVVS. The results indicate that I-SVVS achieved higher accuracy and faster computation than existing methods on all test datasets, and that it effectively identified the key microbiome species and metabolites characterizing each cluster. For instance, the analysis of the soybean dataset, comprising 377 samples with 16 943 microbiome species and 265 metabolome features, was completed in 2.18 hours with I-SVVS, compared with 2.35 days with Clusternomics and 1.12 days with iClusterPlus. The software, written in Python, is freely available at https://github.com/tungtokyo1108/I-SVVS.
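To make the "infinite Dirichlet multinomial mixture" idea concrete, the following is a minimal sketch in standard-library Python. It is not taken from the I-SVVS repository and does not reproduce its variational inference or variable selection. It assumes that the infinite mixture is approximated by a truncated stick-breaking prior, a common construction that the abstract itself does not specify. All function names and parameter values here are hypothetical and chosen only for illustration.

```python
import math
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking weights for a Dirichlet process prior.

    Assumption: the 'infinite' mixture is cut off at `truncation` components.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)  # fraction broken off the remainder
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last component takes whatever is left
    return weights


def dirichlet_multinomial_logpmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-multinomial."""
    n = sum(counts)
    a = sum(alpha)
    logp = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for x, a_j in zip(counts, alpha):
        logp += math.lgamma(x + a_j) - math.lgamma(a_j) - math.lgamma(x + 1)
    return logp


def responsibilities(counts, weights, alphas):
    """Posterior cluster probabilities for one sample of taxon counts."""
    scores = [
        math.log(w) + dirichlet_multinomial_logpmf(counts, a)
        for w, a in zip(weights, alphas)
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    unnorm = [math.exp(s - top) for s in scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical setting: 3 taxa, mixture truncated at 4 components.
    weights = stick_breaking_weights(alpha=1.0, truncation=4, rng=rng)
    alphas = [
        [5.0, 1.0, 1.0],
        [1.0, 5.0, 1.0],
        [1.0, 1.0, 5.0],
        [1.0, 1.0, 1.0],
    ]
    print(responsibilities([12, 1, 0], weights, alphas))
```

In this sketch each cluster is described by its own Dirichlet parameter vector over taxa, and a sample is softly assigned to clusters in proportion to prior weight times Dirichlet-multinomial likelihood. A metabolomic block would swap the count likelihood for a Gaussian one while keeping the same weighting scheme.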