Department of Biostatistics, School of Public Health, Brown University, Providence, Rhode Island, U.S.A.
Department of Biostatistics, Mailman School of Public Health, Columbia University, 722 West 168th Street, New York, New York, 10032 U.S.A.
Brief Bioinform. 2023 May 19;24(3). doi: 10.1093/bib/bbad158.
Studies have found that the human microbiome is associated with, and predictive of, human health and disease. Many statistical methods developed for microbiome data rely on distance metrics, each of which captures different information in microbiome profiles. Prediction models have also been developed for microbiome data, including deep learning methods based on convolutional neural networks that use both taxon abundance profiles and the taxonomic relationships among microbial taxa encoded in a phylogenetic tree. Studies have further suggested that a health outcome can be associated with multiple forms of microbiome profiles: in addition to the abundance of some taxa, the presence/absence of other taxa can also be associated with, and predictive of, the same outcome. Moreover, associated taxa may be close to each other on a phylogenetic tree or spread across it. No existing prediction model uses multiple forms of microbiome-outcome association. To address this, we propose a multi-kernel machine regression (MKMR) method that captures various types of microbiome signals when making predictions. MKMR transforms multiple microbiome distance metrics into multiple kernels and learns an optimal conic combination of these kernels; the kernel weights indicate how much each type of microbiome signal contributes. Simulation studies suggest much-improved prediction performance over competing methods when the outcome is associated with a mixture of microbiome signals. Real data applications predicting multiple health outcomes from throat and gut microbiome data also suggest that MKMR predicts better than competing methods.
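The abstract describes MKMR only at a high level. The sketch below illustrates the general multiple-kernel idea under stated assumptions, and is not the authors' estimator. Bray-Curtis (abundance) and Jaccard (presence/absence) distances stand in for the paper's distance metrics; phylogenetic distances such as UniFrac are omitted because they need a tree. Each distance matrix D is converted to a kernel by Gower centering, K = -1/2 J D^2 J with J = I - 11'/n, and negative eigenvalues are truncated. This is a transformation common in microbiome kernel methods, assumed here. The kernels are combined with nonnegative (conic) weights and passed to kernel ridge regression. The weights are supplied by hand rather than learned, and all function names are hypothetical.

```python
import numpy as np

def bray_curtis(X):
    """Pairwise Bray-Curtis dissimilarity between rows (samples) of an abundance table."""
    X = np.asarray(X, dtype=float)
    num = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    den = (X[:, None, :] + X[None, :, :]).sum(axis=2)
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

def jaccard_presence(X):
    """Pairwise Jaccard distance on presence/absence (abundance > 0)."""
    P = np.asarray(X) > 0
    inter = (P[:, None, :] & P[None, :, :]).sum(axis=2)
    union = (P[:, None, :] | P[None, :, :]).sum(axis=2)
    # Two samples with no taxa present are treated as identical (similarity 1).
    sim = np.divide(inter, union, out=np.ones(inter.shape), where=union > 0)
    return 1.0 - sim

def distance_to_kernel(D):
    """Gower-centered kernel K = -1/2 J D^2 J, projected onto the PSD cone."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    K = -0.5 * J @ (D ** 2) @ J
    K = (K + K.T) / 2  # remove round-off asymmetry
    vals, vecs = np.linalg.eigh(K)
    return (vecs * np.clip(vals, 0.0, None)) @ vecs.T  # drop negative eigenvalues

def combine_kernels(kernels, weights):
    """Conic combination sum_m w_m K_m with every w_m >= 0."""
    weights = np.asarray(weights, dtype=float)
    if weights.shape[0] != len(kernels):
        raise ValueError("need one weight per kernel")
    if np.any(weights < 0):
        raise ValueError("conic combination requires nonnegative weights")
    return sum(w * K for w, K in zip(weights, kernels))

def kernel_ridge_predict(K, y_train, train_idx, test_idx, lam=1.0):
    """Kernel ridge regression on a mean-centered response, using blocks of one joint kernel."""
    y_train = np.asarray(y_train, dtype=float)
    K_tr = K[np.ix_(train_idx, train_idx)]
    K_te = K[np.ix_(test_idx, train_idx)]
    mu = y_train.mean()
    alpha = np.linalg.solve(K_tr + lam * np.eye(len(train_idx)), y_train - mu)
    return mu + K_te @ alpha

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated toy abundance table: 40 samples x 30 taxa, roughly half zeros.
    X = rng.poisson(1.0, size=(40, 30)) * rng.binomial(1, 0.5, size=(40, 30))
    # Outcome mixes an abundance signal (taxon 0) and a presence signal (taxon 1).
    y = X[:, 0] + (X[:, 1] > 0) + rng.normal(0, 0.1, size=40)
    # Kernels use all 40 samples; distances do not depend on y.
    kernels = [distance_to_kernel(bray_curtis(X)), distance_to_kernel(jaccard_presence(X))]
    K = combine_kernels(kernels, [0.7, 0.3])  # fixed weights; MKMR would learn these
    train, test = np.arange(30), np.arange(30, 40)
    y_hat = kernel_ridge_predict(K, y[train], train, test, lam=0.5)
    print(np.round(y_hat, 2))
```

In MKMR the weights are estimated rather than fixed. The only structural requirement shown here is that they stay nonnegative: a nonnegative sum of positive semidefinite kernels is itself a valid kernel. The fitted weights can then be read as the relative contribution of each signal type.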