MicroHDF: predicting host phenotypes with metagenomic data using a deep forest-based framework.

Affiliations

College of Computer Science and Engineering, Guilin University of Technology, Guilin, Guangxi 541004, China.

Guangxi Key Laboratory of Embedded Technology and Intelligent Systems, Guilin University of Technology, Guilin, Guangxi 541004, China.

Publication information

Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae530.

Abstract

The gut microbiota plays a vital role in human health, and considerable effort has gone into predicting human phenotypes, especially diseases, by using the microbiota as a promising indicator or predictor in machine learning (ML) methods. However, when host phenotypes are predicted from metagenomic data, accuracy is affected by many factors, e.g. small sample sizes, class imbalance, and high-dimensional features. To address these challenges, we propose MicroHDF, an interpretable deep learning framework for predicting host phenotypes, in which cascade layers of deep forest units are designed to handle sample class imbalance and high-dimensional features. The experimental results show that MicroHDF performs competitively with existing state-of-the-art methods on 13 publicly available datasets covering six different diseases. In particular, it performs best on inflammatory bowel disease (IBD) and liver cirrhosis, with areas under the receiver operating characteristic curve (AUC) of 0.9182 ± 0.0098 and 0.9469 ± 0.0076, respectively. MicroHDF also shows better performance and robustness in cross-study validation. Furthermore, MicroHDF is applied to two high-risk diseases, IBD and autism spectrum disorder, as case studies to identify potential biomarkers. In conclusion, our method provides effective and reliable prediction of host phenotypes and discovers informative features with biological insights.
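The abstract reports performance as AUC mean ± standard deviation. The evaluation protocol (number of folds or repeated runs, and whether a sample or population standard deviation is used) is not stated here, so the following is only a minimal sketch under those assumptions. It computes the AUC of one run as the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen control, then summarizes several runs as "mean ± sample SD". It does not implement MicroHDF's deep forest cascade, and the labels and scores are toy values, not data from the study.

```python
import statistics
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic.

    labels: 1 for the phenotype of interest (e.g. disease), 0 for controls.
    scores: predicted probability of class 1; tied case/control pairs count 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:  # O(n_pos * n_neg) pairwise comparison, fine for illustration
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize(aucs: Sequence[float]) -> str:
    """Format per-run AUCs as 'mean ± SD' (sample SD is an assumption here)."""
    mean = statistics.mean(aucs)
    sd = statistics.stdev(aucs) if len(aucs) > 1 else 0.0
    return f"{mean:.4f} ± {sd:.4f}"


if __name__ == "__main__":
    # Hypothetical (labels, scores) pairs for two evaluation runs; toy data only.
    runs = [
        ([1, 1, 0, 0, 1, 0], [0.9, 0.8, 0.3, 0.4, 0.35, 0.1]),
        ([1, 0, 1, 0], [0.7, 0.2, 0.6, 0.6]),
    ]
    print(summarize([roc_auc(y, s) for y, s in runs]))  # prints 0.8819 ± 0.0098
```

For larger datasets, a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` avoids the quadratic pairwise loop; the pairwise form is used here because it makes the "probability of correct ranking" interpretation explicit.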

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b523/11500453/8687fd40f41c/bbae530f1.jpg
