Multi-class boosting for the analysis of multiple incomplete views on microbiome data.

Affiliations

BioSense Institute, University of Novi Sad, dr Zorana Djindjića 1, Novi Sad, 21000, Serbia.

Faculty of Sciences, University of Novi Sad, Trg Dositeja Obradovića 3, Novi Sad, 21000, Serbia.

Publication information

BMC Bioinformatics. 2024 May 14;25(1):188. doi: 10.1186/s12859-024-05767-w.

Abstract

BACKGROUND

Microbiome dysbiosis has recently been associated with different diseases and disorders. In this context, machine learning (ML) approaches can be useful either to identify new patterns or to learn predictive models. However, the data fed to ML methods can be subject to different sampling, sequencing and preprocessing techniques. Each choice in the pipeline can lead to a different view (i.e., feature set) of the same individuals, and classical (single-view) ML approaches may fail to consider these views simultaneously. Moreover, some views may be incomplete, i.e., some individuals may be missing from some views, possibly because some measurements are absent or because some features are not available or applicable for all individuals. Multi-view learning methods offer a possible way to consider multiple feature sets for the same individuals, but most existing multi-view learning methods are limited to binary classification tasks or cannot work with incomplete views.
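To make the notion of an incomplete view concrete, the following minimal sketch stores each view as a mapping from sample ID to feature vector and reports which samples are present in which views. The helper names, sample IDs and feature values are hypothetical and chosen only for illustration; they do not come from the paper.

```python
def presence_mask(views, sample_ids):
    """For each sample, list whether it appears in each view (True/False)."""
    return {s: [s in view for view in views] for s in sample_ids}


def complete_samples(views):
    """Samples present in every view.

    Without imputation, plain feature concatenation can only use these.
    """
    ids = set(views[0])
    for view in views[1:]:
        ids &= set(view)
    return sorted(ids)


# Hypothetical toy data: two views of three individuals.
view_a = {"s1": [0.1, 0.9], "s2": [0.5, 0.5], "s3": [0.7, 0.3]}
view_b = {"s1": [1.0], "s3": [2.0]}  # "s2" is missing from this view

mask = presence_mask([view_a, view_b], ["s1", "s2", "s3"])
usable = complete_samples([view_a, view_b])
```

Here `mask["s2"]` is `[True, False]` and `usable` is `["s1", "s3"]`: a method that requires complete samples would have to discard or impute `s2`, whereas a method that handles incomplete views can still learn from it in `view_a`.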

RESULTS

We propose irBoost.SH, an extension of the multi-view boosting algorithm rBoost.SH, based on multi-armed bandits. irBoost.SH solves multi-class classification tasks and can analyze incomplete views. At each iteration, it identifies one winning view using adversarial multi-armed bandits and uses its predictions to update a shared instance weight distribution in a learning process based on boosting. In our experiments, performed on 5 multi-view microbiome datasets, the model learned by irBoost.SH always outperforms the best model learned from a single view, its closest competitor rBoost.SH, and the model learned by a multi-view approach based on feature concatenation, improving the F1-score by 11.8% in the prediction of Autism Spectrum Disorder and by 114% in the prediction of Colorectal Cancer.
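The loop described above (pick a view with an adversarial bandit, then update a shared instance weight distribution) can be sketched as follows. This is not the authors' implementation: it assumes EXP3 as the adversarial bandit and a SAMME-style multi-class weight update, uses fixed per-view predictions as a stand-in for weak learners that would normally be retrained each round, and assumes that instances missing from the winning view keep their weight before renormalisation. All function names are hypothetical.

```python
import math
import random


def exp3_probs(arm_weights, gamma):
    """EXP3 mixing: exploit arm weights, explore uniformly with rate gamma."""
    total = sum(arm_weights)
    k = len(arm_weights)
    return [(1 - gamma) * w / total + gamma / k for w in arm_weights]


def exp3_update(arm_weights, probs, arm, reward, gamma):
    """Importance-weighted EXP3 update for the arm (view) that was played."""
    k = len(arm_weights)
    arm_weights[arm] *= math.exp(gamma * (reward / probs[arm]) / k)


def samme_round(inst_w, preds, labels, n_classes):
    """One SAMME-style step on the shared instance weights (in place).

    preds[i] is None when instance i is missing from the chosen view
    (assumption: its weight is left untouched before renormalisation).
    Returns (alpha, weighted accuracy on the instances the view covers).
    """
    seen = [i for i, p in enumerate(preds) if p is not None]
    mass = sum(inst_w[i] for i in seen)
    err = sum(inst_w[i] for i in seen if preds[i] != labels[i]) / mass
    err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
    alpha = math.log((1 - err) / err) + math.log(n_classes - 1)
    if alpha > 0:  # only learners better than random guessing reweight
        for i in seen:
            if preds[i] != labels[i]:
                inst_w[i] *= math.exp(alpha)
    total = sum(inst_w)
    for i in range(len(inst_w)):
        inst_w[i] /= total
    return alpha, 1 - err


def boost(view_preds, labels, n_classes, rounds=10, gamma=0.3, seed=0):
    """Bandit-driven boosting loop over views sharing one weight vector."""
    rng = random.Random(seed)
    n, k = len(labels), len(view_preds)
    inst_w = [1.0 / n] * n  # shared instance weight distribution
    arm_w = [1.0] * k       # one bandit arm per view
    history = []
    for _ in range(rounds):
        probs = exp3_probs(arm_w, gamma)
        arm = rng.choices(range(k), weights=probs)[0]  # winning view
        alpha, acc = samme_round(inst_w, view_preds[arm], labels, n_classes)
        exp3_update(arm_w, probs, arm, acc, gamma)     # reward = accuracy
        history.append((arm, alpha))
    return inst_w, arm_w, history
```

A toy call with three classes, where the second view lacks two of the six instances, might look like `boost([[0, 1, 2, 0, 1, 1], [0, None, 2, 1, None, 0]], [0, 1, 2, 0, 1, 2], n_classes=3, rounds=5)`. Using the view's weighted accuracy as the bandit reward is one simple choice made for this sketch; the paper may define the reward differently.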

CONCLUSIONS

The proposed method irBoost.SH performed strongly in our experiments, including in comparison with competitor approaches. The results confirm that irBoost.SH can be fruitfully adopted for the analysis of microbiome data, owing to its capability to simultaneously exploit multiple feature sets obtained through different sequencing and preprocessing pipelines.
