Benchmarking ensemble machine learning algorithms for multi-class, multi-omics data integration in clinical outcome prediction.

Authors

Spooner Annette, Moridani Mohammad Karimi, Toplis Barbra, Behary Jason, Safarchi Azadeh, Maher Salim, Vafaee Fatemeh, Zekry Amany, Sowmya Arcot

Affiliations

School of Computer Science and Engineering, University of New South Wales, High St, Kensington, NSW 2052, Australia.

School of Biotechnology and Biomolecular Sciences, University of New South Wales, NSW 2052, Australia.

Publication information

Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf116.

Abstract

The complementary information found in different modalities of patient data can aid in more accurate modelling of a patient's disease state and a better understanding of the underlying biological processes of a disease. However, the analysis of multi-modal, multi-omics data presents many challenges. In this work, we compare the performance of a variety of ensemble machine learning (ML) algorithms that are capable of late integration of multi-class data from different modalities. The ensemble methods and their variations tested were (i) a voting ensemble with hard and soft voting, (ii) a meta learner, (iii) a multi-modal AdaBoost model using a hard vote, a soft vote, or a meta learner to integrate the modalities on each boosting round, (iv) the PB-MVBoost model, and (v) a novel application of a mixture-of-experts model. These were compared to simple concatenation. We examine these methods using data from an in-house study on hepatocellular carcinoma, plus validation datasets from studies of breast cancer and irritable bowel disease. We develop models that achieve an area under the receiver operating characteristic curve of up to 0.85 and find that two boosted methods, PB-MVBoost and AdaBoost with soft vote, were the best-performing models. We also examine the stability of the features selected and the size of the clinical signature. Our work shows that integrating complementary omics and data modalities with effective ensemble ML models enhances accuracy in multi-class clinical outcome predictions and produces more stable predictive features than individual modalities or simple concatenation. We provide recommendations for the integration of multi-modal, multi-class data.
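
To make the distinction between hard and soft voting in late integration concrete, the following is a minimal illustrative sketch in Python. It is not the authors' implementation: the function names `soft_vote` and `hard_vote`, the three example modalities, and all probability values are hypothetical, and each modality is assumed to have already produced per-class probabilities from its own separately trained classifier.

```python
from collections import Counter


def top_class(probs):
    """Return the index of the largest probability in one sample's class vector."""
    return max(range(len(probs)), key=probs.__getitem__)


def soft_vote(prob_by_modality):
    """Average the class probabilities across modalities, then pick the top class."""
    n_modalities = len(prob_by_modality)
    n_samples = len(prob_by_modality[0])
    n_classes = len(prob_by_modality[0][0])
    preds = []
    for i in range(n_samples):
        mean = [
            sum(m[i][c] for m in prob_by_modality) / n_modalities
            for c in range(n_classes)
        ]
        preds.append(top_class(mean))
    return preds


def hard_vote(prob_by_modality):
    """Each modality votes for its own top class; the most frequent label wins."""
    n_samples = len(prob_by_modality[0])
    preds = []
    for i in range(n_samples):
        votes = [top_class(m[i]) for m in prob_by_modality]
        # Ties are resolved in favour of the label seen first (modality order).
        preds.append(Counter(votes).most_common(1)[0][0])
    return preds


# Hypothetical per-modality outputs: 2 samples x 3 classes each.
modality_a = [[0.7, 0.2, 0.1], [0.40, 0.35, 0.25]]
modality_b = [[0.6, 0.3, 0.1], [0.40, 0.35, 0.25]]
modality_c = [[0.5, 0.3, 0.2], [0.05, 0.90, 0.05]]
probs = [modality_a, modality_b, modality_c]

print("soft vote:", soft_vote(probs))
print("hard vote:", hard_vote(probs))
```

In this toy input the two schemes agree on the first sample but disagree on the second: two modalities weakly favour class 0, while the third strongly favours class 1, so the majority vote selects class 0 and the averaged probabilities select class 1. A meta learner would instead train a further model on the stacked per-modality probabilities.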

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/32db/11926982/70da4831d227/bbaf116ga1.jpg