Dual-stage optimizer for systematic overestimation adjustment applied to multi-objective genetic algorithms for biomarker selection.

Author information

Cattelani Luca, Fortino Vittorio

Affiliation

School of Medicine, Institute of Biomedicine, University of Eastern Finland, Yliopistonranta 1, PO Box 1627, 70211 Kuopio, Finland.

Publication information

Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae674.

Abstract

Selecting biomarker panels from omics data is challenging because molecular features are numerous and samples are limited, so it often requires machine learning methods paired with wrapper feature selection techniques such as genetic algorithms. These techniques test various feature sets (potential biomarker solutions) to fine-tune a machine learning model's performance on supervised tasks, such as classifying cancer subtypes. The optimization uses validation sets to evaluate and identify the most effective feature combinations. These evaluations carry a performance estimation error, measurable as the discrepancy between validation and test set performance, and when the selection involves many models the best ones are almost certainly overestimated. The issue is also relevant in multi-objective feature selection, where several characteristics of the biomarker panels are optimized, such as predictive performance and feature set size. Methods have been proposed to reduce the overestimation after a model has already been selected in single-objective problems, but no algorithm existed that could reduce the overestimation during the optimization, improving model selection, or that applied to the more general multi-objective domain. We propose Dual-stage Optimizer for Systematic overestimation Adjustment in Multi-Objective problems (DOSA-MO), a novel multi-objective optimization wrapper algorithm that learns how the original estimation, its variance, and the feature set size of the solutions predict the overestimation. DOSA-MO adjusts the expected performance during the optimization, improving the composition of the solution set. Using three transcriptomics datasets for kidney and breast cancer, we verify that DOSA-MO improves the performance of a state-of-the-art genetic algorithm on left-out or external sample sets when predicting cancer subtypes and/or patient overall survival.
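The overestimation effect described in the abstract can be illustrated with a small simulation. The sketch below is not the DOSA-MO implementation: it is a single-objective toy in which every name (`evaluate_candidates`, `fit_line`, `mean_gap`, `NOISE_SD`) and every numeric setting is an assumption made for illustration. It uses only the original estimate as a predictor of the overestimation, whereas the abstract states that DOSA-MO also uses the variance of the estimate and the feature set size.

```python
import random
import statistics

NOISE_SD = 0.05  # assumed noise of a single performance evaluation


def evaluate_candidates(rng, n_models=200):
    """Simulate (validation, test) scores for hypothetical candidate models."""
    validation, test = [], []
    for _ in range(n_models):
        true_score = rng.gauss(0.70, 0.05)  # unknown true performance
        validation.append(true_score + rng.gauss(0, NOISE_SD))
        test.append(true_score + rng.gauss(0, NOISE_SD))
    return validation, test


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_gap(adjust, repetitions=30, seed=1):
    """Average (estimate - test score) of the model with the best estimate."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(repetitions):
        validation, test = evaluate_candidates(rng)
        estimates = [adjust(v) for v in validation]
        best = max(range(len(estimates)), key=estimates.__getitem__)
        gaps.append(estimates[best] - test[best])
    return statistics.fmean(gaps)


# Stage 1 (toy): on a separate round of evaluations, learn how the
# original estimate predicts the overestimation (validation - test).
train_val, train_test = evaluate_candidates(random.Random(0))
overestimation = [v - t for v, t in zip(train_val, train_test)]
intercept, slope = fit_line(train_val, overestimation)


def adjusted(v):
    """Stage 2 (toy): subtract the predicted overestimation."""
    return v - (intercept + slope * v)


raw_gap = mean_gap(lambda v: v)
adjusted_gap = mean_gap(adjusted)
print(f"mean gap of selected model, raw estimate:      {raw_gap:+.3f}")
print(f"mean gap of selected model, adjusted estimate: {adjusted_gap:+.3f}")
```

In this toy the adjustment is a monotone function of a single estimate, so it changes the reported performance of the selected model but not which model is selected. With several predictors, as in the method the abstract describes, the adjustment could also change the ranking of candidate solutions during the optimization.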

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d8ee/11684899/da7359c7663f/bbae674f1.jpg
