FADEL: Ensemble Learning Enhanced by Feature Augmentation and Discretization.

Authors

Hung Chuan-Sheng, Lin Chun-Hung Richard, Chen Shi-Huang, Zheng You-Cheng, Yu Cheng-Han, Hung Cheng-Wei, Huang Ting-Hsin, Tsai Jui-Hsiu

Affiliations

Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung 804, Taiwan.

Artificial Intelligence Research and Promotion Center, National Sun Yat-sen University, Kaohsiung 804, Taiwan.

Publication

Bioengineering (Basel). 2025 Jul 30;12(8):827. doi: 10.3390/bioengineering12080827.

Abstract

In recent years, data augmentation techniques have become the predominant approach for addressing highly imbalanced classification problems in machine learning. Algorithms such as the Synthetic Minority Over-sampling Technique (SMOTE) and Conditional Tabular Generative Adversarial Network (CTGAN) have proven effective in synthesizing minority class samples. However, these methods often introduce distributional bias and noise, potentially leading to model overfitting, reduced predictive performance, increased computational costs, and elevated cybersecurity risks. To overcome these limitations, we propose a novel architecture, FADEL, which integrates feature-type awareness with a supervised discretization strategy. FADEL introduces a unique feature augmentation ensemble framework that preserves the original data distribution by concurrently processing continuous and discretized features. It dynamically routes these feature sets to their most compatible base models, thereby improving minority class recognition without the need for data-level balancing or augmentation techniques. Experimental results demonstrate that FADEL, solely leveraging feature augmentation without any data augmentation, achieves a recall of 90.8% and a G-mean of 94.5% on the internal test set from Kaohsiung Chang Gung Memorial Hospital in Taiwan. On the external validation set from Kaohsiung Medical University Chung-Ho Memorial Hospital, it maintains a recall of 91.9% and a G-mean of 86.7%. These results outperform conventional ensemble methods trained on CTGAN-balanced datasets, confirming the superior stability, computational efficiency, and cross-institutional generalizability of the FADEL architecture. Altogether, FADEL uses feature augmentation to offer a robust and practical solution to extreme class imbalance, outperforming mainstream data augmentation-based approaches.
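The abstract reports recall and G-mean without defining them. The sketch below assumes the common binary-classification definition, in which G-mean is the geometric mean of recall (sensitivity) and specificity; the abstract does not state which formula the authors used. The function names and the toy labels are hypothetical and serve only as illustration.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FN, TN, FP for a binary task; `positive` marks the minority class."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp


def recall_and_g_mean(y_true, y_pred, positive=1):
    """Return (recall, G-mean), assuming G-mean = sqrt(recall * specificity)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return recall, sqrt(recall * specificity)


def implied_specificity(recall, g_mean):
    """Invert the assumed definition: specificity = G-mean^2 / recall."""
    return g_mean ** 2 / recall


# Toy imbalanced example (made-up labels, not data from the paper):
# 2 positives and 8 negatives, with one missed positive and one false alarm.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
toy_recall, toy_g_mean = recall_and_g_mean(y_true, y_pred)

# Specificity implied by the figures reported in the abstract,
# valid only under the assumed G-mean definition.
internal_spec = implied_specificity(0.908, 0.945)
external_spec = implied_specificity(0.919, 0.867)
```

Under that assumption, the reported figures imply a specificity of roughly 98% on the internal test set and roughly 82% on the external validation set. This would suggest that the drop in external G-mean comes from more false positives rather than from missed minority cases.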

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f321/12383576/4c04a3b45d1c/bioengineering-12-00827-g001.jpg
