Enhancing generalization in a Kawasaki Disease prediction model using data augmentation: Cross-validation of patients from two major hospitals in Taiwan.

Authors

Hung Chuan-Sheng, Lin Chun-Hung Richard, Liu Jain-Shing, Chen Shi-Huang, Hung Tsung-Chi, Tsai Chih-Min

Affiliations

Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan.

Artificial Intelligence Research and Promotion Center, National Sun Yat-sen University, Kaohsiung, Taiwan.

Publication information

PLoS One. 2024 Dec 31;19(12):e0314995. doi: 10.1371/journal.pone.0314995. eCollection 2024.

DOI:10.1371/journal.pone.0314995

PMID:39739681

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11687671/

Abstract

Kawasaki Disease (KD) is a rare febrile illness affecting infants and young children, potentially leading to coronary artery complications and, in severe cases, mortality if untreated. However, KD is frequently misdiagnosed as a common fever in clinical settings, and the inherent data imbalance further complicates accurate prediction when using traditional machine learning and statistical methods. This paper introduces two advanced approaches to address these challenges, enhancing prediction accuracy and generalizability. The first approach proposes a stacking model termed the Disease Classifier (DC), specifically designed to recognize minority class samples within imbalanced datasets, thereby mitigating the bias commonly observed in traditional models toward the majority class. Secondly, we introduce a combined model, the Disease Classifier with CTGAN (CTGAN-DC), which integrates DC with Conditional Tabular Generative Adversarial Network (CTGAN) technology to improve data balance and predictive performance further. Utilizing CTGAN-based oversampling techniques, this model retains the original data characteristics of KD while expanding data diversity. This effectively balances positive and negative KD samples, significantly reducing model bias toward the majority class and enhancing both predictive accuracy and generalizability. Experimental evaluations indicate substantial performance gains, with the DC and CTGAN-DC models achieving notably higher predictive accuracy than individual machine learning models. Specifically, the DC model achieves sensitivity and specificity rates of 95%, while the CTGAN-DC model achieves 95% sensitivity and 97% specificity, demonstrating superior recognition capability. 
Furthermore, both models exhibit strong generalizability across diverse KD datasets, particularly the CTGAN-DC model, which surpasses the JAMA model with a 3% increase in sensitivity and a 95% improvement in generalization sensitivity and specificity, effectively resolving the model collapse issue observed in the JAMA model. In sum, the proposed DC and CTGAN-DC architectures demonstrate robust generalizability across multiple KD datasets from various healthcare institutions and significantly outperform other models, including XGBoost. These findings lay a solid foundation for advancing disease prediction in the context of imbalanced medical data.
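
The abstract reports results as sensitivity and specificity rather than plain accuracy. As a rough illustration of why that matters on imbalanced data, the sketch below computes all three metrics for a degenerate classifier that always predicts "not KD". The cohort sizes, the `evaluate` helper, and the `Metrics` container are hypothetical and chosen only for illustration; none of them come from the paper, and this is not the authors' DC or CTGAN-DC model.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    sensitivity: float  # true positive rate: TP / (TP + FN)
    specificity: float  # true negative rate: TN / (TN + FP)
    accuracy: float     # (TP + TN) / all samples


def evaluate(y_true, y_pred):
    """Compute binary metrics, treating label 1 as KD and 0 as non-KD."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return Metrics(sensitivity, specificity, accuracy)


# Hypothetical cohort (made-up numbers): 10 KD cases among 1000 febrile children.
y_true = [1] * 10 + [0] * 990

# A model biased entirely toward the majority class: it never predicts KD.
always_negative = [0] * len(y_true)

baseline = evaluate(y_true, always_negative)
print(baseline)
```

In this made-up cohort the always-negative model scores 99% accuracy and 100% specificity while detecting no KD case at all (0% sensitivity), which is the majority-class bias the abstract says the DC and CTGAN-DC models are designed to reduce.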

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/31f1/11687671/5bdc54d39ac7/pone.0314995.g001.jpg

Similar articles

Accuracy of Machine Learning in Discriminating Kawasaki Disease and Other Febrile Illnesses: Systematic Review and Meta-Analysis.

J Med Internet Res. 2024 Nov 18;26:e57641. doi: 10.2196/57641.

Use of Machine Learning to Differentiate Children With Kawasaki Disease From Other Febrile Children in a Pediatric Emergency Department.

JAMA Netw Open. 2023 Apr 3;6(4):e237489. doi: 10.1001/jamanetworkopen.2023.7489.

Machine learning for early diagnosis of Kawasaki disease in acute febrile children: retrospective cross-sectional study in China.

Sci Rep. 2025 Feb 25;15(1):6799. doi: 10.1038/s41598-025-90919-y.

Data Augmentation of a Corrosion Dataset for Defect Growth Prediction of Pipelines Using Conditional Tabular Generative Adversarial Networks.

Materials (Basel). 2024 Mar 1;17(5):1142. doi: 10.3390/ma17051142.

Improving mortality prediction in Acute Pancreatitis by machine learning and data augmentation.

Comput Biol Med. 2022 Nov;150:106077. doi: 10.1016/j.compbiomed.2022.106077. Epub 2022 Sep 11.

Utility-based Analysis of Statistical Approaches and Deep Learning Models for Synthetic Data Generation With Focus on Correlation Structures: Algorithm Development and Validation.

JMIR AI. 2025 Mar 20;4:e65729. doi: 10.2196/65729.

Multicentre validation of a computer-based tool for differentiation of acute Kawasaki disease from clinically similar febrile illnesses.

Arch Dis Child. 2020 Aug;105(8):772-777. doi: 10.1136/archdischild-2019-317980. Epub 2020 Mar 5.

A machine-learning algorithm for diagnosis of multisystem inflammatory syndrome in children and Kawasaki disease in the USA: a retrospective model development and validation study.

Lancet Digit Health. 2022 Oct;4(10):e717-e726. doi: 10.1016/S2589-7500(22)00149-2.

Single center blind testing of a US multi-center validated diagnostic algorithm for Kawasaki disease in Taiwan.

Front Immunol. 2022 Oct 3;13:1031387. doi: 10.3389/fimmu.2022.1031387. eCollection 2022.

References cited in this article

Application of artificial intelligence in the diagnosis and treatment of Kawasaki disease.

World J Clin Cases. 2024 Aug 16;12(23):5304-5307. doi: 10.12998/wjcc.v12.i23.5304.

Intelligent diagnosis of Kawasaki disease from real-world data using interpretable machine learning models.

Hellenic J Cardiol. 2025 Jan-Feb;81:38-48. doi: 10.1016/j.hjc.2024.08.003. Epub 2024 Aug 10.

Synthetic data generation methods in healthcare: A review on open-source tools and methods.

Comput Struct Biotechnol J. 2024 Jul 9;23:2892-2910. doi: 10.1016/j.csbj.2024.07.005. eCollection 2024 Dec.

ctGAN: combined transformation of gene expression and survival data with generative adversarial network.

Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae325.

Identifying and handling data bias within primary healthcare data using synthetic data generators.

Heliyon. 2024 Jan 10;10(2):e24164. doi: 10.1016/j.heliyon.2024.e24164. eCollection 2024 Jan 30.

Kawasaki disease, multisystem inflammatory syndrome in children, and adenoviral infection: a scoring system to guide differential diagnosis.

Eur J Pediatr. 2023 Nov;182(11):4889-4895. doi: 10.1007/s00431-023-05142-6. Epub 2023 Aug 19.

Novel Predictive Scoring System for Intravenous Immunoglobulin Resistance Helps Timely Intervention in Kawasaki Disease: The Chinese Experience.

J Immunol Res. 2023 Aug 9;2023:6808323. doi: 10.1155/2023/6808323. eCollection 2023.

A machine learning model for distinguishing Kawasaki disease from sepsis.

Sci Rep. 2023 Aug 2;13(1):12553. doi: 10.1038/s41598-023-39745-8.

Use of Machine Learning to Differentiate Children With Kawasaki Disease From Other Febrile Children in a Pediatric Emergency Department.

JAMA Netw Open. 2023 Apr 3;6(4):e237489. doi: 10.1001/jamanetworkopen.2023.7489.

Intravenous immunoglobulin resistance in Kawasaki disease patients: prediction using clinical data.

Pediatr Res. 2024 Feb;95(3):692-697. doi: 10.1038/s41390-023-02519-z. Epub 2023 Feb 16.
