A Multi-Stage Framework for Kawasaki Disease Prediction Using Clustering-Based Undersampling and Synthetic Data Augmentation: Cross-Institutional Validation with Dual-Center Clinical Data in Taiwan.

Author information

Huang Heng-Chih, Hung Chuan-Sheng, Lin Chun-Hung Richard, Shie Yi-Zhen, Yu Cheng-Han, Huang Ting-Hsin

Affiliations

Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung 80424, Taiwan.

Division of Cardiology, Department of Internal Medicine, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung 83301, Taiwan.

Publication information

Bioengineering (Basel). 2025 Jul 7;12(7):742. doi: 10.3390/bioengineering12070742.

Kawasaki disease (KD) is a rare yet potentially life-threatening pediatric vasculitis that, if left undiagnosed or untreated, can result in serious cardiovascular complications. Its heterogeneous clinical presentation poses diagnostic challenges: cases often fail to meet the classical criteria, increasing the risk of oversight. Leveraging routine laboratory tests with AI offers a promising strategy for enhancing early detection. However, due to the extremely low prevalence of KD, conventional models often struggle with severe class imbalance, limiting their ability to achieve both high sensitivity and specificity in practice. To address this issue, we propose a multi-stage AI-based predictive framework that incorporates clustering-based undersampling, data augmentation, and stacking ensemble learning. The model was trained and internally tested on clinical blood and urine test data from Chang Gung Memorial Hospital (CGMH, n = 74,641; 2010-2019), and externally validated using an independent dataset from Kaohsiung Medical University Hospital (KMUH, n = 1582; 2012-2020), thereby supporting cross-institutional generalizability. At a fixed recall rate of 95%, the model achieved a specificity of 97.5% and an F1-score of 53.6% on the CGMH test set, and a specificity of 74.7% with an F1-score of 23.4% on the KMUH validation set. These results underscore the model's ability to maintain high specificity even under sensitivity-focused constraints, while still delivering clinically meaningful predictive performance. This balance of sensitivity and specificity highlights the framework's practical utility for real-world KD screening.
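
The abstract does not spell out how its clustering-based undersampling works. One common form, assumed here purely for illustration, runs k-means on the majority (non-KD) class with k equal to the number of samples to keep, then keeps the real sample nearest each centroid so the reduced set still spans the majority class. The pure-Python sketch below uses hypothetical helper names (`kmeans`, `cluster_undersample`) and is not the authors' implementation; the data-augmentation and stacking stages are not shown.

```python
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 50, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's k-means (hypothetical helper); returns k centroids."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: _sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centroids.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids


def cluster_undersample(majority: List[Vector], n_keep: int, seed: int = 0) -> List[Vector]:
    """Reduce the majority class to n_keep real samples, one per k-means cluster
    (hypothetical helper illustrating the assumed undersampling scheme)."""
    if n_keep >= len(majority):
        return list(majority)
    centroids = kmeans(majority, n_keep, seed=seed)
    remaining = list(majority)
    kept = []
    for c in centroids:
        # Keep the real (not synthetic) sample closest to each centroid,
        # removing it so no sample is selected twice.
        best = min(remaining, key=lambda p: _sq_dist(p, c))
        kept.append(best)
        remaining.remove(best)
    return kept


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy 2-D "lab value" vectors: 200 non-KD (majority) and 20 KD (minority).
    majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    minority = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(20)]
    reduced = cluster_undersample(majority, n_keep=len(minority))
    print(len(reduced), len(minority))  # 20 20
```

Keeping real samples rather than the centroids themselves avoids feeding the classifier averaged "patients" that never existed; libraries such as imbalanced-learn offer both variants.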

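
The results are reported "at a fixed recall rate of 95%": the decision threshold on the model's score is set so that at least 95% of KD cases are flagged, and specificity and F1 are then read off at that threshold. The sketch below shows one way to do this from scores and 0/1 labels; the function names are hypothetical and this is not the authors' evaluation code.

```python
import math
from typing import Dict, Sequence, Tuple


def threshold_for_recall(scores: Sequence[float], labels: Sequence[int],
                         target_recall: float = 0.95) -> float:
    """Return the highest threshold t such that predicting KD when
    score >= t flags at least `target_recall` of the positive cases."""
    pos_scores = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not pos_scores:
        raise ValueError("need at least one positive case")
    # Positives that must score at or above t; the epsilon guards against
    # target_recall * n landing a hair above a whole number in floating point.
    needed = math.ceil(target_recall * len(pos_scores) - 1e-9)
    needed = min(len(pos_scores), max(1, needed))
    return pos_scores[needed - 1]


def confusion_at(scores: Sequence[float], labels: Sequence[int],
                 threshold: float) -> Tuple[int, int, int, int]:
    """Count (TP, FP, TN, FN) when score >= threshold means 'predicted KD'."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted = s >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def metrics_at_fixed_recall(scores: Sequence[float], labels: Sequence[int],
                            target_recall: float = 0.95) -> Dict[str, float]:
    """Pick the operating threshold for the target recall, then report recall
    (sensitivity), specificity, precision and F1 at that threshold."""
    t = threshold_for_recall(scores, labels, target_recall)
    tp, fp, tn, fn = confusion_at(scores, labels, t)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    precision = tp / (tp + fp)  # tp >= 1 by construction of t
    f1 = 2 * precision * recall / (precision + recall)
    return {"threshold": t, "recall": recall, "specificity": specificity,
            "precision": precision, "f1": f1}


if __name__ == "__main__":
    # Toy scores: 5 KD cases (label 1) and 5 non-KD cases (label 0).
    scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.65, 0.5, 0.4, 0.2, 0.1]
    labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    # threshold 0.6; recall, specificity and precision are each 0.8
    print(metrics_at_fixed_recall(scores, labels, target_recall=0.8))
```

Because F1 = 2PR/(P + R), fixing recall R ties F1 directly to precision: P = F1·R/(2R − F1). Taking the reported figures at face value and assuming recall is exactly 0.95, an F1 of 53.6% corresponds to a precision of about 37% on the CGMH test set, and an F1 of 23.4% to about 13% on KMUH. With KD this rare, even a 2.5% false-positive rate yields more false alarms than true detections, which is why F1 stays modest while specificity is high.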