Automatic Feature Selection for Imbalanced Echocardiogram Data Using Event-Based Self-Similarity.
Author information
Huang Huang-Nan, Chen Hong-Min, Lin Wei-Wen, Wiryasaputra Rita, Chen Yung-Cheng, Wang Yu-Huei, Yang Chao-Tung
Affiliations
Department of Smart Computing and Applied Mathematics, Tunghai University, Taichung 407224, Taiwan.
Cardiovascular Center, Taichung Veterans General Hospital, Taichung 407219, Taiwan.
Publication information
Diagnostics (Basel). 2025 Apr 11;15(8):976. doi: 10.3390/diagnostics15080976.
Using echocardiogram data for cardiovascular disease (CVD) prognosis can be difficult because imbalanced datasets lead to biased predictions. Machine learning models can enhance prognostic accuracy, but their effectiveness depends on sound feature selection and robust classification techniques. This study introduces an event-based self-similarity approach to improve automatic feature selection for imbalanced echocardiogram data. Critical features correlated with disease progression were identified by leveraging self-similarity patterns. The study used an echocardiogram dataset, visual presentations of high-frequency sound wave signals, and data from patients with heart disease who were treated using three treatment methods (catheter ablation, ventricular defibrillator, and drug control) over the course of three years. The dataset was classified into nine categories, and Recursive Feature Elimination (RFE) was applied to identify the most relevant features, reducing model complexity while maintaining diagnostic accuracy. Machine learning classification models, including XGBoost and CATBoost, were trained and evaluated. The two models achieved comparable accuracies of 84.3% and 88.4%, respectively, under different normalization techniques. To further optimize performance, the models were combined into a voting ensemble, improving feature selection and predictive accuracy. Four essential features (age, aorta (AO), left ventricle (LV), and left atrium (LA)) were identified as critical for prognosis and were found with the Random Forest (RF) voting ensemble classifier. The results underscore the importance of feature selection techniques in handling imbalanced datasets, improving classification robustness, and reducing bias in automated prognosis systems. Our findings highlight the potential of machine learning-driven echocardiogram analysis to enhance patient care by providing accurate, data-driven assessments.
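The abstract makes two points that a small example can illustrate: plain accuracy can look good on imbalanced data even when minority classes are never predicted, and a voting ensemble combines the label predictions of several classifiers. The following is a minimal sketch in plain Python, not the authors' pipeline. The three-class toy labels (the study used nine categories), the three hypothetical model outputs, and the helper names `accuracy`, `balanced_accuracy`, and `hard_vote` are all assumptions made for illustration; balanced accuracy is a swapped-in metric that the abstract does not mention.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    hits, totals = Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def hard_vote(*predictions):
    """Majority vote per sample; ties go to the earliest-listed model."""
    voted = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        best = max(counts.values())
        voted.append(next(lab for lab in labels if counts[lab] == best))
    return voted


# Hypothetical imbalanced toy labels: class 0 dominates, classes 1 and 2 are rare.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 2]

# A baseline that always predicts the majority class.
baseline = [0] * len(y_true)

# Hypothetical outputs of three classifiers (illustrative values only).
model_a = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
model_b = [0, 0, 0, 0, 0, 0, 0, 0, 0, 2]
model_c = [0, 0, 0, 0, 0, 0, 2, 0, 1, 2]

ensemble = hard_vote(model_a, model_b, model_c)

print("baseline accuracy:         ", accuracy(y_true, baseline))
print("baseline balanced accuracy:", balanced_accuracy(y_true, baseline))
print("ensemble accuracy:         ", accuracy(y_true, ensemble))
print("ensemble balanced accuracy:", balanced_accuracy(y_true, ensemble))
```

On this toy data the majority-class baseline reaches 80% accuracy while recovering none of the rare classes, which is the kind of bias the abstract attributes to imbalanced datasets. The ensemble happens to correct each model's individual errors here because the hypothetical predictions were chosen so that the errors do not overlap; real classifiers offer no such guarantee.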