A cluster-based ensemble approach for congenital heart disease prediction.

Affiliations

Sri Guru Tegh Bahadur Khalsa College, University of Delhi, Delhi, India.

Department of Computer Engineering, Jamia Millia Islamia, New Delhi, India.

Publication information

Comput Methods Programs Biomed. 2024 Jan;243:107922. doi: 10.1016/j.cmpb.2023.107922. Epub 2023 Nov 7.

DOI: 10.1016/j.cmpb.2023.107922
PMID: 37984098
Abstract

BACKGROUND

Congenital heart disease (CHD) is one of the most prevalent birth defects. Although CHD risk factors have been the subject of numerous studies, their propensity to cause CHD has not been tested. In particular, few studies have attempted to forecast CHD risk using population-based cross-sectional data, which is inherently imbalanced.

OBJECTIVE

This study aims to build a reliable data analysis model that supports (i) a better understanding of congenital heart disease prediction in the presence of missing and imbalanced data and (ii) the creation of cohorts of expectant mothers with similar lifestyle characteristics.

METHODS

Patient cohorts are identified using density-based spatial clustering of applications with noise (DBSCAN), an unsupervised clustering technique. To predict CHD more accurately, a random forest model is trained using these clusters and their corresponding patterns. The dataset covers 33,831 expectant mothers. Missing data are handled with k-NN imputation, and the extreme class imbalance is corrected with SMOTE. All of these techniques are data-driven and need little to no user or expert involvement.
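The steps named above can be sketched end to end as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how cluster information reaches the classifier, so the sketch assumes the DBSCAN label is appended as one extra feature. The data are synthetic (`make_classification`), the `eps`, `min_samples`, and neighbour counts are untuned placeholders, and `smote`, `assign_clusters`, and `run_pipeline` are hypothetical helper names. SMOTE is hand-rolled here to keep the example self-contained; the imbalanced-learn package offers a maintained version.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import KNNImputer
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler


def smote(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE: interpolate between a minority sample and one of its
    k nearest minority-class neighbours."""
    rng = np.random.default_rng(rng)
    k = min(k, len(X_min) - 1)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neigh = nn.kneighbors(X_min, return_distance=False)[:, 1:]  # drop self
    base = rng.integers(0, len(X_min), size=n_new)
    pick = neigh[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[pick] - X_min[base])


def assign_clusters(db, X_new):
    """DBSCAN has no predict(); give each new point the label of its nearest
    core sample, or -1 (noise) if that core sample is farther than eps."""
    if len(db.components_) == 0:
        return np.full(len(X_new), -1)
    core_labels = db.labels_[db.core_sample_indices_]
    dist, idx = NearestNeighbors(n_neighbors=1).fit(db.components_).kneighbors(X_new)
    return np.where(dist[:, 0] <= db.eps, core_labels[idx[:, 0]], -1)


def run_pipeline(seed=0):
    rng = np.random.default_rng(seed)
    # Synthetic stand-in for an imbalanced dataset with missing values.
    X, y = make_classification(n_samples=2000, n_features=8, n_informative=5,
                               weights=[0.95, 0.05], random_state=seed)
    X[rng.random(X.shape) < 0.05] = np.nan
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)

    # 1. k-NN imputation and scaling, fitted on the training split only.
    imputer = KNNImputer(n_neighbors=5).fit(X_tr)
    scaler = StandardScaler().fit(imputer.transform(X_tr))
    X_tr = scaler.transform(imputer.transform(X_tr))
    X_te = scaler.transform(imputer.transform(X_te))

    # 2. Unsupervised cohort discovery with DBSCAN (labels are not used).
    db = DBSCAN(eps=2.0, min_samples=10).fit(X_tr)

    # 3. SMOTE on the training features until both classes are the same size.
    n_new = int((y_tr == 0).sum() - (y_tr == 1).sum())
    X_syn = smote(X_tr[y_tr == 1], n_new, rng=seed)
    X_bal = np.vstack([X_tr, X_syn])
    y_bal = np.concatenate([y_tr, np.ones(n_new, dtype=y_tr.dtype)])

    # 4. Append the cluster label as an extra feature (an assumption).
    c_bal = np.concatenate([db.labels_, assign_clusters(db, X_syn)])
    X_bal = np.column_stack([X_bal, c_bal])
    X_te = np.column_stack([X_te, assign_clusters(db, X_te)])

    # 5. Random forest trained on the balanced, cluster-augmented data.
    rf = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X_bal, y_bal)
    return {
        "accuracy": accuracy_score(y_te, rf.predict(X_te)),
        "auc": roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1]),
        "n_clusters": len(set(db.labels_) - {-1}),
        "train_class_counts": np.bincount(y_bal).tolist(),
    }


if __name__ == "__main__":
    print(run_pipeline())
```

Oversampling is applied only after the train/test split, so the held-out metrics are computed on real, unbalanced samples. Numbers from this sketch depend entirely on the synthetic data and are not comparable to the figures reported in the study.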

RESULTS AND CONCLUSION

DBSCAN identified three cohorts. The cluster information improved the random forest-based CHD prediction and revealed intricate factors that influence prediction accuracy. The proposed approach achieved the best results, with 99% accuracy and an AUC of 0.91, and outperformed state-of-the-art methods. The unsupervised learning step can therefore supply additional information to the classifier and further improve classification performance.
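Accuracy and AUC are reported together because they measure different things on imbalanced data. The short sketch below uses made-up labels (10 positives among 1,000 cases, not the study's data) to show that a model which never predicts the positive class can still score 99% accuracy while its AUC stays at chance level. The helper names `accuracy` and `auc` are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative
    (ties count as one half). Equivalent to the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical, heavily imbalanced labels: 1% positive.
y_true = [1] * 10 + [0] * 990

# A trivial model that always predicts "negative" with the same score.
y_pred = [0] * 1000
scores = [0.0] * 1000

print(accuracy(y_true, y_pred))  # 0.99
print(auc(y_true, scores))       # 0.5
```

An AUC well above 0.5 therefore says more about a model's ability to rank positive cases than a high accuracy figure does on its own.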
