

Predicting Chronic Kidney Disease Using Hybrid Machine Learning Based on Apache Spark.

Affiliations

Department of Information Systems, Faculty of Computers and Artificial Intelligence, Helwan University, Cairo, Egypt.

Faculty of Informatics and Computer Science, British University, Egypt, Cairo, Egypt.

Publication information

Comput Intell Neurosci. 2022 Feb 23;2022:9898831. doi: 10.1155/2022/9898831. eCollection 2022.

Abstract

Chronic kidney disease (CKD) has become widespread. It is associated with serious risks such as cardiovascular disease, heightened risk, and end-stage renal disease, many of which can be avoided through early detection and treatment of people at risk of the disease. Machine learning algorithms can significantly help medical scientists diagnose the disease accurately at an early stage. Recently, Big Data platforms have been integrated with machine learning algorithms to add value to healthcare. This paper therefore proposes hybrid machine learning techniques, combining feature selection methods with machine learning classification algorithms on a big data platform (Apache Spark), to detect chronic kidney disease (CKD). Two feature selection techniques, Relief-F and the chi-squared feature selection method, were applied to select the important features. Six machine learning classification algorithms were used in this research: decision tree (DT), logistic regression (LR), Naive Bayes (NB), Random Forest (RF), support vector machine (SVM), and Gradient-Boosted Trees (GBT Classifier) as ensemble learning algorithms. Four evaluation metrics, namely accuracy, precision, recall, and F1-measure, were used to validate the results. For each algorithm, cross-validation and test results were computed using the full feature set, the features selected by Relief-F, and the features selected by the chi-squared method. The results showed that the SVM, DT, and GBT classifiers with the selected features achieved the best performance, reaching 100% accuracy. Overall, the features selected by Relief-F performed better than both the full feature set and the features selected by chi-squared.
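The abstract names Relief-F as one of the two feature selection methods but gives no implementation details, and Relief-F is not a built-in Spark MLlib transformer. As a rough illustration of how Relief-F scores features, here is a minimal plain-Python sketch of the textbook ReliefF update (nearest hits lower a feature's weight, nearest misses raise it). It assumes numeric features, uses every instance as a sample instead of random sampling, and the parameter `k` (neighbours per class) is a hypothetical choice; it is not the authors' code.

```python
from collections import Counter


def relieff(X, y, k=3):
    """Return one ReliefF weight per feature; larger means more relevant.

    Sketch assumptions (not from the paper): numeric features, every
    instance is used as a sample (m = n, no random sampling), and
    distances are Manhattan distances over range-normalised differences.
    """
    n, d = len(X), len(X[0])
    # Feature ranges scale per-feature differences into [0, 1];
    # constant features get a span of 1.0 to avoid dividing by zero.
    span = [(max(r[a] for r in X) - min(r[a] for r in X)) or 1.0 for a in range(d)]

    def diff(a, r1, r2):
        return abs(r1[a] - r2[a]) / span[a]

    def dist(r1, r2):
        return sum(diff(a, r1, r2) for a in range(d))

    prior = {c: cnt / n for c, cnt in Counter(y).items()}
    w = [0.0] * d
    for i, (r, cls) in enumerate(zip(X, y)):
        # k nearest neighbours of r within every class, excluding r itself.
        near = {}
        for c in prior:
            cand = sorted((dist(r, X[j]), j) for j in range(n) if j != i and y[j] == c)
            near[c] = [j for _, j in cand[:k]]
        for a in range(d):
            hits = near[cls]
            if hits:
                # Differences to same-class neighbours (hits) lower the weight.
                w[a] -= sum(diff(a, r, X[j]) for j in hits) / (n * len(hits))
            for c, misses in near.items():
                if c == cls or not misses:
                    continue
                # Differences to other-class neighbours (misses) raise the
                # weight, scaled by that class's prior probability.
                scale = prior[c] / (1.0 - prior[cls])
                w[a] += scale * sum(diff(a, r, X[j]) for j in misses) / (n * len(misses))
    return w


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
X = [[0.0, 1], [0.1, 2], [0.2, 3], [0.8, 1], [0.9, 2], [1.0, 3]]
y = [0, 0, 0, 1, 1, 1]
weights = relieff(X, y, k=2)
```

On this toy data the separating feature receives a clearly positive weight and the uninformative one a negative weight, so ranking by weight and keeping the top features gives a Relief-F style selection.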

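The abstract's second feature selection method ranks features by a chi-squared test of independence between each feature and the class label. Spark MLlib offers `ChiSqSelector` in `pyspark.ml.feature` for this purpose, but the abstract does not say which Spark component or settings were used. The plain-Python sketch below only shows the underlying statistic, the sum of (observed − expected)² / expected over a feature-by-label contingency table, and keeps the `num_top` highest-scoring features. It assumes categorical (or already discretised) features, and `num_top` is a hypothetical parameter.

```python
from collections import Counter


def chi2_statistic(feature, labels):
    """Pearson chi-squared statistic for independence of a categorical feature and the label."""
    n = len(labels)
    observed = Counter(zip(feature, labels))  # contingency-table cell counts
    feature_totals = Counter(feature)         # row totals
    label_totals = Counter(labels)            # column totals
    stat = 0.0
    for f, f_count in feature_totals.items():
        for lab, l_count in label_totals.items():
            expected = f_count * l_count / n  # count expected under independence
            stat += (observed[(f, lab)] - expected) ** 2 / expected
    return stat


def select_top_features(X, y, num_top):
    """Return the indices of the num_top features with the largest statistic, plus all scores."""
    scores = [chi2_statistic([row[a] for row in X], y) for a in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda a: scores[a], reverse=True)
    return sorted(ranked[:num_top]), scores


# Hypothetical toy data: feature 0 mirrors the label, feature 1 is independent of it.
X = [[1, 0], [1, 1], [0, 0], [0, 1]]
y = [1, 1, 0, 0]
selected, scores = select_top_features(X, y, num_top=1)
```

A larger statistic means stronger evidence that the feature and the label are dependent, which is why the highest-scoring features are kept.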

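The abstract evaluates the classifiers by accuracy, precision, recall, and F1-measure. It does not state whether per-class or averaged values were reported. Spark's `MulticlassClassificationEvaluator`, for example, exposes weighted variants such as `weightedPrecision` and `weightedRecall`. The sketch below assumes a binary CKD/not-CKD task and computes the four metrics for the positive class from confusion-matrix counts.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for the positive class (e.g. CKD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    tn = len(pairs) - tp - fp - fn                               # true negatives
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators (no positive predictions / no positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels and predictions: 2 TP, 1 FN, 1 FP, 2 TN.
metrics = binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
```

A classifier with 100% accuracy on a test set, as reported for SVM, DT, and GBT, necessarily makes no false positives or false negatives there, so its precision, recall, and F1 on that set are also 1.0.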
Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3c79/8890824/4241de59cea1/CIN2022-9898831.001.jpg
