Rule extraction from biased random forest and fuzzy support vector machine for early diagnosis of diabetes.

Affiliation

Information System and Security and Countermeasures Experiments Center, Beijing Institute of Technology, Beijing, 100081, People's Republic of China.

Publication information

Sci Rep. 2022 Jun 14;12(1):9858. doi: 10.1038/s41598-022-14143-8.

DOI:10.1038/s41598-022-14143-8

PMID:35701587

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9198101/

Abstract

Because the initial symptoms of diabetes are often concealed, many patients are not diagnosed in time, which delays treatment. Machine learning methods have been applied to raise the diagnosis rate, but most of them are black boxes that lack interpretability. Rule extraction is commonly used to open the black box. Because diabetic patients are far fewer than healthy people, the rules obtained by existing rule extraction methods tend to identify healthy people rather than diabetic patients. To address this problem, a method for extracting reduced rules based on a biased random forest and a fuzzy support vector machine is proposed. The biased random forest uses the k-nearest neighbor (k-NN) algorithm to identify critical samples and, based on those samples, generates more trees that tend to diagnose diabetes, which biases the generated rules toward diabetic patients. In addition, conditions and rules are pruned based on error rate and coverage rate to enhance interpretability. Experiments on the Diabetes Medical Examination Data collected by Beijing Hospital (DMED-BH) dataset demonstrate that the proposed approach achieves outstanding results (MCC = 0.8802) when the number of rules is similar. Moreover, experiments on the Pima Indian Diabetes (PID) and China Health and Nutrition Survey (CHNS) datasets demonstrate the generalization of the proposed method.
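The critical-sample step and the reported metric can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes one plausible reading of "critical samples": every minority-class (diabetic) sample plus its k nearest majority-class neighbors under Euclidean distance. The function names `critical_sample_indices` and `mcc`, the toy data, and the choice of distance are all hypothetical. The second function is the standard Matthews correlation coefficient computed from a binary confusion matrix.

```python
import math
from typing import List, Sequence, Set


def critical_sample_indices(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    k: int = 3,
    minority_label: int = 1,
) -> List[int]:
    """Return sorted indices of the assumed 'critical' samples.

    Assumption: a sample is critical if it belongs to the minority class,
    or if it is one of the k nearest majority-class neighbours of some
    minority-class sample (Euclidean distance).
    """
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    critical: Set[int] = set(minority)
    for i in minority:
        # Rank all majority samples by distance to this minority sample.
        nearest = sorted(majority, key=lambda j: math.dist(X[i], X[j]))[:k]
        critical.update(nearest)
    return sorted(critical)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Toy one-dimensional data: five healthy samples (0), one diabetic (1).
    X = [[0.0], [1.0], [2.0], [10.0], [11.0], [5.0]]
    y = [0, 0, 0, 0, 0, 1]
    print(critical_sample_indices(X, y, k=2))
    print(mcc(tp=90, tn=90, fp=10, fn=10))
```

In a biased forest along these lines, additional trees would be trained only on the samples at the returned indices, so that the ensemble (and any rules read off its trees) pays more attention to the region around the minority class. MCC is a natural metric here because, unlike accuracy, it stays low when a classifier simply predicts the majority class.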

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4691/9198101/aa24c488405d/41598_2022_14143_Fig1_HTML.jpg
