Cho Baek Hwan, Yu Hwanjo, Kim Kwang-Won, Kim Tae Hyun, Kim In Young, Kim Sun I
Department of Biomedical Engineering, Hanyang University, Seoul, Republic of Korea.
Artif Intell Med. 2008 Jan;42(1):37-53. doi: 10.1016/j.artmed.2007.09.005. Epub 2007 Nov 7.
Diabetic nephropathy is damage to the kidney caused by diabetes mellitus. It is a common complication and a leading cause of death in people with diabetes. However, the decline in kidney function varies considerably between patients, and the determinants of diabetic nephropathy have not been clearly identified. It is therefore very difficult to predict the onset of diabetic nephropathy accurately with simple statistical approaches such as the t-test or the chi-squared test. To predict the onset accurately, we applied various machine learning techniques, such as support vector machine (SVM) classification and feature selection methods, to an irregular and unbalanced diabetes dataset. A second important objective was to visualize the risk factors, giving physicians intuitive information on each patient's clinical pattern.
We collected medical data from 292 patients with diabetes and preprocessed the irregular data to extract 184 features. To predict the onset of diabetic nephropathy, we compared several classification methods: logistic regression, SVM, and SVM with a cost-sensitive learning method. We also applied several feature selection methods to remove redundant features and improve classification performance. For risk factor analysis with SVM classifiers, we developed a new visualization system based on a nomogram approach.
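The abstract does not say how the cost-sensitive SVM assigned misclassification costs. As a minimal sketch, assuming costs inversely proportional to class frequency (a common heuristic, not necessarily the authors' choice), the following shows how weighting makes errors on the rare class count more in a linear classifier's hinge loss. The function names and the toy data are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Assumed weighting scheme; the source does not specify one.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


def weighted_hinge_loss(w, b, X, y, class_weight):
    """Mean class-weighted hinge loss of a linear classifier f(x) = w.x + b.

    Labels in y must be +1 or -1.
    """
    total = 0.0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        total += class_weight[yi] * max(0.0, 1.0 - yi * score)
    return total / len(y)


# Toy, made-up data: one positive (onset) case among four patients.
y = [1, -1, -1, -1]
X = [[0.2], [-1.0], [-2.0], [-1.5]]

cw = balanced_class_weights(y)
loss_weighted = weighted_hinge_loss([1.0], 0.0, X, y, cw)
loss_plain = weighted_hinge_loss([1.0], 0.0, X, y, {1: 1.0, -1: 1.0})
```

In this toy case only the positive example violates the margin, so doubling its weight doubles the loss; a training procedure minimizing the weighted loss would be pushed harder to classify the minority class correctly.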
Linear SVM classifiers combined with wrapper or embedded feature selection methods showed the best results. Among the 184 features, the classifiers selected the same 39 features and achieved an area under the receiver operating characteristic curve (AUC) of 0.969. The visualization tool presented the effect of each feature on the decision as graphical output.
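For readers unfamiliar with the metric, the AUC equals the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition; the function name and the labels and scores are illustrative only and are not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for positive cases, anything else for negative cases.
    Tied scores contribute 0.5 to the count.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l != 1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up scores: 5 of the 6 positive/negative pairs are ordered correctly.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
```

Because the AUC depends only on how cases are ranked, not on a decision threshold or on class proportions, it is a reasonable summary for an unbalanced dataset such as the one described here.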
Our proposed method can predict the onset of diabetic nephropathy about 2-3 months before the actual diagnosis with high prediction performance from an irregular and unbalanced dataset, which statistical methods such as the t-test and logistic regression could not achieve. Additionally, the visualization system provides physicians with intuitive information for risk factor analysis. Physicians can therefore benefit from an automatic early warning for each patient and from visualized risk factors, which facilitates the planning of effective and appropriate treatment strategies.
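The abstract does not describe the internals of the visualization system. What makes a nomogram-style display possible for a linear SVM is that its decision value, f(x) = w·x + b, is a sum of one term per feature, so each feature's effect on a given patient can be shown separately. The following is a minimal sketch of that decomposition only, not the authors' tool; the feature names, weights, and patient values are hypothetical.

```python
def feature_contributions(weights, bias, x, names):
    """Split a linear decision value into per-feature terms w_i * x_i."""
    contribs = {n: w * v for n, w, v in zip(names, weights, x)}
    score = bias + sum(contribs.values())
    return contribs, score


# Hypothetical model and patient (not taken from the study).
names = ["feature_a", "feature_b", "feature_c"]
weights = [0.5, -0.25, 1.0]
bias = -0.25
patient = [2.0, 4.0, 0.5]

contribs, score = feature_contributions(weights, bias, patient, names)

# Rank features by the magnitude of their effect on this patient's score.
ranked = sorted(contribs.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

A positive term pushes the decision toward the positive class and a negative term away from it; plotting the terms on a common axis gives the kind of per-patient view a nomogram provides.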