Comparison of two data mining techniques in labeling diagnosis to Iranian pharmacy claim dataset: artificial neural network (ANN) versus decision tree model.

Author information

Rezaei-Darzi Ehsan, Farzadfar Farshad, Hashemi-Meshkini Amir, Navidi Iman, Mahmoudi Mahmoud, Varmaghani Mehdi, Mehdipour Parinaz, Soudi Alamdari Mahsa, Tayefi Batool, Naderimagham Shohreh, Soleymani Fatemeh, Mesdaghinia Alireza, Delavari Alireza, Mohammad Kazem

Affiliations

1) Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran.

2) Non-communicable Diseases Research Center, Endocrinology and Metabolism Population Science Institute, Tehran University of Medical Sciences, Tehran, Iran.

3) Endocrinology and Metabolism Research Center, Endocrinology and Metabolism Clinical Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran.

Publication information

Arch Iran Med. 2014 Dec;17(12):837-43.

DOI:

PMID:25481323

Abstract

BACKGROUND

This study aimed to evaluate and compare the prediction accuracy of two data mining techniques, a decision tree model and an artificial neural network (ANN) model, in assigning diagnosis labels to gastrointestinal (GI) prescriptions in Iran.

METHODS

This study was conducted in three phases: data preparation, training, and testing. A sample was drawn from a database of 23 million pharmacy insurance claim records covering 2004 to 2011; in total, 330 prescriptions were assessed and used both to train and to test the models. In the training phase, a physician and a pharmacist each assessed the selected prescriptions separately and assigned a diagnosis. To test the performance of each model, stratified k-fold cross-validation was conducted, and sensitivity and specificity were measured.
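The stratified k-fold procedure mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the value of k, the software used, or the diagnosis categories, so the function name `stratified_k_fold`, the choice of k = 5, and the three label names are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds keep
    roughly the same class proportions as `labels`."""
    rng = random.Random(seed)

    # Group record indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal indices round-robin into folds.
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    # Each fold serves once as the test set; the rest form the training set.
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            j for f, fold in enumerate(folds) if f != i for j in fold
        )
        yield train_idx, test_idx


# Hypothetical labels for illustration only (not taken from the study).
labels = ["reflux"] * 20 + ["ulcer"] * 10 + ["ibs"] * 10
splits = list(stratified_k_fold(labels, k=5))
```

Because every record appears in exactly one test fold, a small labeled sample can be used for both training and testing without scoring a model on records it was fitted to.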

RESULTS

Overall, the two methods had very similar accuracies. Based on the weighted averages of the true positive rate (sensitivity) and the true negative rate (specificity), the decision tree classified slightly more accurately than the ANN (sensitivity 83.3% and specificity 96% for the decision tree versus 80.3% and 95.1% for the ANN). However, based on the weighted average of the ROC area (the AUC of each class against all other classes), the ANN predicted the diagnosis more accurately (93.8% compared with 90.6%).
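For a multi-class problem, a weighted average of sensitivity and specificity is commonly computed by treating each class against all others and weighting by the number of true records in that class. The sketch below assumes that weighting scheme; the abstract does not specify how the weights were chosen, and the function name and toy labels are hypothetical.

```python
from collections import Counter


def weighted_sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity), each averaged over classes
    one-vs-rest and weighted by class frequency in y_true (an assumption)."""
    n = len(y_true)
    support = Counter(y_true)
    pairs = list(zip(y_true, y_pred))
    w_sens = 0.0
    w_spec = 0.0
    for c in sorted(support):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        tn = n - tp - fn - fp
        sens = tp / (tp + fn)  # tp + fn equals support[c], which is > 0
        spec = tn / (tn + fp) if (tn + fp) else 0.0
        weight = support[c] / n
        w_sens += weight * sens
        w_spec += weight * spec
    return w_sens, w_spec


# Toy labels for illustration only (not data from the study).
y_true = ["a", "a", "a", "a", "b", "b", "b", "c", "c", "c"]
y_pred = ["a", "a", "a", "b", "b", "b", "c", "c", "c", "a"]
sens, spec = weighted_sensitivity_specificity(y_true, y_pred)
```

With support-based weights, the weighted sensitivity equals the overall fraction of correctly classified records, which is one reason it can be read as an accuracy figure.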

CONCLUSION

According to the results of this study, the artificial neural network and decision tree models show similar accuracy in assigning diagnosis labels to GI prescriptions.
