Gupta Jaiprakash, Koprinska Irena, Patrick Jon
School of Information Technologies, University of Sydney, Australia.
Stud Health Technol Inform. 2015;214:87-93.
We consider the task of automatically classifying clinical incident reports using machine learning methods. Our data consist of 5448 clinical incident reports collected from the Incident Information Management System used by 7 hospitals in the state of New South Wales, Australia. We evaluate the performance of four classification algorithms: decision tree, naïve Bayes, multinomial naïve Bayes and support vector machine. We initially consider 13 classes (incident types), which are then reduced to 12, and show that it is possible to build accurate classifiers. The most accurate classifier was multinomial naïve Bayes, achieving an accuracy of 80.44% and an AUC of 0.91. We also investigate the effect of class labelling by an ordinary clinician versus an expert, and show that the performance of all classifiers improves when the data are labelled by an expert. The best classifier was again multinomial naïve Bayes, achieving an accuracy of 81.32% and an AUC of 0.97. Our results show that some classes in the Incident Information Management System, such as Primary Care, are not distinct and that removing them can improve performance; some classes, such as Aggression Victim, are easier to classify than others, such as Behavior and Human Performance. In summary, we show that classification performance can be improved by expert class labelling of the training data, by removing classes that are not well defined and by selecting appropriate machine learning classifiers.
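To make the best-performing method concrete, the following is a minimal sketch of a multinomial naïve Bayes text classifier with Laplace smoothing, written in plain Python. It is not the authors' implementation: the whitespace tokenizer, the smoothing value `alpha=1.0`, the class name `MultinomialNaiveBayes`, the example reports and the labels "Falls" and "Medication" are all assumptions made for illustration (only "Aggression Victim" is a class named in the abstract).

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


class MultinomialNaiveBayes:
    """Illustrative multinomial naive Bayes over word counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_scores(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / self.total_docs)  # log prior
            for t in tokens:
                count = self.word_counts[label][t]
                # Smoothed log-likelihood of the word given the class.
                score += math.log(
                    (count + self.alpha)
                    / (total_words + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


def accuracy(model, texts, labels):
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Hypothetical toy reports, invented for illustration only.
train_texts = [
    "patient fell from bed during the night",
    "patient slipped on wet floor and fell",
    "wrong dose of medication given to patient",
    "medication administered late wrong drug",
    "nurse was punched by aggressive patient",
    "staff member threatened and hit by visitor",
]
train_labels = [
    "Falls", "Falls",
    "Medication", "Medication",
    "Aggression Victim", "Aggression Victim",
]

model = MultinomialNaiveBayes().fit(train_texts, train_labels)

test_texts = [
    "patient found on floor after fall",
    "wrong medication dose given",
    "visitor hit the nurse",
]
test_labels = ["Falls", "Medication", "Aggression Victim"]

print(accuracy(model, test_texts, test_labels))
```

On a real corpus, accuracy would be estimated on held-out data or by cross-validation rather than on a handful of hand-written examples, and AUC would be computed from the per-class scores instead of the hard predictions used here.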