Centre for Health Informatics, University of New South Wales, Sydney, Australia.
J Am Med Inform Assoc. 2012 Jun;19(e1):e110-8. doi: 10.1136/amiajnl-2011-000562. Epub 2012 Jan 11.
To explore the feasibility of using statistical text classification to automatically detect extreme-risk events in clinical incident reports.
Statistical text classifiers based on Naïve Bayes and Support Vector Machine (SVM) algorithms were trained and tested on clinical incident reports to automatically detect extreme-risk events, defined as incidents that satisfy the criteria of Severity Assessment Code (SAC) level 1. The reports used were submitted to the Advanced Incident Management System by public hospitals from one Australian region. The classifiers were evaluated on two datasets: (1) a set of reports with diverse incident types (n=120); (2) a set of reports associated with patient misidentification (n=166). Results were assessed using accuracy, precision, recall, F-measure, and area under the curve (AUC) of receiver operating characteristic curves.
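To make the Naïve Bayes approach concrete, the following is a minimal sketch of a multinomial Naïve Bayes text classifier with add-one smoothing, written in plain Python. It is not the implementation used in the study: the tokenizer, the function names, the labels `sac1` and `other`, and the four toy reports are all invented for illustration, and the study's feature extraction is not described here.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


def train_naive_bayes(docs, labels):
    """Collect the counts needed for multinomial Naive Bayes."""
    class_counts = Counter(labels)          # documents per class
    word_counts = defaultdict(Counter)      # token counts per class
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "class_counts": class_counts,
        "word_counts": word_counts,
        "vocab": vocab,
        "n_docs": len(docs),
    }


def predict(model, text):
    """Return the class with the highest log posterior for the given text."""
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_class in model["class_counts"].items():
        counts = model["word_counts"][label]
        total = sum(counts.values())
        score = math.log(n_class / model["n_docs"])  # log prior
        for token in tokenize(text):
            if token not in model["vocab"]:
                continue  # ignore tokens never seen in training
            # Add-one (Laplace) smoothed log likelihood of the token.
            score += math.log((counts[token] + 1) / (total + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy reports, NOT data from the study.
toy_docs = [
    "patient died after wrong medication dose",
    "wrong patient surgery caused death",
    "patient fall with minor bruise",
    "delayed meal delivery no harm",
]
toy_labels = ["sac1", "sac1", "other", "other"]
model = train_naive_bayes(toy_docs, toy_labels)
```

With this toy model, `predict(model, "medication error caused patient death")` selects the `sac1` class, because the smoothed token likelihoods for "medication", "caused", "patient", and "death" are higher under that class.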
The classifiers performed well on both datasets. In the multi-type dataset, SVM with a linear kernel performed best, identifying 85.8% of SAC level 1 incidents (precision=0.88, recall=0.83, F-measure=0.86, AUC=0.92). In the patient misidentification dataset, 96.4% of SAC level 1 incidents were detected when an SVM with a linear, polynomial, or radial-basis function kernel was used (precision=0.99, recall=0.94, F-measure=0.96, AUC=0.98). Naïve Bayes showed reasonable performance, detecting 80.8% of SAC level 1 incidents in the multi-type dataset and 89.8% of SAC level 1 patient misidentification incidents. Overall, higher prediction accuracy was attained on the specialized dataset than on the multi-type dataset.
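For readers less familiar with these measures, the sketch below shows how accuracy, precision, recall, F-measure, and AUC are conventionally computed for a binary classifier. It assumes the balanced F-measure (F1) and the rank-based (Mann-Whitney) formulation of AUC; the abstract does not say which variants or tools the authors used, and the function names are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


def roc_auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive case outscores a
    random negative case, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Here precision is the share of reports flagged as SAC level 1 that truly are SAC level 1, while recall is the share of true SAC level 1 reports that the classifier flags.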
Text classification techniques can be applied effectively to automate the detection of extreme-risk events in clinical incident reports.