Department of Environmental and Preventive Sciences, University of Ferrara, Ferrara, Italy.
Department of Women's and Children's Health, University of Padova, Padua, Italy.
JMIR Public Health Surveill. 2023 Jul 12;9:e44467. doi: 10.2196/44467.
Unintentional injury is the leading cause of death in young children. Emergency department (ED) diagnoses are a useful source of information for injury surveillance. However, ED data collection systems often record patient diagnoses in free-text fields. Machine learning techniques (MLTs) are powerful tools for automatic text classification and could improve injury surveillance by speeding up the manual coding of free-text ED diagnoses.
This research aimed to develop a tool for the automatic classification of free-text ED diagnoses to identify injury cases. The classification system also serves epidemiological purposes by helping to quantify the burden of pediatric injuries in Padua, a large province in the Veneto region of Northeast Italy.
The study included 283,468 pediatric admissions between 2007 and 2018 to the Padova University Hospital ED, a large referral center in Northern Italy. Each record reports a diagnosis as free text; such records are the standard tool for reporting patient diagnoses. An expert pediatrician manually classified a randomly extracted sample of approximately 40,000 diagnoses. This sample served as the gold standard to train the MLT classifiers. After preprocessing, a document-term matrix was created. Four machine learning classifiers, namely decision tree, random forest, gradient boosting method (GBM), and support vector machine (SVM), were tuned by 4-fold cross-validation. The diagnoses were classified in 3 hierarchical tasks, according to the World Health Organization classification of injuries: injury versus noninjury (task A), intentional versus unintentional injury (task B), and type of unintentional injury (task C).
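The two preprocessing steps named above, building a document-term matrix and splitting the labeled sample into 4 folds, can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline: the function names, the letters-only tokenizer, the unshuffled fold assignment, and the example diagnoses are all assumptions made for the sketch.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a free-text diagnosis and keep runs of letters (assumed tokenizer)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def document_term_matrix(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    column = {term: j for j, term in enumerate(vocab)}
    matrix = []
    for doc in docs:
        row = [0] * len(vocab)
        for term, count in Counter(tokenize(doc)).items():
            row[column[term]] = count
        matrix.append(row)
    return vocab, matrix


def k_fold_indices(n, k=4):
    """Yield (train, validation) index lists for k-fold cross-validation.

    Folds are contiguous and unshuffled; a real study would usually shuffle first.
    """
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = [i for i in range(n) if i < start or i >= stop]
        yield train, validation
        start = stop


# Hypothetical diagnoses, invented for illustration only.
docs = ["trauma cranico", "frattura del polso", "febbre e tosse", "trauma del polso"]
vocab, dtm = document_term_matrix(docs)
folds = list(k_fold_indices(len(docs), k=4))
```

Each row of `dtm` would then be the feature vector passed to a classifier, and each `(train, validation)` pair would be used once to score a candidate hyperparameter setting.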
The SVM classifier achieved the highest accuracy (94.14%) in classifying injury versus noninjury cases (task A). GBM produced the best results (92% accuracy) for the intentional versus unintentional injury classification (task B). The highest accuracy for the unintentional injury subclassification (task C) was achieved by the SVM classifier. The SVM, random forest, and GBM algorithms performed similarly against the gold standard across the different tasks.
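Accuracy here is the share of diagnoses for which the classifier's label matches the pediatrician's gold-standard label. A minimal sketch of that calculation, with a hypothetical function name and invented labels:

```python
def accuracy(gold, predicted):
    """Fraction of predictions that match the gold-standard labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one label is required")
    matches = sum(g == p for g, p in zip(gold, predicted))
    return matches / len(gold)


# Invented labels for task A (injury versus noninjury), for illustration only.
gold = ["injury", "noninjury", "injury", "injury"]
predicted = ["injury", "noninjury", "noninjury", "injury"]
score = accuracy(gold, predicted)  # 3 of 4 labels agree
```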
This study shows that MLTs are promising techniques for improving epidemiological surveillance, as they allow pediatric ED free-text diagnoses to be classified automatically. The MLTs showed suitable classification performance, especially for the general injury and intentional injury classification tasks. Automatic classification could facilitate the epidemiological surveillance of pediatric injuries and reduce the effort health professionals spend manually classifying diagnoses for research purposes.