Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, Padova, Italy.
Division of Pediatric Emergency Medicine, Department of Women's and Children's Health, University of Padova, Padova, Italy.
Med Care Res Rev. 2021 Apr;78(2):138-145. doi: 10.1177/1077558719844123. Epub 2019 Apr 29.
Free-text information is still widely used in emergency department (ED) records. Machine learning (ML) techniques are useful for analyzing such narratives, but they have been applied mostly to English-language data sets. Against this background, the performance of an ML classification task on a Spanish-language database of ED visits was tested. ED visits recorded in the EDs of nine hospitals in Nicaragua were analyzed, using the Spanish-language, free-text discharge diagnoses. Five hundred random forests were trained on bootstrap samples of the whole data set (1,789 ED visits) to perform the classification task. For each bootstrap sample, once the optimal parameter value had been identified, the final validated model was trained on the whole bootstrapped data set and tested. The classification accuracies had a median of 0.783 (95% CI [0.779, 0.796]). Machine learning techniques appear to be a promising way to exploit the unstructured information reported in ED records in low- and middle-income Spanish-speaking countries.
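The resampling scheme described in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline. Several choices are assumptions made only for illustration: a toy token-vote classifier stands in for the random forest, no parameter tuning is performed, each model is scored on the out-of-bag records (the abstract does not state what the models were tested on), and the interval is a simple percentile range of the bootstrap accuracies, which may differ from how the reported 95% CI was computed. The diagnosis strings and labels are invented examples, not data from the study.

```python
import math
import random
import statistics
from collections import Counter, defaultdict

# Hypothetical free-text diagnoses and labels, invented for illustration only.
TEXTS = ["fractura de radio", "fractura de tibia", "herida en mano",
         "contusion de rodilla", "neumonia bacteriana", "neumonia viral",
         "asma bronquial", "bronquitis aguda"]
LABELS = ["trauma", "trauma", "trauma", "trauma",
          "respiratory", "respiratory", "respiratory", "respiratory"]


def bootstrap_split(n, rng):
    """Draw n indices with replacement; return (in-bag, out-of-bag) indices."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in chosen]
    return in_bag, out_of_bag


def train_token_vote(texts, labels):
    """Toy stand-in for a random forest: per-token label counts."""
    token_counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        for token in text.lower().split():
            token_counts[token][label] += 1
    default = Counter(labels).most_common(1)[0][0]  # fallback: majority label
    return token_counts, default


def predict_token_vote(model, text):
    """Sum the label counts of known tokens; fall back to the majority label."""
    token_counts, default = model
    votes = Counter()
    for token in text.lower().split():
        votes.update(token_counts.get(token, {}))
    return votes.most_common(1)[0][0] if votes else default


def bootstrap_accuracies(texts, labels, n_boot=500, seed=0):
    """Train one model per bootstrap sample and score it out-of-bag."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_boot):
        in_bag, oob = bootstrap_split(len(texts), rng)
        if not oob:  # nothing left to test on; skip this replicate
            continue
        model = train_token_vote([texts[i] for i in in_bag],
                                 [labels[i] for i in in_bag])
        hits = sum(predict_token_vote(model, texts[i]) == labels[i] for i in oob)
        accuracies.append(hits / len(oob))
    return accuracies


def summarize(accuracies, level=0.95):
    """Return (median, lower, upper) using a percentile interval."""
    ordered = sorted(accuracies)
    last = len(ordered) - 1
    lower = ordered[int((1 - level) / 2 * last)]
    upper = ordered[min(last, math.ceil((1 + level) / 2 * last))]
    return statistics.median(ordered), lower, upper


if __name__ == "__main__":
    median, lower, upper = summarize(bootstrap_accuracies(TEXTS, LABELS))
    print(f"median accuracy {median:.3f}, interval [{lower:.3f}, {upper:.3f}]")
```

In a realistic setting the stand-in classifier would be replaced by a random forest fitted on a bag-of-words representation of the diagnoses, with the parameter search run inside each bootstrap replicate.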