Evaluating resampling methods and structured features to improve fall incident report identification by the severity level.

Authors

Liu Jiaxing, Wong Zoie S Y, So H Y, Tsui Kwok Leung

Affiliations

School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, China.

School of Data Science, City University of Hong Kong, Kowloon, Hong Kong SAR, China.

Publication information

J Am Med Inform Assoc. 2021 Jul 30;28(8):1756-1764. doi: 10.1093/jamia/ocab048.

DOI: 10.1093/jamia/ocab048

PMID: 34010385

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8324236/

Abstract

OBJECTIVE

This study aims to improve the classification of the fall incident severity level by considering data imbalance issues and structured features through machine learning.

MATERIALS AND METHODS

We present an incident report classification (IRC) framework to classify the in-hospital fall incident severity level by addressing the imbalanced class problem and incorporating structured attributes. After text preprocessing, bag-of-words features, structured text features, and structured clinical features were extracted from the reports. Next, resampling techniques were incorporated into the training process. Machine learning algorithms were used to build classification models. IRC systems were trained, validated, and tested using a repeated and randomly stratified shuffle-split cross-validation method. Finally, we evaluated the system performance using the F1-measure, precision, and recall over 15 stratified test sets.
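The repeated stratified shuffle-split step described above can be illustrated with a minimal sketch. This is not the authors' code: the function name `stratified_shuffle_split`, the 20% test fraction, the seed, and the toy labels are all assumptions made for illustration. Only the count of 15 test sets comes from the abstract.

```python
import random
from collections import defaultdict

def stratified_shuffle_split(labels, n_splits=15, test_fraction=0.2, seed=0):
    """Yield (train_idx, test_idx) pairs that keep class proportions roughly equal.

    Hypothetical helper: test_fraction and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for _ in range(n_splits):
        train_idx, test_idx = [], []
        for label in sorted(by_class):
            members = by_class[label][:]
            rng.shuffle(members)
            # Aim for at least one test example per class while always
            # leaving at least one example of that class for training.
            n_test = max(1, round(len(members) * test_fraction))
            n_test = min(n_test, len(members) - 1)
            test_idx.extend(members[:n_test])
            train_idx.extend(members[n_test:])
        yield sorted(train_idx), sorted(test_idx)

# Toy labels (made-up data): 90 common-class reports, 10 rare-class reports.
labels = ["minor"] * 90 + ["severe"] * 10
splits = list(stratified_shuffle_split(labels))

train_idx, test_idx = splits[0]
print(len(splits), len(train_idx), len(test_idx))
```

Because each class is shuffled and cut separately, every test set keeps the same share of rare-class reports, which makes rare-class metrics comparable across repetitions.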

RESULTS

The experimental results demonstrated that the classification system setting considering both data imbalance issues and structured features outperformed the other system settings (with a mean macro-averaged F1-measure of 0.733). Considering the structured features and resampling techniques, this classification system setting significantly improved the mean F1-measure for the rare class by 30.88% (P value < .001) and the mean macro-averaged F1-measure by 8.26% from the baseline system setting (P value < .001). In general, the classification system employing the random forest algorithm and random oversampling method outperformed the others.
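To make the macro-averaged F1-measure concrete, the sketch below computes per-class precision, recall, and F1 and then takes their unweighted mean. The helper names and the toy labels are assumptions for illustration; the numbers are not the study's data.

```python
def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so a rare class counts as much as a common one."""
    scores = per_class_scores(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)

# Toy example (made-up labels): 8 common-class reports, 2 rare-class reports.
y_true = ["minor"] * 8 + ["severe"] * 2
y_pred = ["minor"] * 7 + ["severe"] + ["severe", "minor"]

print(per_class_scores(y_true, y_pred)["severe"][2])  # 0.5
print(macro_f1(y_true, y_pred))                       # 0.6875
```

In this toy case 8 of 10 predictions are correct, yet the macro-averaged F1 is only 0.6875 because the rare class scores 0.5. That sensitivity to the rare class is why the metric suits imbalanced severity data.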

CONCLUSIONS

Structured features provide essential information for categorizing the fall incident severity level. Resampling methods help rebalance the class distribution of the original incident report data, which improves the performance of machine learning models. The IRC framework presented in this study effectively automates the identification of fall incident reports by the severity level.
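As a rough illustration of how resampling rebalances the class distribution, the sketch below implements plain random oversampling: minority-class examples are duplicated at random until every class matches the largest one. The function name `random_oversample`, the seed, and the toy reports are assumptions, and the study's actual implementation may differ.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [row for row, lab in zip(rows, labels) if lab == label]
        # Sample with replacement from this class until it reaches the target size.
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels

# Toy training set (made-up data): 9 common-class and 3 rare-class reports.
rows = [f"report {i}" for i in range(12)]
labels = ["minor"] * 9 + ["severe"] * 3

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(labels))
print(Counter(balanced_labels))
```

In a sketch like this, oversampling would be applied to the training portion only, so that duplicated reports never appear in the test sets used for evaluation.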
