Partnership for Health IT Patient Safety, ECRI, Plymouth Meeting, Pennsylvania, United States.
Methods Inf Med. 2021 Dec;60(5-06):147-161. doi: 10.1055/s-0041-1735620. Epub 2021 Oct 31.
Patient safety event reports provide valuable insight into systemic safety issues, but deriving insights from these reports requires computational tools that can efficiently parse large volumes of qualitative data. Natural language processing (NLP) combined with predictive learning provides an automated approach to evaluating these data and supporting the work of patient safety analysts.
The objective of this study was to use NLP and machine learning techniques to develop a generalizable, scalable, and reliable approach to classifying event reports in order to drive improvements in the safety and quality of patient care.
Datasets for 14 different labels (themes) were vectorized using a bag-of-words, , or document embeddings approach and then passed to a series of classification algorithms, with a hyperparameter grid search used to derive an optimized model for each label. Reports were also analyzed for terms strongly associated with each theme using an adjusted F-score calculation.
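What follows is a minimal, dependency-free sketch of one path through this kind of pipeline: bag-of-words features, a multinomial naïve Bayes classifier, and a grid search over its smoothing parameter scored by cross-validated F1. The tokenizer, the smoothing grid, the 3-fold split, and the toy reports are all hypothetical. The abstract does not give the study's feature settings, parameter grids, or validation scheme. In practice the same pattern is usually built with scikit-learn's `CountVectorizer`, `MultinomialNB`, and `GridSearchCV`.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Deliberately simple (hypothetical) tokenizer: lowercase, keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def train_nb(docs, labels, alpha):
    """Fit a binary multinomial naive Bayes model on bag-of-words Counters."""
    vocab = set()
    term_counts = {0: Counter(), 1: Counter()}
    for doc, y in zip(docs, labels):
        vocab.update(doc)
        term_counts[y].update(doc)
    n = len(labels)
    class_counts = Counter(labels)
    return {
        "alpha": alpha,
        "vocab": vocab,
        "term_counts": term_counts,
        "totals": {c: sum(term_counts[c].values()) for c in (0, 1)},
        # Add-one smoothed class priors, so a fold missing one class still works.
        "log_prior": {c: math.log((class_counts[c] + 1) / (n + 2)) for c in (0, 1)},
    }


def predict_nb(model, doc):
    """Return 1 if the positive class has the higher log-posterior (ties -> 0)."""
    v = len(model["vocab"])
    scores = {}
    for c in (0, 1):
        denom = model["totals"][c] + model["alpha"] * v
        score = model["log_prior"][c]
        for term, count in doc.items():
            if term in model["vocab"]:  # terms unseen in training are ignored
                num = model["term_counts"][c][term] + model["alpha"]
                score += count * math.log(num / denom)
        scores[c] = score
    return 1 if scores[1] > scores[0] else 0


def f1(y_true, y_pred):
    """F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cv_f1(docs, labels, alpha, k=3):
    """k-fold cross-validation (fold = index mod k), F1 pooled over all folds."""
    y_true, y_pred = [], []
    for fold in range(k):
        train = [i for i in range(len(docs)) if i % k != fold]
        test = [i for i in range(len(docs)) if i % k == fold]
        model = train_nb([docs[i] for i in train], [labels[i] for i in train], alpha)
        y_true += [labels[i] for i in test]
        y_pred += [predict_nb(model, docs[i]) for i in test]
    return f1(y_true, y_pred)


def grid_search(texts, labels, alphas=(0.1, 0.5, 1.0), k=3):
    """Vectorize once, score every smoothing value, keep the best (first on ties)."""
    docs = [Counter(tokenize(t)) for t in texts]
    results = {a: cv_f1(docs, labels, a, k) for a in alphas}
    best = max(results, key=results.get)
    return best, results


# Hypothetical toy reports for a single "Fall" label (1 = fall, 0 = not a fall).
texts = [
    "Patient found on floor after fall in bathroom",
    "Patient slipped and fell near bed",
    "Fall from wheelchair during transfer",
    "Wrong medication dose given",
    "Pharmacy dispensed incorrect drug",
    "IV pump alarm ignored by staff",
]
labels = [1, 1, 1, 0, 0, 0]
best_alpha, scores = grid_search(texts, labels)
print(best_alpha, scores)
```

With a real dataset, each label would get its own binary search like this one, and the grid would span vectorizers and algorithms as well as a single smoothing value.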
The F score of the optimized models ranged from 0.951 ("Fall") to 0.544 ("Environment"). The bag-of-words approach proved optimal for 12 of 14 labels, and the naïve Bayes algorithm performed best for nine labels. A linear support vector machine was optimal for three labels and XGBoost for four of the 14 labels. Labels whose themes had more distinctly associated terms performed better than less distinct themes, with a Pearson's correlation coefficient of 0.634 between term distinctness and model performance.
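To make the reported metrics concrete, the F score is the harmonic mean of precision and recall, and the correlation above is presumably Pearson's r between a per-label measure of term distinctness and each model's F score. The sketch below assumes a plain F1, because the abstract does not describe the adjustment in its "adjusted F-score". It treats "report contains term" as a one-term classifier, and the helper `term_f_score` and all numbers in the usage lines are hypothetical, not the study's calculation or data.

```python
import math


def f_score(precision, recall, beta=1.0):
    """F-beta score: weighted harmonic mean of precision and recall (beta=1 -> F1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def term_f_score(token_sets, labels, term):
    """Plain (unadjusted) F1 of 'report contains term' as a predictor of the label."""
    tp = sum(1 for toks, y in zip(token_sets, labels) if term in toks and y == 1)
    fp = sum(1 for toks, y in zip(token_sets, labels) if term in toks and y == 0)
    fn = sum(1 for toks, y in zip(token_sets, labels) if term not in toks and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return f_score(precision, recall)


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical tokenized reports for one label.
reports = [{"fall", "floor"}, {"fell", "bed"}, {"drug", "dose"}]
report_labels = [1, 1, 0]
print(term_f_score(reports, report_labels, "fall"))  # about 0.667

# Made-up per-label values, for illustration only.
distinctness = [0.9, 0.6, 0.3]
model_f = [0.95, 0.80, 0.55]
print(pearson_r(distinctness, model_f))
```

A positive r here only says that labels with more distinctive vocabulary tend to score higher. It does not by itself show that distinctiveness causes the better performance.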
We demonstrated an analytical pipeline that broadly applies NLP and predictive modeling to categorize patient safety reports from multiple facilities. This pipeline allows analysts to identify and structure the information contained in patient safety data more rapidly, which can improve how this information is evaluated and used over time.