University of North Carolina at Charlotte, Department of Business Information Systems and Operations Management, United States.
University of Maryland School of Medicine, Department of Medicine, United States.
Int J Med Inform. 2019 Sep;129:374-380. doi: 10.1016/j.ijmedinf.2019.06.020. Epub 2019 Jul 9.
Hypoglycemia is a common safety event when attempting to optimize glycemic control in diabetes (DM). While electronic medical records provide a natural ground for detecting and analyzing hypoglycemia, ICD codes used in the databases may be invalid, insensitive or non-specific in detecting new hypoglycemic events. We developed text preprocessing methods to improve automatic detection of hypoglycemia from analysis of clinical encounter text notes.
We set out to improve hypoglycemia detection from clinical notes by introducing three preprocessing methods: stop word filtering, medication signaling, and ICD narrative enrichment. To test the proposed methods, we selected clinical notes from the VA Maryland Healthcare System, based on various combinations of three criteria that are suggestive of hypoglycemia: ICD-9 codes for diabetes and hypoglycemia, laboratory glucose values < 70 mg/dL, and a text reference to a proximate hypoglycemia event. We constructed one dataset of 395 clinical notes from 2009 and another of 460 notes from 2014 to test the generality of the proposed methods. For each dataset, two physician judges manually reviewed individual clinical notes to determine whether hypoglycemia was present or absent. A third physician judge served as the final adjudicator for disagreements.
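The three preprocessing steps named above can be illustrated with a minimal sketch. The abstract does not give the actual stop word list, medication lexicon, or ICD narratives, so every resource below (`STOP_WORDS`, `DIABETES_MEDICATIONS`, `ICD9_NARRATIVES`, the `DM_MEDICATION` marker) is a hypothetical placeholder. Treating medication signaling as replacing a drug name with a generic marker token is likewise an assumption, not the authors' stated procedure.

```python
import re

# Hypothetical resources: none of these lists come from the paper.
STOP_WORDS = {"the", "a", "an", "of", "and", "was", "is", "to", "with", "on"}
DIABETES_MEDICATIONS = {"insulin", "glipizide", "glyburide", "metformin"}
# Illustrative, paraphrased ICD-9 narratives keyed by code.
ICD9_NARRATIVES = {
    "251.2": "hypoglycemia unspecified",
    "250.00": "diabetes mellitus without mention of complication",
}
MEDICATION_SIGNAL = "DM_MEDICATION"  # assumed marker token


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def filter_stop_words(tokens):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def signal_medications(tokens):
    """Replace known diabetes drug names with one generic marker token."""
    return [MEDICATION_SIGNAL if t in DIABETES_MEDICATIONS else t for t in tokens]


def enrich_with_icd(tokens, icd_codes):
    """Append the narrative text of each known ICD code to the token list."""
    extra = []
    for code in icd_codes:
        extra.extend(tokenize(ICD9_NARRATIVES.get(code, "")))
    return tokens + extra


def preprocess(note, icd_codes=()):
    """Apply all three steps in sequence to one clinical note."""
    tokens = filter_stop_words(tokenize(note))
    tokens = signal_medications(tokens)
    return enrich_with_icd(tokens, icd_codes)
```

Calling `preprocess("The patient was given insulin and felt shaky", icd_codes=["251.2"])` yields a token list in which the stop words are gone, `insulin` has become the marker token, and the words of the ICD narrative are appended. Such a list could then feed any text classifier.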
Each of the proposed preprocessing methods significantly improved hypoglycemia detection, increasing the F1 score by 5.3% to 7.4% on one dataset (p < .01). Among the methods, stop word filtering contributed most to the improvement (7.4%). On that dataset, combining all the preprocessing methods led to a greater performance gain (p < .001) than using any method individually. Similar patterns were observed for the other dataset, where the individual methods increased the F1 score by 7.7% to 9.4% (p < .001). On this second dataset, however, combining the three methods did not yield an additional performance gain.
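For readers unfamiliar with the metric, the F1 score reported above is the harmonic mean of precision and recall. The sketch below computes it from binary labels, assuming the adjudicated physician judgments serve as the gold standard; the function name and the example labels are illustrative and do not come from the study data.

```python
def precision_recall_f1(gold, predicted):
    """Return (precision, recall, F1) for binary labels (1 = hypoglycemia present)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)          # true positives
    fp = sum(1 for g, p in pairs if not g and p)      # false positives
    fn = sum(1 for g, p in pairs if g and not p)      # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up labels for five notes, purely for illustration.
gold_labels = [1, 1, 0, 0, 1]
model_labels = [1, 0, 1, 0, 1]
precision, recall, f1 = precision_recall_f1(gold_labels, model_labels)
```

Because F1 penalizes both missed events and false alarms, a method that mainly boosts recall can raise F1 only if precision does not fall by a comparable amount.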
The proposed text preprocessing methods improved the performance of hypoglycemia detection from clinical text notes. Stop word filtering achieved the largest performance improvement. ICD narrative enrichment boosted the recall of detection. Combining the three preprocessing methods led to additional performance gains on one of the two datasets.