Complexities, variations, and errors of numbering within clinical notes: the potential impact on information extraction and cohort-identification.

Author information

Department of Pediatrics, University of Michigan, Ann Arbor, MI, 48109, USA.

School of Information, University of Michigan, Ann Arbor, MI, 48109, USA.

Publication information

BMC Med Inform Decis Mak. 2019 Apr 4;19(Suppl 3):75. doi: 10.1186/s12911-019-0784-1.

Abstract

BACKGROUND

Numbers and numerical concepts appear frequently in free-text clinical notes from electronic health records. Knowing the common lexical variations of these numerical concepts, and identifying them accurately, is important for many information extraction tasks. This paper describes an analysis of the variation in how numbers and numerical concepts are represented in clinical notes.

METHODS

We used an inverted index of approximately 100 million notes to obtain the frequency of various permutations of numbers and numerical concepts, including the use of Roman numerals, numbers spelled as English words, and invalid dates, among others. Overall, twelve types of lexical variants were analyzed.
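The idea of looking up lexical variants of a number in an inverted index can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names (`to_roman`, `variants`, `build_index`), the tokenizer, and the four toy notes are all assumptions made up for this example, and only three of the twelve variant types (digits, English words, Roman numerals) are covered.

```python
import re
from collections import defaultdict

# Hypothetical lookup table for spelled-out numbers (illustration only).
WORDS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five"}

def to_roman(n: int) -> str:
    """Convert a positive integer to an upper-case Roman numeral."""
    pairs = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
             (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
             (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in pairs:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)

def variants(n: int) -> set:
    """Lower-cased surface forms of n: digits, English word, Roman numeral."""
    return {str(n), WORDS[n], to_roman(n).lower()}

def build_index(notes: dict) -> dict:
    """Map each lower-cased token to the set of note IDs containing it."""
    index = defaultdict(set)
    for note_id, text in notes.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(note_id)
    return index

# Toy notes invented for this sketch; they are not data from the study.
NOTES = {
    1: "Diabetes mellitus type 2, well controlled.",
    2: "History of type II diabetes.",
    3: "Type two diabetes diagnosed in 2010.",
    4: "Type 1 diabetes since childhood.",
}

index = build_index(NOTES)
# Number of notes containing each surface form of the number 2.
counts = {form: len(index.get(form, set())) for form in variants(2)}
print(counts)
```

On the toy notes, each of the three forms of "2" occurs in exactly one note, so a frequency count over a single form would see only a third of the mentions. A token-level index also cannot tell the Roman numeral "ii" apart from other uses of the same string, which is one reason a real analysis would need more context than this sketch uses.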

RESULTS

We found substantial variation in how these concepts were represented in the notes, including multiple data quality issues. We also demonstrate that ignoring these variations could have substantial real-world implications for cohort identification tasks: in one case, more than 80% of potential patients would have been missed.
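The effect on cohort identification can be illustrated with a small sketch that compares a query using a single number form against one that also accepts variants. Everything here is an assumption for illustration: the `find_cohort` helper, the regular expression, and the six toy notes are invented, and the resulting fraction is a property of the toy data, not a reproduction of the study's figures.

```python
import re

# Toy patient notes invented for this sketch; not data from the study.
NOTES = {
    "p1": "Type 2 diabetes",
    "p2": "type II diabetes mellitus",
    "p3": "Type two diabetes, on metformin",
    "p4": "DM type ii",
    "p5": "Diabetes type 2",
    "p6": "Type 1 diabetes",
}

def find_cohort(notes: dict, number_forms: list) -> set:
    """Return IDs of notes that mention 'type <form>' for any given form."""
    alternation = "|".join(re.escape(form) for form in number_forms)
    pattern = re.compile(rf"\btype\s+({alternation})\b", re.IGNORECASE)
    return {pid for pid, text in notes.items() if pattern.search(text)}

# Query with the Arabic numeral only, then with variants included.
narrow = find_cohort(NOTES, ["2"])
broad = find_cohort(NOTES, ["2", "II", "two"])

# Fraction of the variant-aware cohort that the narrow query misses.
missed_fraction = 1 - len(narrow) / len(broad)
print(sorted(narrow), sorted(broad), round(missed_fraction, 2))
```

In this toy set the numeral-only query finds two of the five matching patients, so it misses 60% of the cohort that the variant-aware query retrieves.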

CONCLUSIONS

Numbering within clinical notes can be variable, and not taking these variations into account could result in missing or inaccurate information for natural language processing and information retrieval tasks.
