Complexities, variations, and errors of numbering within clinical notes: the potential impact on information extraction and cohort-identification.

Affiliations

Department of Pediatrics, University of Michigan, Ann Arbor, MI, 48109, USA.

School of Information, University of Michigan, Ann Arbor, MI, 48109, USA.

Publication information

BMC Med Inform Decis Mak. 2019 Apr 4;19(Suppl 3):75. doi: 10.1186/s12911-019-0784-1.

DOI: 10.1186/s12911-019-0784-1
PMID: 30944012
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6448181/
Abstract

BACKGROUND

Numbers and numerical concepts appear frequently in free text clinical notes from electronic health records. Knowledge of the frequent lexical variations of these numerical concepts, and their accurate identification, is important for many information extraction tasks. This paper describes an analysis of the variation in how numbers and numerical concepts are represented in clinical notes.

METHODS

We used an inverted index of approximately 100 million notes to obtain the frequency of various permutations of numbers and numerical concepts, including the use of Roman numerals, numbers spelled as English words, and invalid dates, among others. Overall, twelve types of lexical variants were analyzed.
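The sketch below illustrates the counting step described above. It builds a tiny inverted index that maps each token to the set of note IDs containing it, then counts how many notes contain each lexical variant of the number 2. This is not the authors' implementation. The notes, the tokenizer, and the helper names (`tokenize`, `build_inverted_index`, `variant_frequencies`) are hypothetical. The sketch also assumes that "frequency" means the number of notes containing a variant (document frequency) rather than the total number of occurrences.

```python
import re
from collections import defaultdict

# Hypothetical toy notes; the paper used an index of ~100 million real notes.
NOTES = {
    1: "Patient with type 2 diabetes, follow up in 2 weeks.",
    2: "History of type II diabetes mellitus.",
    3: "Type two diabetes, diet controlled.",
    4: "Stage III breast cancer, type 2 DM.",
}

def tokenize(text):
    """Lowercase and split on any run of non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(notes):
    """Map each token to the set of note IDs that contain it."""
    index = defaultdict(set)
    for note_id, text in notes.items():
        for token in tokenize(text):
            index[token].add(note_id)
    return index

# Lexical variants of the number 2: Arabic numeral, Roman numeral, English word.
VARIANTS_OF_TWO = ["2", "ii", "two"]

def variant_frequencies(index, variants):
    """Number of notes containing each variant (document frequency)."""
    return {v: len(index.get(v, set())) for v in variants}

index = build_inverted_index(NOTES)
print(variant_frequencies(index, VARIANTS_OF_TWO))  # → {'2': 2, 'ii': 1, 'two': 1}
```

Because the index is keyed on whole tokens, `iii` (note 4) is counted separately and is never mistaken for `ii`.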

RESULTS

We found substantial variation in how these concepts were represented in the notes, including multiple data quality issues. We also demonstrate that not considering these variations could have substantial real-world implications for cohort identification tasks, with one case missing > 80% of potential patients.
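Invalid dates are one of the variant types the authors analyzed. The sketch below shows one way to flag them. It assumes dates are written in US month/day/year order, which the abstract does not specify. Each date-like string is parsed, and the function keeps any string that Python's `datetime.date` rejects. The pattern and the function name `find_invalid_dates` are hypothetical.

```python
import re
from datetime import date

# Assumption: dates are written as month/day/year (e.g. 02/28/2015).
DATE_PATTERN = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")

def find_invalid_dates(text):
    """Return date-like strings that do not correspond to a real calendar date."""
    invalid = []
    for match in DATE_PATTERN.finditer(text):
        month, day, year = (int(g) for g in match.groups())
        try:
            date(year, month, day)  # raises ValueError for e.g. Feb 30 or month 13
        except ValueError:
            invalid.append(match.group(0))
    return invalid

note = "Seen 02/28/2015 and 02/30/2015; surgery planned 13/01/2016, MRI on 2/29/2016."
print(find_invalid_dates(note))  # → ['02/30/2015', '13/01/2016']
```

A string such as `13/01/2016` is invalid under the month/day/year assumption but valid as day/month/year. A real pipeline would therefore need a policy for ambiguous formats rather than simply discarding them.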

CONCLUSIONS

Numbering within clinical notes can be variable, and not taking these variations into account could result in missing or inaccurate information for natural language processing and information retrieval tasks.
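The toy example below makes the cohort-identification risk concrete. It compares a naive query that matches only `type 2` with an expanded query that also accepts Roman-numeral, spelled-out, and hyphenated forms. The patients, notes, regular expressions, and the helper name `cohort` are all invented for illustration. The resulting numbers say nothing about the paper's actual results.

```python
import re

# Hypothetical toy data: patient ID -> list of note texts (not from the paper).
PATIENT_NOTES = {
    "A": ["Type 2 diabetes, on metformin."],
    "B": ["Long-standing type II diabetes."],
    "C": ["Type two DM, well controlled."],
    "D": ["Type-2 diabetic, A1c 7.1."],
    "E": ["Type 1 diabetes since childhood."],
}

# Naive query: Arabic numeral separated by a single space only.
NAIVE = re.compile(r"\btype 2\b", re.IGNORECASE)

# Expanded query: Arabic, Roman, or spelled-out form, joined by spaces or a hyphen.
EXPANDED = re.compile(r"\btype[\s-]*(?:2|ii|two)\b", re.IGNORECASE)

def cohort(pattern, patient_notes):
    """Return sorted IDs of patients with at least one note matching the pattern."""
    return sorted(
        pid for pid, notes in patient_notes.items()
        if any(pattern.search(n) for n in notes)
    )

naive = cohort(NAIVE, PATIENT_NOTES)
expanded = cohort(EXPANDED, PATIENT_NOTES)
missed = sorted(set(expanded) - set(naive))
print(naive, expanded, missed)  # → ['A'] ['A', 'B', 'C', 'D'] ['B', 'C', 'D']
```

The trailing word boundary `\b` prevents `type III` from matching the `ii` alternative. Expanding to Roman numerals still needs care, because `I` is also an English pronoun, so a real system would likely need additional context rules.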
