An efficient prototype method to identify and correct misspellings in clinical text.

Authors

Workman T Elizabeth, Shao Yijun, Divita Guy, Zeng-Treitler Qing

Affiliations

The George Washington University, Biomedical Informatics Center, 2600 Virginia Ave, Suite 506, Washington, DC, 20037, USA.

Division of Epidemiology, University of Utah School of Medicine, 295 Chipeta Way, Salt Lake City, UT, 84132, USA.

Publication information

BMC Res Notes. 2019 Jan 18;12(1):42. doi: 10.1186/s13104-019-4073-y.

DOI:10.1186/s13104-019-4073-y
PMID:30658682
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6339425/
Abstract

OBJECTIVE

Misspellings in clinical free text present challenges to natural language processing. To identify misspellings and their corrections, we developed a prototype spelling analysis method that combines Word2Vec, Levenshtein edit distance constraints, a lexical resource, and corpus term frequencies. We applied the prototype to two corpora extracted from Veterans Health Administration resources: surgical pathology reports, and emergency department progress and visit notes. We evaluated performance by measuring positive predictive value and by performing an error analysis of false positive output, using four classifications. We also analyzed the spelling errors in each corpus, using common error classifications.
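As a rough illustration of how the four signals named above might be combined, the following Python sketch filters candidate pairs by lexicon membership, edit distance, and relative corpus frequency. It is not the authors' implementation: the function names, the thresholds (`max_dist`, `min_ratio`), and the toy data are all assumptions, and the `neighbors` dictionary is a hand-written stand-in for the nearest-neighbor output of a trained Word2Vec model.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def find_misspellings(neighbors, lexicon, freq, max_dist=2, min_ratio=10):
    """Return {misspelling: correction} pairs.

    neighbors: term -> list of embedding-space neighbors (hypothetical
               stand-in for Word2Vec most-similar results).
    lexicon:   set of known correct terms (the lexical resource).
    freq:      Counter of corpus term frequencies.
    max_dist, min_ratio: illustrative thresholds, not values from the paper.
    """
    pairs = {}
    for word, candidates in neighbors.items():
        if word not in lexicon:
            continue  # only lexicon terms are accepted as corrections
        for cand in candidates:
            if cand in lexicon:
                continue  # a known word is not treated as a misspelling
            if levenshtein(word, cand) > max_dist:
                continue  # too many edits to be a plausible typo
            if freq[word] < min_ratio * freq[cand]:
                continue  # the correction should be much more frequent
            pairs[cand] = word
    return pairs


# Toy data, invented for illustration only.
lexicon = {"carcinoma", "biopsy", "sarcoma"}
freq = Counter({"carcinoma": 500, "carcinma": 3, "biopsy": 800,
                "biospy": 5, "sarcoma": 120})
neighbors = {
    "carcinoma": ["carcinma", "sarcoma", "adenocarcinoma"],
    "biopsy": ["biospy", "resection"],
}

print(find_misspellings(neighbors, lexicon, freq))
# {'carcinma': 'carcinoma', 'biospy': 'biopsy'}
```

In this sketch the embedding neighbors supply context-similar candidates, and the other three signals prune them: "sarcoma" is rejected because it is itself in the lexicon, while "adenocarcinoma" and "resection" are rejected by the edit-distance limit.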

RESULTS

In this small-scale study using a total of 76,786 clinical notes, the prototype method achieved positive predictive values of 0.9057 for the surgical pathology reports and 0.8979 for the emergency department progress and visit notes in identifying and correcting misspelled words. False positives varied by corpus. Spelling error types were similar in the two corpora; however, the authors of emergency department progress and visit notes made over four times as many errors. Overall, the results of this study suggest that this method could also perform sufficiently in identifying misspellings in other clinical document types.
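For readers unfamiliar with the metric, positive predictive value is the fraction of flagged items that are correct: PPV = TP / (TP + FP). The short Python sketch below shows the arithmetic. The function name and the counts are hypothetical and chosen only for illustration; the abstract does not report the underlying counts.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """PPV = TP / (TP + FP): the share of flagged pairs that are correct."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("PPV is undefined when nothing was flagged")
    return true_positives / flagged


# Hypothetical counts: 90 correct misspelling/correction pairs, 10 false positives.
print(positive_predictive_value(90, 10))  # 0.9
```

PPV measures only the precision of what the method outputs; it says nothing about misspellings the method failed to flag.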

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b5d6/6339425/423df905757a/13104_2019_4073_Fig1_HTML.jpg
