Unified Medical Language System term occurrences in clinical notes: a large-scale corpus analysis.

Affiliation

Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN 55905, USA.

Publication details

J Am Med Inform Assoc. 2012 Jun;19(e1):e149-56. doi: 10.1136/amiajnl-2011-000744. Epub 2012 Apr 4.

PMID: 22493050
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3392861/
Abstract

OBJECTIVE

To characterise empirical instances of Unified Medical Language System (UMLS) Metathesaurus term strings in a large clinical corpus, and to illustrate what types of term characteristics are generalisable across data sources.

DESIGN

Based on the occurrences of UMLS terms in a 51-million-document corpus of Mayo Clinic clinical notes, this study computes statistics about the terms' string attributes, source terminologies, semantic types and syntactic categories. Term occurrences in the 2010 i2b2/VA text were also mapped. Eight example filters were designed from the Mayo-based statistics and applied to the i2b2/VA data.
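
The Python sketch below illustrates the kind of per-term statistics described above. It tallies word counts, character lengths, unique strings per source terminology, and occurrences per semantic group, and it reports the share of occurrences covered by the most frequent groups.

Several parts of the sketch are hypothetical and do not come from the paper: the TermOccurrence record, the simplification that each occurrence carries one source, and the sample data. The source abbreviations (SNOMEDCT_US, CHV) and the semantic group codes (DISO, CHEM, PROC) follow UMLS naming conventions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TermOccurrence:
    """One matched UMLS term string in a note (hypothetical record layout)."""
    text: str            # matched term string
    source: str          # UMLS source abbreviation, e.g. "SNOMEDCT_US" or "CHV"
    semantic_group: str  # UMLS semantic group code, e.g. "DISO" (Disorders)


def corpus_statistics(occurrences):
    """Tally string attributes, source coverage and semantic-group frequencies."""
    words = Counter(len(o.text.split()) for o in occurrences)
    chars = Counter(len(o.text) for o in occurrences)
    # Coverage of a source = number of unique (case-folded) strings it matched.
    unique_by_source = {}
    for o in occurrences:
        unique_by_source.setdefault(o.source, set()).add(o.text.lower())
    coverage = {src: len(terms) for src, terms in unique_by_source.items()}
    groups = Counter(o.semantic_group for o in occurrences)
    return words, chars, coverage, groups


def top_group_share(groups, k):
    """Fraction of all occurrences accounted for by the k most frequent groups."""
    total = sum(groups.values())
    return sum(n for _, n in groups.most_common(k)) / total


# Made-up sample occurrences, for illustration only.
sample = [
    TermOccurrence("chest pain", "SNOMEDCT_US", "DISO"),
    TermOccurrence("chest pain", "CHV", "DISO"),
    TermOccurrence("aspirin", "CHV", "CHEM"),
    TermOccurrence("appendectomy", "SNOMEDCT_US", "PROC"),
]
words, chars, coverage, groups = corpus_statistics(sample)
print(coverage)                    # {'SNOMEDCT_US': 2, 'CHV': 2}
print(top_group_share(groups, 1))  # 0.5
```

On real data, top_group_share(groups, 7) would correspond to the "seven of 15 semantic groups" figure reported in the results.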

RESULTS

For the corpus analysis, negligible numbers of mapped terms in the Mayo corpus had more than six words or 55 characters. Of the source terminologies in the UMLS, the Consumer Health Vocabulary and the Systematized Nomenclature of Medicine-Clinical Terms (SNOMED-CT) had the best coverage in Mayo clinical notes, at 106426 and 94788 unique terms, respectively. Of the 15 semantic groups in the UMLS, seven accounted for 92.08% of term occurrences in the Mayo data. Syntactically, over 90% of matched terms were in noun phrases. For the cross-institutional analysis, applying five example filters to the i2b2/VA data reduced the actual lexicon to 19.13% of the size of the UMLS, with only a 2% reduction in matched terms.
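
The length result suggests one simple kind of lexicon filter. The sketch below assumes a filter that drops term strings with more than six words or more than 55 characters. It then reports two ratios: the share of the lexicon that is kept and the share of matched occurrences that survive.

The abstract does not spell out the paper's eight example filters, so this filter is an illustrative assumption. The function names, the lexicon and the matches are all made up for illustration.

```python
def passes_length_filter(term, max_words=6, max_chars=55):
    """Hypothetical filter: keep a term only if it is within both length cut-offs."""
    return len(term.split()) <= max_words and len(term) <= max_chars


def apply_filter(lexicon, matches, keep):
    """Apply a filter; return (share of lexicon kept, share of matches kept)."""
    kept = {term for term in lexicon if keep(term)}
    kept_matches = [term for term in matches if term in kept]
    return len(kept) / len(lexicon), len(kept_matches) / len(matches)


# Made-up lexicon: the third string has nine words and is filtered out.
lexicon = {
    "chest pain",
    "diabetes mellitus",
    "structure of left upper quadrant of abdomen and pelvis",
}
matches = ["chest pain", "chest pain", "diabetes mellitus"]
print(apply_filter(lexicon, matches, passes_length_filter))
# (0.6666666666666666, 1.0)
```

The two ratios mirror the trade-off reported for the i2b2/VA data. A useful filter shrinks the lexicon substantially while leaving the share of matched terms nearly unchanged.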

CONCLUSION

The corpus statistics presented here are instructive for building lexicons from the UMLS. Features intrinsic to Metathesaurus terms (well-formedness, length and language) generalise easily across clinical institutions, but term frequencies should be adapted with caution. The semantic groups of mapped terms may differ slightly from institution to institution, but they differ greatly when moving to the biomedical literature domain.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1a94/3392861/2e5652889280/amiajnl-2011-000744fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1a94/3392861/92d5488c46fd/amiajnl-2011-000744fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1a94/3392861/979572d4379f/amiajnl-2011-000744fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1a94/3392861/e9a5c1898137/amiajnl-2011-000744fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1a94/3392861/e381503a95b7/amiajnl-2011-000744fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1a94/3392861/7c03a9447f59/amiajnl-2011-000744fig6.jpg

Similar articles

1
Unified Medical Language System term occurrences in clinical notes: a large-scale corpus analysis.
J Am Med Inform Assoc. 2012 Jun;19(e1):e149-56. doi: 10.1136/amiajnl-2011-000744. Epub 2012 Apr 4.
2
Towards a semantic lexicon for clinical natural language processing.
AMIA Annu Symp Proc. 2012;2012:568-76. Epub 2012 Nov 3.
3
Development and evaluation of RapTAT: a machine learning system for concept mapping of phrases from medical narratives.
J Biomed Inform. 2014 Apr;48:54-65. doi: 10.1016/j.jbi.2013.11.008. Epub 2013 Dec 4.
4
A semantic lexicon for medical language processing.
J Am Med Inform Assoc. 1999 May-Jun;6(3):205-18. doi: 10.1136/jamia.1999.0060205.
5
Extracting semantic lexicons from discharge summaries using machine learning and the C-Value method.
AMIA Annu Symp Proc. 2012;2012:409-16. Epub 2012 Nov 3.
6
A technique for semantic classification of unknown words using UMLS resources.
Proc AMIA Symp. 1999:716-20.
7
A Comprehensive Analysis of Five Million UMLS Metathesaurus Terms Using Eighteen Million MEDLINE Citations.
AMIA Annu Symp Proc. 2010 Nov 13;2010:907-11.
8
MedLexSp - a medical lexicon for Spanish medical natural language processing.
J Biomed Semantics. 2023 Feb 2;14(1):2. doi: 10.1186/s13326-022-00281-5.
9
Assisting the translation of SNOMED CT into French using UMLS and four representative French-language terminologies.
AMIA Annu Symp Proc. 2009 Nov 14;2009:291-5.
10
Assessing the consistency of a biomedical terminology through lexical knowledge.
Int J Med Inform. 2002 Dec 4;67(1-3):85-95. doi: 10.1016/s1386-5056(02)00051-5.

Cited by

1
Named Entity Recognition of Medical Text Based on the Deep Neural Network.
J Healthc Eng. 2022 Mar 7;2022:3990563. doi: 10.1155/2022/3990563. eCollection 2022.
2
Impact of Different Approaches to Preparing Notes for Analysis With Natural Language Processing on the Performance of Prediction Models in Intensive Care.
Crit Care Explor. 2021 Jun 11;3(6):e0450. doi: 10.1097/CCE.0000000000000450. eCollection 2021 Jun.
3
Natural language processing (NLP) tools in extracting biomedical concepts from research articles: a case study on autism spectrum disorder.
BMC Med Inform Decis Mak. 2020 Dec 30;20(Suppl 11):322. doi: 10.1186/s12911-020-01352-2.
4
Prediction of severe chest injury using natural language processing from the electronic health record.
Injury. 2021 Feb;52(2):205-212. doi: 10.1016/j.injury.2020.10.094. Epub 2020 Oct 25.
5
Electronic Medical Record Search Engine (EMERSE): An Information Retrieval Tool for Supporting Cancer Research.
JCO Clin Cancer Inform. 2020 May;4:454-463. doi: 10.1200/CCI.19.00134.
6
Predicting Future Cardiovascular Events in Patients With Peripheral Artery Disease Using Electronic Health Record Data.
Circ Cardiovasc Qual Outcomes. 2019 Mar;12(3):e004741. doi: 10.1161/CIRCOUTCOMES.118.004741.
7
Empirical advances with text mining of electronic health records.
BMC Med Inform Decis Mak. 2017 Aug 22;17(1):127. doi: 10.1186/s12911-017-0519-0.
8
A Clinical Score for Predicting Atrial Fibrillation in Patients with Cryptogenic Stroke or Transient Ischemic Attack.
Cardiology. 2017;138(3):133-140. doi: 10.1159/000476030. Epub 2017 Jun 28.
9
The utility of including pathology reports in improving the computational identification of patients.
J Pathol Inform. 2016 Nov 29;7:46. doi: 10.4103/2153-3539.194838. eCollection 2016.
10
Impact of Predicting Health Care Utilization Via Web Search Behavior: A Data-Driven Analysis.
J Med Internet Res. 2016 Sep 21;18(9):e251. doi: 10.2196/jmir.6240.

References

1
Semantic characteristics of NLP-extracted concepts in clinical notes vs. biomedical literature.
AMIA Annu Symp Proc. 2011;2011:1550-8. Epub 2011 Oct 22.
2
The BioLexicon: a large-scale terminological resource for biomedical text mining.
BMC Bioinformatics. 2011 Oct 12;12:397. doi: 10.1186/1471-2105-12-397.
3
2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text.
J Am Med Inform Assoc. 2011 Sep-Oct;18(5):552-6. doi: 10.1136/amiajnl-2011-000203. Epub 2011 Jun 16.
4
A Comprehensive Analysis of Five Million UMLS Metathesaurus Terms Using Eighteen Million MEDLINE Citations.
AMIA Annu Symp Proc. 2010 Nov 13;2010:907-11.
5
The Lexicon Builder Web service: Building Custom Lexicons from two hundred Biomedical Ontologies.
AMIA Annu Symp Proc. 2010 Nov 13;2010:587-91.
6
Quantitative analysis of culture using millions of digitized books.
Science. 2011 Jan 14;331(6014):176-82. doi: 10.1126/science.1199644. Epub 2010 Dec 16.
7
The structural and content aspects of abstracts versus bodies of full text journal articles are different.
BMC Bioinformatics. 2010 Sep 29;11:492. doi: 10.1186/1471-2105-11-492.
8
Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications.
J Am Med Inform Assoc. 2010 Sep-Oct;17(5):507-13. doi: 10.1136/jamia.2009.001560.
9
Rewriting and suppressing UMLS terms for improved biomedical term identification.
J Biomed Semantics. 2010 Mar 31;1(1):5. doi: 10.1186/2041-1480-1-5.
10
An overview of MetaMap: historical perspective and recent advances.
J Am Med Inform Assoc. 2010 May-Jun;17(3):229-36. doi: 10.1136/jamia.2009.002733.