Expanding a radiology lexicon using contextual patterns in radiology reports.

Affiliations

Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.

Biomedical Informatics Training Program, Stanford University, Stanford, CA, USA.

Publication information

J Am Med Inform Assoc. 2018 Jun 1;25(6):679-685. doi: 10.1093/jamia/ocx152.

DOI: 10.1093/jamia/ocx152
PMID: 29329435
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5978019/

Abstract

OBJECTIVE

Distributional semantics algorithms, which learn vector space representations of words and phrases from large corpora, identify related terms based on contextual usage patterns. We hypothesize that distributional semantics can speed up lexicon expansion in a clinical domain, radiology, by unearthing synonyms from the corpus.

MATERIALS AND METHODS

We apply word2vec, a distributional semantics software package, to the text of radiology notes to identify synonyms for RadLex, a structured lexicon of radiology terms. We stratify performance by term category, term frequency, number of tokens in the term, vector magnitude, and the context window used in vector building.
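
The ranking step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes term vectors have already been learned (for example with word2vec) and that candidates are ordered by cosine similarity to the target term, a common choice that the abstract does not confirm. The function names and the toy three-dimensional vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_candidates(target, vectors):
    """Return (term, similarity) pairs for every term except the target,
    sorted from most to least similar to the target term."""
    target_vec = vectors[target]
    scored = [
        (term, cosine(target_vec, vec))
        for term, vec in vectors.items()
        if term != target
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy vectors invented for illustration only; real vectors would come
# from a model trained on the radiology report corpus.
toy_vectors = {
    "pleural_effusion": [0.9, 0.1, 0.2],
    "pleural_fluid": [0.8, 0.2, 0.1],
    "fracture": [0.1, 0.9, 0.3],
    "normal": [0.2, 0.3, 0.9],
}

ranking = rank_candidates("pleural_effusion", toy_vectors)
for term, score in ranking:
    print(f"{term}\t{score:.3f}")
```

A curator would then review the ranked list from the top, so the closer true synonyms sit to the head of the list, the fewer candidates need to be inspected.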

RESULTS

Ranking candidates based on distributional similarity to a target term results in high curation efficiency: on a ranked list of 775 249 terms, >50% of synonyms occurred within the first 25 terms. Synonyms are easier to find if the target term is a phrase rather than a single word, if it occurs at least 100× in the corpus, and if its vector magnitude is between 4 and 5. Some RadLex categories, such as anatomical substances, are easier to identify synonyms for than others.
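
One plausible way to compute the two quantities mentioned above is sketched below: the share of known synonyms that appear within the first k ranked candidates, and the magnitude (Euclidean norm) of a term vector. Both definitions are assumptions made for illustration; the abstract does not spell out the exact formulas, and the example terms are invented.

```python
import math

def fraction_found_within(ranked_terms, known_synonyms, k):
    """Share of the known synonyms that appear in the first k ranked terms."""
    known = set(known_synonyms)
    if not known:
        return 0.0
    top_k = set(ranked_terms[:k])
    return len(top_k & known) / len(known)

def vector_magnitude(vec):
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(x * x for x in vec))

# Hypothetical ranked candidate list and gold-standard synonyms.
ranked = ["pleural_fluid", "effusion", "normal", "fluid_collection", "fracture"]
synonyms = {"pleural_fluid", "fluid_collection"}

print(fraction_found_within(ranked, synonyms, 2))  # one of two synonyms in the top 2
print(fraction_found_within(ranked, synonyms, 4))  # both synonyms in the top 4
print(vector_magnitude([3.0, 4.0]))
```

Under this reading, the reported result corresponds to the first function returning more than 0.5 at k = 25 on a list of 775 249 candidates.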

DISCUSSION

The unstructured text of clinical notes contains a wealth of information about human diseases and treatment patterns. However, searching and retrieving information from clinical notes often suffer due to variations in how similar concepts are described in the text. Biomedical lexicons address this challenge, but are expensive to produce and maintain. Distributional semantics algorithms can assist lexicon curation, saving researchers time and money.
