Building a specialized lexicon for breast cancer clinical trial subject eligibility analysis.

Affiliations

Information Operations and Technology Management, John B. and Lillian E. Neff College of Business and Innovation, The University of Toledo, USA.

Gary W. Rollins College of Business, The University of Tennessee at Chattanooga, USA.

Publication information

Health Informatics J. 2021 Jan-Mar;27(1):1460458221989392. doi: 10.1177/1460458221989392.

DOI: 10.1177/1460458221989392
PMID: 33535885

Abstract

A natural language processing (NLP) application requires sophisticated lexical resources to support its processing goals. Different solutions, such as dictionary lookup and MetaMap, have been proposed in the healthcare informatics literature to identify disease terms with more than one word (multi-gram disease named entities). Although a lot of work has been done in the identification of protein- and gene-named entities in the biomedical field, not much research has been done on the recognition and resolution of terminologies in the clinical trial subject eligibility analysis. In this study, we develop a specialized lexicon for improving NLP and text mining analysis in the breast cancer domain, and evaluate it by comparing it with the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). We use a hybrid methodology, which combines the knowledge of domain experts, terms from multiple online dictionaries, and the mining of text from sample clinical trials. Use of our methodology introduces 4243 unique lexicon items, which increase bigram entity match by 38.6% and trigram entity match by 41%. Our lexicon, which adds a significant number of new terms, is very useful for matching patients to clinical trials automatically based on eligibility matching. Beyond clinical trial matching, the specialized lexicon developed in this study could serve as a foundation for future healthcare text mining applications.
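
The bigram and trigram entity matching mentioned in the abstract can be illustrated with a simple dictionary lookup. The sketch below is not the authors' implementation: the function names, the tokenizer, the two tiny lexicons, and the sample eligibility text are all hypothetical and chosen only to show how adding multi-word terms to a lexicon can raise the number of matched entities.

```python
import re
from typing import List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Return all contiguous n-token phrases, joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def match_entities(text: str, lexicon: Set[str], max_n: int = 3) -> List[str]:
    """Look up every bigram (and longer n-gram up to max_n) in the lexicon.

    Matches are returned grouped by n-gram length: all bigrams first,
    then trigrams, each group in order of appearance.
    """
    tokens = tokenize(text)
    matches = []
    for n in range(2, max_n + 1):
        for gram in ngrams(tokens, n):
            if gram in lexicon:
                matches.append(gram)
    return matches


# Hypothetical lexicons: a small baseline and an extended, specialized one.
# These terms are illustrative and are not taken from the published lexicon.
base_lexicon = {"breast cancer"}
specialized_lexicon = base_lexicon | {
    "hormone receptor",
    "metastatic breast cancer",
    "invasive ductal carcinoma",
}

# Hypothetical eligibility criterion text.
criteria = (
    "Patients with metastatic breast cancer or invasive ductal carcinoma; "
    "hormone receptor positive."
)

base_matches = match_entities(criteria, base_lexicon)
specialized_matches = match_entities(criteria, specialized_lexicon)

print("baseline:", base_matches)
print("specialized:", specialized_matches)
```

Comparing the two match lists for the same text gives a toy version of the kind of match-rate comparison the abstract reports; a real evaluation would run over a corpus of trial criteria and a reference terminology such as SNOMED CT.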
