Similar articles

1
A Part-Of-Speech term weighting scheme for biomedical information retrieval.
J Biomed Inform. 2016 Oct;63:379-389. doi: 10.1016/j.jbi.2016.08.026. Epub 2016 Sep 1.
2
A fine-grained Chinese word segmentation and part-of-speech tagging corpus for clinical text.
BMC Med Inform Decis Mak. 2019 Apr 9;19(Suppl 2):66. doi: 10.1186/s12911-019-0770-7.
3
A comparison of word embeddings for the biomedical natural language processing.
J Biomed Inform. 2018 Nov;87:12-20. doi: 10.1016/j.jbi.2018.09.008. Epub 2018 Sep 12.
4
A deep learning model incorporating part of speech and self-matching attention for named entity recognition of Chinese electronic medical records.
BMC Med Inform Decis Mak. 2019 Apr 9;19(Suppl 2):65. doi: 10.1186/s12911-019-0762-7.
5
A token centric part-of-speech tagger for biomedical text.
Artif Intell Med. 2014 May;61(1):11-20. doi: 10.1016/j.artmed.2014.03.005. Epub 2014 Mar 26.
6
Aligned-Layer Text Search in Clinical Notes.
Stud Health Technol Inform. 2017;245:629-633.
7
Objective and automated protocols for the evaluation of biomedical search engines using No Title Evaluation protocols.
BMC Bioinformatics. 2008 Feb 29;9:132. doi: 10.1186/1471-2105-9-132.
8
Improving performance of natural language processing part-of-speech tagging on clinical narratives through domain adaptation.
J Am Med Inform Assoc. 2013 Sep-Oct;20(5):931-9. doi: 10.1136/amiajnl-2012-001453. Epub 2013 Mar 13.
9
From POS tagging to dependency parsing for biomedical event extraction.
BMC Bioinformatics. 2019 Feb 12;20(1):72. doi: 10.1186/s12859-019-2604-0.
10
Factors affecting the effectiveness of biomedical document indexing and retrieval based on terminologies.
Artif Intell Med. 2013 Feb;57(2):155-67. doi: 10.1016/j.artmed.2012.08.006. Epub 2012 Oct 23.

Cited by

1
Large Language Models in Biomedical and Health Informatics: A Review with Bibliometric Analysis.
J Healthc Inform Res. 2024 Sep 14;8(4):658-711. doi: 10.1007/s41666-024-00171-8. eCollection 2024 Dec.
2
Clinical Information Retrieval: A Literature Review.
J Healthc Inform Res. 2024 Jan 23;8(2):313-352. doi: 10.1007/s41666-024-00159-4. eCollection 2024 Jun.
3
Contextualizing Genes by Using Text-Mined Co-Occurrence Features for Cancer Gene Panel Discovery.
Front Genet. 2021 Oct 25;12:771435. doi: 10.3389/fgene.2021.771435. eCollection 2021.
4
Leveraging word embeddings and medical entity extraction for biomedical dataset retrieval using unstructured texts.
Database (Oxford). 2017 Jan 1;2017. doi: 10.1093/database/bax091.
5
Lexicon-enhanced sentiment analysis framework using rule-based classification scheme.
PLoS One. 2017 Feb 23;12(2):e0171649. doi: 10.1371/journal.pone.0171649. eCollection 2017.

A Part-Of-Speech term weighting scheme for biomedical information retrieval.

Author information

Wang Yanshan, Wu Stephen, Li Dingcheng, Mehrabi Saeed, Liu Hongfang

Affiliations

Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA.

Department of Medical Informatics & Clinical Epidemiology, Oregon Health and Science University, Portland, OR, USA.

Publication information

J Biomed Inform. 2016 Oct;63:379-389. doi: 10.1016/j.jbi.2016.08.026. Epub 2016 Sep 1.

DOI:10.1016/j.jbi.2016.08.026
PMID:27593166
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5493484/
Abstract

In the era of digitalization, information retrieval (IR), which retrieves and ranks documents from large collections according to users' search queries, has been widely applied in the biomedical domain. Building patient cohorts from electronic health records (EHRs) and searching the literature for topics of interest are two examples of IR use cases. Meanwhile, natural language processing (NLP) techniques, such as tokenization and Part-Of-Speech (POS) tagging, have been developed for processing clinical documents and biomedical literature. We hypothesize that NLP can be incorporated into IR to strengthen conventional IR models. In this study, we propose two NLP-empowered IR models, POS-BoW and POS-MRF, which incorporate automatic POS-based term weighting schemes into the bag-of-words (BoW) and Markov Random Field (MRF) IR models, respectively. In the proposed models, the POS-based term weights are calculated iteratively with a cyclic coordinate method, in which a golden-section line search is applied along each coordinate to optimize an objective function defined by mean average precision (MAP). In the empirical experiments, we used the data sets from the Medical Records track of the Text REtrieval Conference (TREC) 2011 and 2012 and the Genomics track of TREC 2004. The evaluation on the TREC 2011 and 2012 Medical Records tracks shows that, for the POS-BoW models, the mean improvement rates for the IR evaluation metrics MAP, bpref, and P@10 are 10.88%, 4.54%, and 3.82%, respectively, compared with the BoW models; for the POS-MRF models, these rates are 13.59%, 8.20%, and 8.78%, compared with the MRF models. Additionally, we experimentally verify that the proposed weighting approach is superior to simple heuristic and frequency-based weighting approaches, and we validate our POS category selection. Using the optimal weights calculated in this experiment, we tested the proposed models on the TREC 2004 Genomics track and obtained average improvement rates of 8.63% and 10.04% for POS-BoW and POS-MRF, respectively. These significant improvements verify the effectiveness of leveraging POS tagging for biomedical IR tasks.
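The following Python sketch illustrates the two ideas in the abstract: weighting each query term by its POS category in a bag-of-words scorer, and learning those weights with a cyclic coordinate method that runs a golden-section line search along one coordinate at a time. It is not the authors' implementation. It assumes a Dirichlet-smoothed query-likelihood scorer (a common bag-of-words choice, not necessarily the paper's configuration), Penn Treebank-style tags such as `NN` and `JJ`, and weights bounded in [0, 1]. The names `pos_weighted_score`, `golden_section_max`, `cyclic_coordinate_ascent`, and the parameter `mu` are hypothetical, and the MRF variant is not covered.

```python
import math
from collections import Counter
from typing import Callable, Dict, Sequence, Tuple

INV_PHI = (math.sqrt(5) - 1) / 2  # golden-ratio conjugate, about 0.618


def pos_weighted_score(query: Sequence[Tuple[str, str]], doc: Sequence[str],
                       coll_tf: Counter, coll_len: int,
                       pos_weights: Dict[str, float], mu: float = 2500.0) -> float:
    """Weighted query likelihood: sum of w[POS(t)] * log P(t | doc) over query terms."""
    tf = Counter(doc)
    score = 0.0
    for term, pos in query:
        # Collection model with a small pseudo-count (an assumption) so unseen terms avoid log(0).
        p_coll = (coll_tf[term] + 0.5) / (coll_len + 1.0)
        # Dirichlet-smoothed document model.
        p_doc = (tf[term] + mu * p_coll) / (len(doc) + mu)
        # Tags without a learned weight contribute nothing.
        score += pos_weights.get(pos, 0.0) * math.log(p_doc)
    return score


def golden_section_max(f: Callable[[float], float], lo: float, hi: float,
                       tol: float = 1e-4) -> float:
    """Approximate maximizer of f on [lo, hi], assuming f is unimodal there."""
    a, b = lo, hi
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc >= fd:
            # Maximum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # Maximum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2


def cyclic_coordinate_ascent(objective: Callable[[Dict[str, float]], float],
                             init: Dict[str, float], lo: float = 0.0, hi: float = 1.0,
                             max_rounds: int = 20, eps: float = 1e-6) -> Dict[str, float]:
    """Cycle through the weights, line-searching one coordinate while the others stay fixed."""
    w = dict(init)
    best = objective(w)
    for _ in range(max_rounds):
        prev = best
        for key in list(w):
            def along(x: float, key: str = key) -> float:
                trial = dict(w)
                trial[key] = x
                return objective(trial)

            x = golden_section_max(along, lo, hi)
            val = along(x)
            if val > best:  # keep a move only if it improves the objective
                w[key], best = x, val
        if best - prev < eps:  # a full pass brought no meaningful gain
            break
    return w
```

In the setting the abstract describes, `objective` would rank each training query's documents with a scorer such as `pos_weighted_score` and return MAP. Because MAP is piecewise constant in the weights, the unimodality that golden-section search relies on is not guaranteed, which is why this sketch accepts a coordinate move only when it improves the objective.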

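The abstract reports MAP, bpref, and P@10 together with relative improvement rates over the baselines. The Python sketch below shows how these measures are commonly computed from a ranked list and relevance judgments. The bpref variant follows the trec_eval convention as we understand it (each relevant document is penalized by the judged nonrelevant documents ranked above it, normalized by min(R, N)), and `improvement_rate` assumes a plain relative change, since the abstract does not state how the per-run rates were averaged. All function names are hypothetical.

```python
from typing import Dict, Sequence, Set


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Sum of precision@k at each rank k holding a relevant doc, divided by |relevant|."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)


def mean_average_precision(runs: Dict[str, Sequence[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    """MAP: average precision averaged over all judged queries (missing runs score 0)."""
    if not qrels:
        return 0.0
    return sum(average_precision(runs.get(q, []), rel) for q, rel in qrels.items()) / len(qrels)


def precision_at_k(ranking: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """P@k: fraction of the top k positions holding a relevant document."""
    return sum(1 for doc in list(ranking)[:k] if doc in relevant) / k


def bpref(ranking: Sequence[str], relevant: Set[str], nonrelevant: Set[str]) -> float:
    """bpref over judged documents only; unjudged documents in the ranking are skipped."""
    R, N = len(relevant), len(nonrelevant)
    if R == 0:
        return 0.0
    nonrel_above, total = 0, 0.0
    for doc in ranking:
        if doc in relevant:
            if nonrel_above == 0:
                total += 1.0
            else:
                total += 1.0 - min(nonrel_above, R) / min(N, R)
        elif doc in nonrelevant:
            nonrel_above += 1
    return total / R


def improvement_rate(new: float, baseline: float) -> float:
    """Relative improvement in percent: (new - baseline) / baseline * 100."""
    return (new - baseline) / baseline * 100.0
```

For the P@10 figures in the abstract, `k=10`. As a purely illustrative example, a baseline MAP of 0.20 rising to 0.25 would count as a 25% improvement rate under this definition.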