Bridging the gap: Incorporating a semantic similarity measure for effectively mapping PubMed queries to documents.

Affiliation

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

Publication information

J Biomed Inform. 2017 Nov;75:122-127. doi: 10.1016/j.jbi.2017.09.014. Epub 2017 Oct 3.

DOI:10.1016/j.jbi.2017.09.014
PMID:28986328
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5687891/
Abstract

The main approach of traditional information retrieval (IR) is to examine how many words from a query appear in a document. A drawback of this approach, however, is that it may fail to detect relevant documents where no or only few words from a query are found. The semantic analysis methods such as LSA (latent semantic analysis) and LDA (latent Dirichlet allocation) have been proposed to address the issue, but their performance is not superior compared to common IR approaches. Here we present a query-document similarity measure motivated by the Word Mover's Distance. Unlike other similarity measures, the proposed method relies on neural word embeddings to compute the distance between words. This process helps identify related words when no direct matches are found between a query and a document. Our method is efficient and straightforward to implement. The experimental results on TREC Genomics data show that our approach outperforms the BM25 ranking function by an average of 12% in mean average precision. Furthermore, for a real-world dataset collected from the PubMed search logs, we combine the semantic measure with BM25 using a learning to rank method, which leads to improved ranking scores by up to 25%. This experiment demonstrates that the proposed approach and BM25 nicely complement each other and together produce superior performance.
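The idea of scoring a query against a document through word embeddings, rather than exact word overlap, can be illustrated with a minimal sketch. This is not the authors' formula: it is a simplified, relaxed variant in the spirit of the Word Mover's Distance, in which each query word is matched to its closest document word by cosine similarity and the matches are averaged. The embedding table, the function names `cosine` and `semantic_similarity`, and the example words are all assumptions made for illustration.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
# A real system would load vectors trained on a large corpus.
EMBEDDINGS = {
    "cancer": [0.9, 0.1, 0.0],
    "tumor": [0.85, 0.2, 0.05],
    "therapy": [0.1, 0.9, 0.1],
    "treatment": [0.15, 0.85, 0.2],
    "protein": [0.0, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_similarity(query, document, embeddings=EMBEDDINGS):
    """Average, over query words, of the best cosine match in the document.

    Words without an embedding are ignored. Returns 0.0 when either side
    has no known words. This is a relaxed, one-directional matching and
    only an approximation of transport-based distances such as WMD.
    """
    q_words = [w for w in query if w in embeddings]
    d_words = [w for w in document if w in embeddings]
    if not q_words or not d_words:
        return 0.0
    total = 0.0
    for q in q_words:
        total += max(cosine(embeddings[q], embeddings[d]) for d in d_words)
    return total / len(q_words)


query = ["cancer", "therapy"]
doc_related = ["tumor", "treatment"]   # shares no words with the query
doc_unrelated = ["protein"]

score_related = semantic_similarity(query, doc_related)
score_unrelated = semantic_similarity(query, doc_unrelated)
```

In this toy setting the related document shares no words with the query, so a pure word-overlap count would be zero, yet it scores higher than the unrelated document because its words lie close to the query words in the embedding space. A score of this kind could serve as one feature alongside BM25 in a learning-to-rank model, which is the combination the abstract describes.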

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/df9c/5687891/97817e1c7db0/nihms917649f1.jpg
