Learning to rank query expansion terms for COVID-19 scholarly search.

Affiliation

Department of Electrical, Computer, and Biomedical Engineering, Toronto Metropolitan University, Toronto, Canada.

Publication information

J Biomed Inform. 2023 Jun;142:104386. doi: 10.1016/j.jbi.2023.104386. Epub 2023 May 12.

DOI:10.1016/j.jbi.2023.104386
PMID:37178780
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10174726/
Abstract

OBJECTIVE

With the onset of the Coronavirus Disease 2019 (COVID-19) pandemic, the number of publicly available biomedical information sources has surged, making it increasingly challenging to retrieve text relevant to a topic of interest. In this paper, we propose a Contextual Query Expansion framework based on clinical Domain knowledge (CQED) that formulates an effective search over PubMed to retrieve COVID-19 scholarly articles relevant to a given information need.

MATERIALS AND METHODS

For training and evaluation, we use the widely adopted TREC-COVID benchmark. Given a query, the proposed framework utilizes a contextual and a domain-specific neural language model to generate a set of candidate query expansion terms that enrich the original query. Moreover, the framework includes a multi-head attention mechanism that is trained alongside a learning-to-rank model to re-rank the list of generated candidate expansion terms. The original query and the top-ranked expansion terms are submitted to the PubMed search engine to retrieve scholarly articles relevant to an information need. CQED has four variations, depending on the learning path adopted for training and re-ranking the candidate expansion terms.
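The final step described above (keep the top-ranked expansion terms and combine them with the original query) can be illustrated with a minimal Python sketch. The scores, the function names, and the choice to join terms with `OR` are assumptions made for illustration only; the abstract does not specify how CQED scores terms or how it composes the PubMed query.

```python
from typing import Dict, List


def top_expansion_terms(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k candidate terms with the highest scores.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def build_expanded_query(original: str, terms: List[str]) -> str:
    """Join the original query and the expansion terms with OR.

    Multi-word terms are quoted so they are treated as phrases.
    The OR composition is an assumption, not the paper's stated method.
    """
    def quote(term: str) -> str:
        return f'"{term}"' if " " in term else term

    parts = [f"({original})"] + [quote(t) for t in terms]
    return " OR ".join(parts)


# Hypothetical re-ranker scores for four candidate expansion terms.
candidate_scores = {
    "SARS-CoV-2": 0.91,
    "coronavirus": 0.84,
    "influenza": 0.12,
    "viral pneumonia": 0.47,
}

selected = top_expansion_terms(candidate_scores, k=3)
expanded = build_expanded_query("covid-19 transmission", selected)
print(expanded)
# (covid-19 transmission) OR SARS-CoV-2 OR coronavirus OR "viral pneumonia"
```

In a real pipeline the scores would come from the trained re-ranking model, and the resulting string would be sent to the PubMed search engine.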

RESULTS

The model drastically improves search performance compared to the original query: RECALL@1000 improves by 190.85% and NDCG@1000 by 343.55%. Additionally, the model outperforms all existing state-of-the-art baselines. In terms of P@10, the variant optimized for Precision outperforms all baselines (0.7987). In terms of NDCG@10 (0.7986), MAP (0.3450) and bpref (0.4900), the CQED variant optimized for the average of all retrieval measures outperforms all baselines.
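For readers unfamiliar with the measures reported above, the following Python sketch shows one common formulation of P@k, NDCG@k, and relative improvement. It is a simplified illustration under stated assumptions, not the evaluation code used in the paper: the relevance grades are invented, and the ideal ranking for NDCG is computed from the retrieved list only rather than from all judged documents.

```python
import math
from typing import List


def precision_at_k(relevances: List[int], k: int) -> float:
    """Fraction of the top-k results with a relevance grade above zero."""
    return sum(1 for r in relevances[:k] if r > 0) / k


def dcg_at_k(relevances: List[int], k: int) -> float:
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevances[:k]))


def ndcg_at_k(relevances: List[int], k: int) -> float:
    """DCG normalized by the DCG of the best ordering of the same list.

    Simplification: a full evaluation would build the ideal ranking
    from every judged document, not only the retrieved ones.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def relative_improvement(baseline: float, improved: float) -> float:
    """Percentage change of a metric relative to its baseline value."""
    return (improved - baseline) / baseline * 100.0


# Hypothetical relevance grades (0 = not relevant) for a ranked result list.
ranked_grades = [1, 0, 2, 0, 1]

print(precision_at_k(ranked_grades, 5))          # 0.6
print(round(ndcg_at_k(ranked_grades, 5), 4))
print(round(relative_improvement(0.2, 0.5), 2))  # 150.0
```

Under this definition, a reported improvement of 190.85% means the expanded query's score is roughly 2.9 times the original query's score on that measure.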

CONCLUSION

The proposed model successfully expands queries posed to PubMed and improves search performance compared to all existing baselines. A success/failure analysis shows that the model improved the search performance of each of the evaluated queries. Moreover, an ablation study showed that overall performance decreases if the generated candidate terms are not ranked. For future work, we would like to explore applying the presented query expansion framework to technology-assisted Systematic Literature Reviews (SLR).

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/caed/10174726/d816480cde49/gr7_lrg.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/caed/10174726/cc93c1141b7a/ga1_lrg.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/caed/10174726/7f0f78b2f7f1/gr1_lrg.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/caed/10174726/7f3c46853ee1/gr2_lrg.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/caed/10174726/98e8b42de13a/gr3_lrg.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/caed/10174726/4dd1d786d77f/gr4_lrg.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/caed/10174726/47a7dae2092b/gr5_lrg.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/caed/10174726/9e7d8a4c734a/gr6_lrg.jpg

Similar articles

1
Learning to rank query expansion terms for COVID-19 scholarly search.
J Biomed Inform. 2023 Jun;142:104386. doi: 10.1016/j.jbi.2023.104386. Epub 2023 May 12.
2
Towards semantic-driven boolean query formalization for biomedical systematic literature reviews.
Int J Med Inform. 2023 Feb;170:104928. doi: 10.1016/j.ijmedinf.2022.104928. Epub 2022 Nov 24.
3
G-Bean: an ontology-graph based web tool for biomedical literature retrieval.
BMC Bioinformatics. 2014;15 Suppl 12(Suppl 12):S1. doi: 10.1186/1471-2105-15-S12-S1. Epub 2014 Nov 6.
4
Discovering biomedical semantic relations in PubMed queries for information retrieval and database curation.
Database (Oxford). 2016 Mar 25;2016. doi: 10.1093/database/baw025. Print 2016.
5
Multi-field query expansion is effective for biomedical dataset retrieval.
Database (Oxford). 2017 Jan 1;2017. doi: 10.1093/database/bax062.
6
Bayesian approach to incorporating different types of biomedical knowledge bases into information retrieval systems for clinical decision support in precision medicine.
J Biomed Inform. 2019 Oct;98:103238. doi: 10.1016/j.jbi.2019.103238. Epub 2019 Jul 10.
7
Development and empirical user-centered evaluation of semantically-based query recommendation for an electronic health record search engine.
J Biomed Inform. 2017 Mar;67:1-10. doi: 10.1016/j.jbi.2017.01.013. Epub 2017 Jan 25.
8
Learning to Refine Expansion Terms for Biomedical Information Retrieval using Semantic Resources.
IEEE/ACM Trans Comput Biol Bioinform. 2018 Feb 2. doi: 10.1109/TCBB.2018.2801303.
9
Improving biomedical information retrieval by linear combinations of different query expansion techniques.
BMC Bioinformatics. 2016 Jul 25;17 Suppl 7(Suppl 7):238. doi: 10.1186/s12859-016-1092-8.
10
Automatically finding relevant citations for clinical guideline development.
J Biomed Inform. 2015 Oct;57:436-45. doi: 10.1016/j.jbi.2015.09.003. Epub 2015 Sep 10.

Cited by

1
Semantic approaches for query expansion: taxonomy, challenges, and future research directions.
PeerJ Comput Sci. 2025 Mar 5;11:e2664. doi: 10.7717/peerj-cs.2664. eCollection 2025.
2
Semantics-enabled biomedical literature analytics.
J Biomed Inform. 2024 Feb;150:104588. doi: 10.1016/j.jbi.2024.104588. Epub 2024 Jan 19.