PubMed Phrases, an open set of coherent phrases for searching biomedical literature.

Affiliation

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA.

Publication information

Sci Data. 2018 Jun 12;5:180104. doi: 10.1038/sdata.2018.104.

DOI: 10.1038/sdata.2018.104
PMID: 29893755
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5996850/
Abstract

In biomedicine, key concepts are often expressed by multiple words (e.g., 'zinc finger protein'). Previous work has shown treating a sequence of words as a meaningful unit, where applicable, is not only important for human understanding but also beneficial for automatic information seeking. Here we present a collection of PubMed Phrases that are beneficial for information retrieval and human comprehension. We define these phrases as coherent chunks that are logically connected. To collect the phrase set, we apply the hypergeometric test to detect segments of consecutive terms that are likely to appear together in PubMed. These text segments are then filtered using the BM25 ranking function to ensure that they are beneficial from an information retrieval perspective. Thus, we obtain a set of 705,915 PubMed Phrases. We evaluate the quality of the set by investigating PubMed user click data and manually annotating a sample of 500 randomly selected noun phrases. We also analyze and discuss the usage of these PubMed Phrases in literature search.
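
The two statistical steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the document-level counting scheme, the function names (`hypergeom_tail`, `phrase_p_value`, `bm25_term_score`), the BM25 parameters `k1=1.2` and `b=0.75`, and the toy corpus are all assumptions made for illustration. The first part asks how surprising it is, under a hypergeometric null model, that two words appear adjacently in as many documents as they do; the second part scores a term (or a phrase treated as a single term) with a common BM25 variant.

```python
import math


def hypergeom_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = math.comb(population, draws)
    upper = min(successes, draws)
    tail = sum(
        math.comb(successes, i) * math.comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def contains_phrase(tokens, phrase):
    """True if the token list contains `phrase` as consecutive tokens."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def phrase_p_value(docs, left, right):
    """Assumed scheme: p-value that `left right` co-occurs adjacently this often.

    docs is a list of token lists. Documents containing `left` are the
    "successes", documents containing `right` are the "draws", and documents
    containing the adjacent pair are the observed overlap.
    """
    n_docs = len(docs)
    n_left = sum(left in d for d in docs)
    n_right = sum(right in d for d in docs)
    n_both = sum(contains_phrase(d, [left, right]) for d in docs)
    return hypergeom_tail(n_both, n_docs, n_left, n_right)


def bm25_term_score(tf, df, n_docs, doc_len, avg_len, k1=1.2, b=0.75):
    """BM25 contribution of one term to one document (non-negative idf variant)."""
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    norm = tf + k1 * (1 - b + b * doc_len / avg_len)
    return idf * tf * (k1 + 1) / norm


if __name__ == "__main__":
    # Toy corpus, invented for this example only.
    docs = [s.split() for s in [
        "zinc finger protein binds dna",
        "a zinc finger domain",
        "zinc finger motifs",
        "dietary zinc intake",
        "finger injury report",
        "protein folding study",
        "gene expression study",
        "cell growth assay",
    ]]
    print(phrase_p_value(docs, "zinc", "finger"))
    print(bm25_term_score(tf=1, df=3, n_docs=8, doc_len=5, avg_len=3.5))
```

A small p-value suggests the two words occur together more often than chance would predict, so the pair is a candidate phrase. On a corpus the size of PubMed, per-document counts would come from an index rather than from scanning token lists as this sketch does.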


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d81a/5996850/0d895f06f53e/sdata2018104-f1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d81a/5996850/de5a11434e1d/sdata2018104-f2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d81a/5996850/a6bc3d502360/sdata2018104-f3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d81a/5996850/f28a2c5af925/sdata2018104-f4.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d81a/5996850/81399fdf6ce7/sdata2018104-f5.jpg

Similar articles

1
PubMed Phrases, an open set of coherent phrases for searching biomedical literature.
Sci Data. 2018 Jun 12;5:180104. doi: 10.1038/sdata.2018.104.
2
Terminology spectrum analysis of natural-language chemical documents: term-like phrases retrieval routine.
J Cheminform. 2016 Apr 29;8:22. doi: 10.1186/s13321-016-0136-4. eCollection 2016.
3
PMCVec: Distributed phrase representation for biomedical text processing.
J Biomed Inform. 2019;100S:100047. doi: 10.1016/j.yjbinx.2019.100047. Epub 2019 Jul 20.
4
Extracting noun phrases for all of MEDLINE.
Proc AMIA Symp. 1999:671-5.
5
Informative Causality Extraction from Medical Literature via Dependency-Tree-Based Patterns.
J Healthc Inform Res. 2022 May 25;6(3):295-316. doi: 10.1007/s41666-022-00116-z. eCollection 2022 Sep.
6
Identifying well-formed biomedical phrases in MEDLINE® text.
J Biomed Inform. 2012 Dec;45(6):1035-41. doi: 10.1016/j.jbi.2012.05.005. Epub 2012 Jun 8.
7
Information content in Medline record fields.
Int J Med Inform. 2004 Jun 30;73(6):515-27. doi: 10.1016/j.ijmedinf.2004.02.008.
8
Leveraging syntax to better capture the semantics of elliptical coordinated compound noun phrases.
J Biomed Inform. 2017 Aug;72:120-131. doi: 10.1016/j.jbi.2017.07.001. Epub 2017 Jul 4.
9
A day in the life of PubMed: analysis of a typical day's query log.
J Am Med Inform Assoc. 2007 Mar-Apr;14(2):212-20. doi: 10.1197/jamia.M2191. Epub 2007 Jan 9.
10
G-Bean: an ontology-graph based web tool for biomedical literature retrieval.
BMC Bioinformatics. 2014;15 Suppl 12(Suppl 12):S1. doi: 10.1186/1471-2105-15-S12-S1. Epub 2014 Nov 6.

Articles citing this article

1
Pinpointing the integration of artificial intelligence in liver cancer immune microenvironment.
Front Immunol. 2024 Dec 20;15:1520398. doi: 10.3389/fimmu.2024.1520398. eCollection 2024.
2
PubMed Author-assigned Keyword Extraction (PubMedAKE) Benchmark.
Proc ACM Int Conf Inf Knowl Manag. 2022 Oct;2022:4470-4474. doi: 10.1145/3511808.3557675. Epub 2022 Oct 17.
3
Towards a unified search: Improving PubMed retrieval with full text.
J Biomed Inform. 2022 Oct;134:104211. doi: 10.1016/j.jbi.2022.104211. Epub 2022 Sep 21.
4
Epione application: An integrated web‑toolkit of clinical genomics and personalized medicine in systemic lupus erythematosus.
Int J Mol Med. 2022 Jan;49(1). doi: 10.3892/ijmm.2021.5063. Epub 2021 Nov 18.
5
Demetra Application: An integrated genotype analysis web server for clinical genomics in endometriosis.
Int J Mol Med. 2021 Jun;47(6). doi: 10.3892/ijmm.2021.4948. Epub 2021 Apr 28.
6
Fast searches of large collections of single-cell data using scfind.
Nat Methods. 2021 Mar;18(3):262-271. doi: 10.1038/s41592-021-01076-9. Epub 2021 Mar 1.
7
A graph-based method for reconstructing entities from coordination ellipsis in medical text.
J Am Med Inform Assoc. 2020 Jul 1;27(9):1364-1373. doi: 10.1093/jamia/ocaa109.
8
AI in Health: State of the Art, Challenges, and Future Directions.
Yearb Med Inform. 2019 Aug;28(1):16-26. doi: 10.1055/s-0039-1677908. Epub 2019 Aug 16.
9
A reference set of curated biomedical data and metadata from clinical case reports.
Sci Data. 2018 Nov 20;5:180258. doi: 10.1038/sdata.2018.258.

References cited by this article

1
How to Interpret PubMed Queries and Why It Matters.
J Am Soc Inf Sci Technol. 2009 Feb;60(2):264-274. doi: 10.1002/asi.20979. Epub 2008 Nov 6.
2
Bridging the gap: Incorporating a semantic similarity measure for effectively mapping PubMed queries to documents.
J Biomed Inform. 2017 Nov;75:122-127. doi: 10.1016/j.jbi.2017.09.014. Epub 2017 Oct 3.
3
Meshable: searching PubMed abstracts by utilizing MeSH and MeSH-derived topical terms.
Bioinformatics. 2016 Oct 1;32(19):3044-6. doi: 10.1093/bioinformatics/btw331. Epub 2016 Jun 10.
4
Retro: concept-based clustering of biomedical topical sets.
Bioinformatics. 2014 Nov 15;30(22):3240-8. doi: 10.1093/bioinformatics/btu514. Epub 2014 Jul 29.
5
Identifying well-formed biomedical phrases in MEDLINE® text.
J Biomed Inform. 2012 Dec;45(6):1035-41. doi: 10.1016/j.jbi.2012.05.005. Epub 2012 Jun 8.
6
Click-words: learning to predict document keywords from a user perspective.
Bioinformatics. 2010 Nov 1;26(21):2767-75. doi: 10.1093/bioinformatics/btq459. Epub 2010 Sep 1.
7
Understanding PubMed user search behavior through log analysis.
Database (Oxford). 2009;2009:bap018. doi: 10.1093/database/bap018. Epub 2009 Nov 27.
8
Relative Effectiveness of Document Titles and Abstracts for Determining Relevance of Documents.
Science. 1961 Oct 6;134(3484):1004-6. doi: 10.1126/science.134.3484.1004.
9
Corpus-based statistical screening for phrase identification.
J Am Med Inform Assoc. 2000 Sep-Oct;7(5):499-511. doi: 10.1136/jamia.2000.0070499.
10
Extracting noun phrases for all of MEDLINE.
Proc AMIA Symp. 1999:671-5.