Using noun phrases for navigating biomedical literature on PubMed: how many updates are we losing track of?

Author information

authorsurvey.com, San Mateo, California, United States of America.

Publication information

PLoS One. 2011;6(9):e24920. doi: 10.1371/journal.pone.0024920. Epub 2011 Sep 14.

Abstract

Author-supplied citations are only a fraction of the literature related to a paper. The "related citations" list on PubMed is typically dozens or hundreds of results long and offers no hint as to why those results are related. Using noun phrases derived from the sentences of the paper, we show that it is possible to navigate to PubMed updates more transparently, through search terms that can associate a paper with its citations. The algorithm that generates these search terms automatically extracts noun phrases from the paper using natural language processing tools and ranks them by the number of occurrences in the paper compared to the number of occurrences on the web. We define a search query as citation validated (CV) if there is at least one instance of overlap between the author-supplied citations of the paper and the top 20 search results. When the overlapping citations were written by the same authors as the paper itself, we define the query as CV-S; when they were written by different authors, we define it as CV-D. For a systematic sample of 883 papers on PubMed Central, at least one of the search terms is CV-D for 86% of the papers, versus 65% for the top 20 PubMed "related citations." We hypothesize that these quantities, computed for the 20 million papers on PubMed, would be within 5% of these percentages. Averaged across all 883 papers, 5 search terms are CV-D, 10 search terms are CV-S, and 6 unique citations validate these searches. The potentially related literature uncovered by citation-validated searches (either CV-S or CV-D) is on the order of ten per paper, and many more if the remaining searches that are not citation-validated are taken into account. The significance and relationship of each search result to the paper can only be vetted and explained by a researcher with knowledge of or interest in that paper.
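The two steps described in the abstract, ranking noun phrases and labeling a query as citation validated, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the ranking formula, so the ratio with +1 smoothing below is an assumption, as is reading "same authors" as "shares at least one author." The function names, parameter names, and identifiers are hypothetical, and noun-phrase extraction and the PubMed search itself are taken as already done.

```python
from collections import Counter


def rank_phrases(paper_phrases, web_counts, top_n=20):
    """Rank noun phrases by paper frequency relative to web frequency.

    paper_phrases: noun phrases already extracted from the paper (with repeats).
    web_counts: mapping from phrase to its number of occurrences on the web.
    """
    paper_counts = Counter(paper_phrases)

    def score(phrase):
        # Assumed scoring: paper count divided by (web count + 1).
        # The abstract only says the two counts are "compared".
        return paper_counts[phrase] / (web_counts.get(phrase, 0) + 1)

    return sorted(paper_counts, key=score, reverse=True)[:top_n]


def classify_query(search_results, cited_ids, paper_authors, authors_by_id, k=20):
    """Return the citation-validation labels for one search query.

    A query is citation validated (CV) when its top-k results overlap the
    paper's author-supplied citations. Each overlapping citation contributes
    "CV-S" if it shares an author with the paper (assumed reading of
    "same authors"), otherwise "CV-D". An empty set means not validated.
    """
    overlap = set(search_results[:k]) & set(cited_ids)
    labels = set()
    for article_id in overlap:
        shared = set(authors_by_id.get(article_id, ())) & set(paper_authors)
        labels.add("CV-S" if shared else "CV-D")
    return labels
```

Under these assumptions, a phrase that is frequent in the paper but rare on the web ranks first, and a single query can carry both labels when its top results overlap citations by the paper's own authors and by others.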

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1b93/3173492/d129a7da1829/pone.0024920.g001.jpg
