Predicting substantive biomedical citations without full text.

Affiliations

Office of the Director, National Center for Health Statistics, Centers for Disease Control and Prevention, Hyattsville, MD 20782.

Information School, School of Computer, Data, and Information Sciences, College of Letters and Science, University of Wisconsin-Madison, Madison, WI 53706.

Publication information

Proc Natl Acad Sci U S A. 2023 Jul 25;120(30):e2213697120. doi: 10.1073/pnas.2213697120. Epub 2023 Jul 18.

Abstract

Insights from biomedical citation networks can be used to identify promising avenues for accelerating research and its downstream bench-to-bedside translation. Citation analysis generally assumes that each citation documents substantive knowledge transfer that informed the conception, design, or execution of the main experiments. However, citations may exist for other reasons. In this paper, we take advantage of late-stage citations added during peer review because these are less likely to represent substantive knowledge flow. Using a large, comprehensive feature set of open access data, we train a predictive model to identify late-stage citations. The model relies only on the title, abstract, and citations to previous articles, not on the full text or future citation patterns, making it suitable for publications as soon as they are released and for those behind a paywall (the vast majority). We find that high prediction scores identify late-stage citations that were likely added during the peer review process as well as those more likely to be rhetorical, such as journal self-citations added during review. Conversely, our model gives low prediction scores to early-stage citations and to citation classes that are known to represent substantive knowledge transfer. Using this model, we find that US federally funded biomedical research publications represent 30% of the predicted early-stage (and more likely to be substantive) knowledge transfer from basic studies to clinical research, even though they comprise only 10% of the literature. This is a threefold overrepresentation in this important type of knowledge flow.
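The threefold figure in the final sentence is a ratio of two shares: the group's share of the predicted early-stage knowledge transfer divided by its share of the literature. The following is a minimal sketch of that arithmetic, using only the two percentages reported in the abstract. The function name `overrepresentation` is hypothetical and does not come from the paper.

```python
def overrepresentation(share_of_flow_pct: float, share_of_literature_pct: float) -> float:
    """Return how many times larger a group's share of a flow is
    than its share of the literature (both given in percent)."""
    if share_of_literature_pct <= 0:
        raise ValueError("share_of_literature_pct must be positive")
    return share_of_flow_pct / share_of_literature_pct


# Percentages as stated in the abstract: 30% of predicted early-stage
# basic-to-clinical knowledge transfer versus 10% of the literature.
ratio = overrepresentation(30, 10)
print(ratio)  # prints 3.0
```

A ratio of 1 would mean the group contributes to this knowledge flow in proportion to its share of the literature; values above 1 indicate overrepresentation.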

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1783/10372685/41fe43c62acc/pnas.2213697120fig01.jpg
