Data reuse and the open data citation advantage.

Affiliations

National Evolutionary Synthesis Center, Durham, NC, USA; Department of Biology, Duke University, Durham, NC, USA.

Publication information

PeerJ. 2013 Oct 1;1:e175. doi: 10.7717/peerj.175. eCollection 2013.


DOI:10.7717/peerj.175
PMID:24109559
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3792178/
Abstract

Background. Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the "citation benefit". Furthermore, little is known about patterns in data reuse over time and across datasets. Method and Results. Here, we look at citation rates while controlling for many known citation predictors and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations than similar studies for which the data was not made available. Date of publication, journal impact factor, open access status, number of authors, first and last author publication history, corresponding author country, institution citation history, and study topic were included as covariates. The citation benefit varied with date of dataset deposition: a citation benefit was most clear for papers published in 2004 and 2005, at about 30%. Authors published most papers using their own datasets within two years of their first publication on the dataset, whereas data reuse papers published by third-party investigators continued to accumulate for at least six years. To study patterns of data reuse directly, we compiled 9,724 instances of third party data reuse via mention of GEO or ArrayExpress accession numbers in the full text of papers. 
The level of third-party data use was high: for 100 datasets deposited in year 0, we estimated that 40 papers in PubMed reused a dataset by year 2, 100 by year 4, and more than 150 data reuse papers had been published by year 5. Data reuse was distributed across a broad base of datasets: a very conservative estimate found that 20% of the datasets deposited between 2003 and 2007 had been reused at least once by third parties. Conclusion. After accounting for other factors affecting citation rate, we find a robust citation benefit from open data, although a smaller one than previously reported. We conclude there is a direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data. Other factors that may also contribute to the citation benefit are considered. We further conclude that, at least for gene expression microarray data, a substantial fraction of archived datasets are reused, and that the intensity of dataset reuse has been steadily increasing since 2003.
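The 9% citation benefit (95% confidence interval: 5% to 13%) is a multiplicative effect. Assuming the model was fit on log-transformed citation counts (an assumption; the abstract only says "multivariate regression"), a coefficient b on the data-availability indicator corresponds to a (exp(b) - 1) × 100 percent change in citations. The Python sketch below converts between the two scales. The coefficients are back-calculated from the reported percentages for illustration and are not values taken from the paper.

```python
import math


def coef_to_percent(b: float) -> float:
    """Percent change in expected citations implied by a log-scale coefficient b."""
    return (math.exp(b) - 1.0) * 100.0


def percent_to_coef(pct: float) -> float:
    """Inverse mapping: the log-scale coefficient implied by a percent change."""
    return math.log1p(pct / 100.0)


# Illustrative only: coefficients back-calculated from the reported
# 9% estimate and its 5% to 13% confidence interval, assuming a log-linear model.
reported = {"estimate": 9.0, "ci_low": 5.0, "ci_high": 13.0}
coefs = {name: percent_to_coef(pct) for name, pct in reported.items()}

for name, b in coefs.items():
    print(f"{name}: b = {b:.4f} -> {coef_to_percent(b):.1f}% more citations")
```

Under the same assumption, the roughly 30% benefit reported for papers published in 2004 and 2005 corresponds to a log-scale coefficient of about ln(1.3) ≈ 0.26.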

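The abstract says third-party reuse was found through mentions of GEO or ArrayExpress accession numbers in the full text of papers. Below is a minimal Python sketch of that idea. It rests on several assumptions. First, it uses simple regular expressions for GEO series and dataset identifiers (GSE or GDS followed by digits) and ArrayExpress experiment identifiers (E-, four capital letters, a hyphen, digits). Second, it counts a mention as third-party reuse only when the citing paper shares no author with the paper that created the dataset. The patterns, the author-overlap rule, and all records are hypothetical; the abstract does not describe the authors' actual pipeline.

```python
import re
from collections import defaultdict

# Assumed accession formats: GEO series/datasets (e.g. GSE1234, GDS1234) and
# ArrayExpress experiments (e.g. E-MEXP-1234, E-GEOD-1234).
ACCESSION_RE = re.compile(r"\b(GSE\d+|GDS\d+|E-[A-Z]{4}-\d+)\b")


def find_accessions(text: str) -> set[str]:
    """Return the distinct accession numbers mentioned in a paper's full text."""
    return set(ACCESSION_RE.findall(text))


def third_party_reuse(papers, dataset_authors):
    """Map each dataset accession to the papers that reuse it with no author overlap.

    papers: iterable of (paper_id, author_set, full_text) tuples
    dataset_authors: dict mapping accession -> author set of the creating paper
    """
    reuse = defaultdict(set)
    for paper_id, authors, text in papers:
        for acc in find_accessions(text):
            creators = dataset_authors.get(acc)
            # Skip accessions we cannot attribute, and reuse by any original author.
            if creators is None or authors & creators:
                continue
            reuse[acc].add(paper_id)
    return reuse


# Hypothetical toy records, not data from the study.
dataset_authors = {
    "GSE1000": {"Smith", "Lee"},
    "E-MEXP-42": {"Garcia"},
    "GSE2000": {"Chen"},
}
papers = [
    ("p1", {"Smith"}, "We reanalysed GSE1000 to test our model."),
    ("p2", {"Novak"}, "Data came from GEO (GSE1000) and ArrayExpress (E-MEXP-42)."),
    ("p3", {"Ito", "Park"}, "Expression profiles (GSE1000) were downloaded."),
]

reuse = third_party_reuse(papers, dataset_authors)
# Share of deposited datasets with at least one third-party reuse paper.
reused_share = sum(1 for acc in dataset_authors if reuse.get(acc)) / len(dataset_authors)
print({acc: sorted(ids) for acc, ids in reuse.items()})
print(f"{reused_share:.0%} of datasets reused at least once by third parties")
```

The final share is the same kind of quantity as the abstract's "reused at least once by third parties" estimate (20% of datasets deposited between 2003 and 2007). The toy value here says nothing about the real data. Text mining of this kind only finds reuse that cites an accession number in accessible full text, which is one reason such counts are conservative.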

Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f86/3792178/a97f7fbcd114/peerj-01-175-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f86/3792178/ff3d4c6d1f50/peerj-01-175-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f86/3792178/3bda21183925/peerj-01-175-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f86/3792178/17845a260f66/peerj-01-175-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f86/3792178/0d5ae34dc785/peerj-01-175-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f86/3792178/c7ca06fcf53b/peerj-01-175-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f86/3792178/ffcc91b17ae1/peerj-01-175-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f86/3792178/0f69ba78b47b/peerj-01-175-g008.jpg
