Milia Nicola, Congiu Alessandra, Anagnostou Paolo, Montinaro Francesco, Capocasa Marco, Sanna Emanuele, Destro Bisol Giovanni
Dipartimento di Biologia Ambientale, Università di Roma La Sapienza, Roma, Italy.
PLoS One. 2012;7(6):e37552. doi: 10.1371/journal.pone.0037552. Epub 2012 Jun 5.
The achievement of a robust, effective and responsible form of data sharing is currently regarded as a priority for biological and biomedical research. Empirical evaluations of data sharing may be regarded as an indispensable first step in identifying critical aspects and developing strategies aimed at increasing the availability of research data for the scientific community as a whole. Research concerning human genetic variation is a potential forerunner in the establishment of widespread sharing of primary datasets. However, no specific analysis has yet been conducted to ascertain whether sharing primary datasets is common practice in this research field. To this end, we analyzed a total of 543 mitochondrial and Y chromosomal datasets reported in 508 papers indexed in the PubMed database from 2008 to 2011. A substantial portion of datasets (21.9%) was found to have been withheld, while neither strong editorial policies nor a high impact factor proved effective in increasing the sharing rate beyond the current figure of 78.1%. Disaggregating datasets by research field, we observed substantially lower sharing in medical genetics than in evolutionary and forensic genetics, a gap most evident for whole mtDNA sequences (15.0% vs 99.6%). The low rate of positive responses to e-mail requests sent to the corresponding authors of withheld datasets (28.6%) suggests that sharing should be regarded as a prerequisite for final paper acceptance, while requiring authors to deposit their results in open online databases that provide data quality control appears to be the best-practice standard. Finally, we estimated that 29.8% to 32.9% of total resources are used to generate withheld datasets, implying that an important portion of research funding does not produce shared knowledge. By making the scientific community and the public aware of this important aspect, we may help popularize a more effective culture of data sharing.