QUEST Center for Responsible Research, Berlin Institute of Health (BIH) at Charité – Universitätsmedizin Berlin, Berlin, Germany.
PLoS One. 2024 May 8;19(5):e0302787. doi: 10.1371/journal.pone.0302787. eCollection 2024.
Monitoring the sharing of research data through repositories is of increasing interest to institutions and funders, as well as from a meta-research perspective. Automated screening tools exist, but they are based on either narrow or vague definitions of open data. Where manual validation has been performed, it was based on a small sample of articles. At our biomedical research institution, we developed detailed criteria for such a screening, as well as a workflow that combines an automated and a manual step and considers both fully open and restricted-access data. We use the results for an internal incentivization scheme and for monitoring in a dashboard. Here, we describe in detail our screening procedure and its validation, based on automated screening of 11,035 biomedical research articles, of which 1,381 articles with potential data sharing were subsequently screened manually. The screening results were highly reliable, as shown by inter-rater reliability values of ≥0.8 (Krippendorff's alpha) in two different validation samples. We also report the results of the screening, both for our institution and for an independent sample from a meta-research study. In the largest of the three samples, the 2021 institutional sample, underlying data had been openly shared for 7.8% of research articles and restricted-access data for 1.0%; because some articles shared both, 8.3% of articles overall had open and/or restricted-access data. We then discuss the extraction workflow with regard to its applicability in different contexts, its limitations, possible variations, and future developments. In summary, we present a comprehensive, validated, semi-automated workflow for detecting shared research data underlying published biomedical research articles.
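To make the reliability statistic cited above more concrete, the following is a minimal sketch of Krippendorff's alpha for nominal (categorical) codes, such as the label a rater assigns to an article's data-sharing status. It is not the authors' validation code: the function name, the label strings ("open", "none"), and the input layout (one list of rater labels per article, with `None` for a rater who did not code that article) are hypothetical, and an established, tested implementation should be preferred for real analyses.

```python
from collections import Counter


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data (hypothetical helper).

    `units` is a list of lists: each inner list holds the labels that the
    raters assigned to one article; None marks a missing rating.
    """
    coincidence = Counter()  # o_ck: coincidence matrix over ordered label pairs
    for unit in units:
        values = [v for v in unit if v is not None]
        m = len(values)
        if m < 2:
            continue  # a unit with fewer than two ratings cannot be paired
        counts = Counter(values)
        for c, n_c in counts.items():
            for k, n_k in counts.items():
                # Number of ordered pairs (c, k) within this unit.
                pairs = n_c * (n_c - 1) if c == k else n_c * n_k
                coincidence[(c, k)] += pairs / (m - 1)

    # Marginal totals n_c and the total number of pairable values n.
    marginals = Counter()
    for (c, _k), o in coincidence.items():
        marginals[c] += o
    n = sum(marginals.values())

    # Disagreement: observed off-diagonal coincidences vs. those expected by chance.
    observed = sum(o for (c, k), o in coincidence.items() if c != k)
    expected = sum(
        marginals[c] * marginals[k] for c in marginals for k in marginals if c != k
    )
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is used")
    return 1.0 - (n - 1) * observed / expected


# Hypothetical toy data: two raters coding three articles.
ratings = [["open", "open"], ["open", "none"], ["none", "none"]]
print(round(krippendorff_alpha_nominal(ratings), 3))  # 0.444
```

With a single disagreement among three articles, alpha is only about 0.44, well below the ≥0.8 reported in the abstract. This shows how strongly the statistic penalizes disagreement relative to the chance agreement implied by the label distribution.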