Banzi Rita, Canham Steve, Kuchinke Wolfgang, Krleza-Jeric Karmela, Demotes-Mainard Jacques, Ohmann Christian
Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy.
Canham Information Systems, Surrey, UK.
Trials. 2019 Mar 15;20(1):169. doi: 10.1186/s13063-019-3253-3.
Data repositories have the potential to play an important role in the effective and safe sharing of individual-participant data (IPD) from clinical studies. We analysed the current landscape of data repositories to create a detailed description of available repositories and assess their suitability for hosting data from clinical studies, from the perspective of the clinical researcher.
We assessed repositories that enable the storage, sharing, discoverability and re-use of IPD and associated documents from clinical studies, using a pre-defined set of 34 items and publicly available information gathered between April and June 2018. For this purpose, we developed an indicator set to capture the maturity of the repositories' procedures and their suitability for hosting IPD. The indicators cover guidelines for data upload and data de-identification, data quality controls, contracts for upload and storage, flexibility of access, application of identifiers, availability of metadata, and long-term preservation.
We analysed 25 repositories from an initial set of 55 identified as possibly relevant. Half of the included repositories were generic, i.e. not limited to a specific disease or clinical area, and 13 were launched in the last 8 years. The sample was extremely heterogeneous and included repositories developed by research funders, infrastructures, universities, and editors. All but three repositories charge no fee for uploading, storing or accessing data. None of the repositories fully demonstrated all the items in the indicator set, but three repositories (Dryad, Drum, EASY) met all items, either fully or partially. Flexibility of data-access modalities appears to be limited: it was lacking in half of the repositories.
Our evaluation, though often hampered by the lack of sufficient information, can help researchers to find a suitable repository for their datasets. Some repositories are more mature because of their support for clinical dataset preparation, contractual agreements, metadata and identifiers, different modalities of access, and long-term preservation of data. Further work is now required to achieve a more robust and accurate system for evaluation, which in turn may encourage the sharing of clinical study data.
Study protocol available at https://zenodo.org/record/1438261#.W64kW9Egrcs .