Recognizing, reporting and reducing the data curation debt of cohort studies.

Affiliations

Population Health Sciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK.

Department of Public Health, Policy and Systems, University of Liverpool, UK.

Publication information

Int J Epidemiol. 2020 Aug 1;49(4):1067-1074. doi: 10.1093/ije/dyaa087.

Abstract

Good data curation is integral to cohort studies, but it is not always done to the level necessary to ensure the longevity of the data a study holds. In this opinion paper, we introduce the concept of data curation debt, the data curation equivalent of the software engineering principle of technical debt. Using the context of UK cohort studies, we define data curation debt, describing examples and their potential impact. We highlight that accruing this debt can make it more difficult to use the data in the future. Additionally, the long-running nature of cohort studies means that interest is accrued on this debt and compounded over time, increasing the impact a debt could have on a study and its stakeholders. Primary causes of data curation debt are discussed across three categories: longevity of hardware, software and data formats; funding; and skills shortages. Based on cross-domain best practice, strategies to reduce the debt and preventive measures are proposed, with importance given to the recognition and transparent reporting of data curation debt. Describing the debt in this way, we encapsulate a multi-faceted issue in simple terms understandable by all cohort study stakeholders. Data curation debt is not confined to the UK; it is an issue the international community must be aware of and address. This paper aims to stimulate a discussion between cohort studies and their stakeholders on how to address the issue of data curation debt. If data curation debt is left unchecked, it could become impossible to use highly valued cohort study data, which ultimately represents an existential risk to the studies themselves.
