Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA.
Section of Critical Care Medicine, Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO, USA.
Sci Data. 2024 Jan 2;11(1):8. doi: 10.1038/s41597-023-02854-0.
Data sharing is necessary to maximize the actionable knowledge generated from research data, and data challenges can encourage secondary analyses of datasets. Data challenges in biomedicine often rely on advanced cloud-based computing infrastructure and expensive industry partnerships; examples include challenges that use Google Cloud virtual machines and the Sage Bionetworks DREAM Challenges platform. Such robust infrastructures can be financially prohibitive for investigators without substantial resources. Given the potential to advance scientific and clinical knowledge, and the NIH emphasis on data sharing and reuse, there is a need for inexpensive and computationally lightweight methods for sharing data and hosting data challenges. To fill that gap, we developed a workflow that allows for reproducible model training, testing, and evaluation, leveraging public GitHub repositories, open-source computational languages, and Docker technology. In addition, we conducted a data challenge using the infrastructure we developed. In this manuscript, we report on the infrastructure, workflow, and data challenge results. The infrastructure and workflow are likely to be useful for data challenges and education.