Open source and reproducible and inexpensive infrastructure for data challenges and education.

Affiliations

Department of Biomedical Informatics, University of Colorado School of Medicine, University of Colorado, Aurora, CO, USA.

Section of Critical Care Medicine, Department of Pediatrics, University of Colorado School of Medicine, University of Colorado, Aurora, CO, USA.

Publication information

Sci Data. 2024 Jan 2;11(1):8. doi: 10.1038/s41597-023-02854-0.

DOI: 10.1038/s41597-023-02854-0
PMID: 38167901
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10762166/
Abstract

Data sharing is necessary to maximize the actionable knowledge generated from research data. Data challenges can encourage secondary analyses of datasets. Data challenges in biomedicine often rely on advanced cloud-based computing infrastructure and expensive industry partnerships. Examples include challenges that use Google Cloud virtual machines and the Sage Bionetworks Dream Challenges platform. Such robust infrastructures can be financially prohibitive for investigators without substantial resources. Given the potential to develop scientific and clinical knowledge and the NIH emphasis on data sharing and reuse, there is a need for inexpensive and computationally lightweight methods for data sharing and hosting data challenges. To fill that gap, we developed a workflow that allows for reproducible model training, testing, and evaluation. We leveraged public GitHub repositories, open-source computational languages, and Docker technology. In addition, we conducted a data challenge using the infrastructure we developed. In this manuscript, we report on the infrastructure, workflow, and data challenge results. The infrastructure and workflow are likely to be useful for data challenges and education.
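The train, test, and evaluate loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the abstract only states that the workflow relies on public GitHub repositories, open-source languages, and Docker. The toy dataset, the `train`/`predict` interface, the accuracy metric, and the seed values below are all assumptions made for illustration. The point is only that fixing the data split and the random seed lets an organizer re-run a submission and obtain the same score.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = Tuple[float, int]  # (feature, label): hypothetical toy schema


def make_data(n: int, seed: int) -> List[Row]:
    """Generate synthetic rows; a stand-in for real challenge data."""
    rng = random.Random(seed)  # local RNG so results do not depend on global state
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        feature = rng.gauss(1.0 if label else -1.0, 1.0)
        rows.append((feature, label))
    return rows


def train(rows: Sequence[Row]) -> float:
    """Hypothetical participant code: threshold midway between class means."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict(threshold: float, features: Sequence[float]) -> List[int]:
    """Hypothetical participant code: classify by comparing to the threshold."""
    return [1 if x > threshold else 0 for x in features]


def evaluate(
    train_fn: Callable[[Sequence[Row]], float],
    predict_fn: Callable[[float, Sequence[float]], List[int]],
    train_rows: Sequence[Row],
    test_rows: Sequence[Row],
) -> float:
    """Organizer code: fit on training rows, score accuracy on held-out rows."""
    model = train_fn(train_rows)
    preds = predict_fn(model, [x for x, _ in test_rows])
    correct = sum(p == y for p, (_, y) in zip(preds, test_rows))
    return correct / len(test_rows)


def run_challenge(seed: int = 2024) -> float:
    """Run the whole pipeline with fixed seeds so the score is repeatable."""
    train_rows = make_data(200, seed)
    test_rows = make_data(100, seed + 1)  # separate held-out set
    return evaluate(train, predict, train_rows, test_rows)


if __name__ == "__main__":
    print(f"held-out accuracy: {run_challenge():.3f}")
```

In a containerized setup, a script like this would typically be the entry point of an image, so that the interpreter and library versions are pinned along with the seeds; that packaging step is outside the scope of this sketch.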

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0bec/10762166/04e41753d6a9/41597_2023_2854_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0bec/10762166/14b5e71bfedf/41597_2023_2854_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0bec/10762166/bae31315b0f3/41597_2023_2854_Fig2_HTML.jpg
