Barone Lindsay, Williams Jason, Micklos David
DNA Learning Center, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America.
PLoS Comput Biol. 2017 Oct 19;13(10):e1005755. doi: 10.1371/journal.pcbi.1005755. eCollection 2017 Oct.
In a 2016 survey of 704 National Science Foundation (NSF) Biological Sciences Directorate principal investigators (BIO PIs), nearly 90% indicated they are currently or will soon be analyzing large data sets. BIO PIs considered a range of computational needs important to their work, including high performance computing (HPC), bioinformatics support, multistep workflows, updated analysis software, and the ability to store, share, and publish data. Previous studies in the United States and Canada emphasized infrastructure needs. However, BIO PIs said the most pressing unmet needs are training in data integration, data management, and scaling analyses for HPC, acknowledging that data science skills will be required to build a deeper understanding of life. This portends a growing data knowledge gap in biology and challenges institutions and funding agencies to redouble their support for computational training in biology.