Jagodnik Kathleen M, Koplev Simon, Jenkins Sherry L, Ohno-Machado Lucila, Paten Benedict, Schurer Stephan C, Dumontier Michel, Verborgh Ruben, Bui Alex, Ping Peipei, McKenna Neil J, Madduri Ravi, Pillai Ajay, Ma'ayan Avi
Department of Pharmacological Sciences, BD2K-LINCS Data Coordination and Integration Center, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1215, New York, NY 10029, USA.
Health System Department of Biomedical Informatics, University of California San Diego, 9500 Gilman Dr., La Jolla, CA 92083, USA; Health Services Research, San Diego Veterans Administration Health System, San Diego, CA 92083, USA.
J Biomed Inform. 2017 Jul;71:49-57. doi: 10.1016/j.jbi.2017.05.006. Epub 2017 May 10.
The volume and diversity of data in biomedical research have increased rapidly in recent years. While such data hold significant promise for accelerating discovery, their use entails many challenges, including the need for adequate computational infrastructure, secure processes for data sharing and access, tools that allow researchers to find and integrate diverse datasets, and standardized methods of analysis. These are just some elements of a complex ecosystem that needs to be built to support the rapid accumulation of these data. The NIH Big Data to Knowledge (BD2K) initiative aims to facilitate digitally enabled biomedical research. Within the BD2K framework, the Commons initiative is intended to establish a virtual environment that will facilitate the use, interoperability, and discoverability of shared digital objects used for research. The BD2K Commons Framework Pilots Working Group (CFPWG) was established to clarify goals and work on pilot projects that address existing gaps toward realizing the vision of the BD2K Commons. This report reviews highlights from a two-day meeting involving the BD2K CFPWG to provide insights on trends and considerations in advancing Big Data science for biomedical research in the United States.