Makkie Milad, Li Xiang, Quinn Shannon, Lin Binbin, Ye Jieping, Mon Geoffrey, Liu Tianming
Department of Computer Science, University of Georgia, Athens, GA 30602.
Clinical Data Science Center, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114.
IEEE Trans Big Data. 2019 Jun;5(2):109-119. doi: 10.1109/TBDATA.2018.2811508. Epub 2018 Mar 6.
Since the BRAIN Initiative and the Human Brain Project began, a few efforts have been made to address the computational challenges of neuroscience Big Data. These two projects promised to model the complex interaction of brain and behavior and to understand and diagnose brain diseases by collecting and analyzing large quantities of data. Archiving, analyzing, and sharing the growing neuroimaging datasets has posed major challenges. New computational methods and technologies have emerged in the Big Data domain but have not been fully adapted for use in neuroimaging. In this work, we introduce the current challenges of neuroimaging in a Big Data context. We review our efforts toward creating a data management system to organize large-scale fMRI datasets, and present our novel algorithms and methods for distributed fMRI data processing, which employ Hadoop and Spark. Finally, we demonstrate the significant performance gains of our algorithms and methods in performing distributed dictionary learning.