The perfect neuroimaging-genetics-computation storm: collision of petabytes of data, millions of hardware devices and thousands of software tools.

Affiliation

Laboratory of Neuro Imaging (LONI), David Geffen School of Medicine at UCLA, University of California, Los Angeles, 635 S. Charles Young Drive, Suite 225, Los Angeles, CA, 90095-7334, USA.

Publication information

Brain Imaging Behav. 2014 Jun;8(2):311-22. doi: 10.1007/s11682-013-9248-x.

DOI: 10.1007/s11682-013-9248-x

PMID: 23975276

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3933453/

Abstract

The volume, diversity and velocity of biomedical data are increasing exponentially, providing petabytes of new neuroimaging and genetics data every year. At the same time, tens of thousands of computational algorithms are developed and reported in the literature, along with thousands of software tools and services. Users demand intuitive, quick and platform-agnostic access to data, software tools, and infrastructure from millions of hardware devices. This explosion of information, scientific techniques, computational models, and technological advances leads to enormous challenges in data analysis, evidence-based biomedical inference and reproducibility of findings. The Pipeline workflow environment provides a crowd-based distributed solution for consistent management of these heterogeneous resources. The Pipeline allows multiple (local) clients and (remote) servers to connect, exchange protocols, control the execution, monitor the states of different tools or hardware, and share complete protocols as portable XML workflows. In this paper, we demonstrate several advanced computational neuroimaging and genetics case studies and end-to-end pipeline solutions. These are implemented as graphical workflow protocols for analyzing imaging (sMRI, fMRI, DTI), phenotypic (demographic, clinical), and genetic (SNP) data.
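To make the idea of sharing a complete protocol as a portable XML workflow more concrete, the following is a minimal sketch in Python. It is not the Pipeline's actual file format or API: the element names (`workflow`, `module`, `connection`), the attribute names, and the functions `build_workflow` and `execution_order` are all hypothetical and chosen only for illustration. The sketch serializes a small set of processing steps and their data dependencies to XML, then shows how a receiving client could parse that XML and recover a valid execution order.

```python
import xml.etree.ElementTree as ET


def build_workflow(name, modules, connections):
    """Serialize a toy workflow to an XML string.

    modules: list of (module_id, executable) pairs.
    connections: list of (source_id, sink_id) pairs (source runs before sink).
    The schema is hypothetical, not the real Pipeline format.
    """
    root = ET.Element("workflow", {"name": name})
    mods = ET.SubElement(root, "modules")
    for mod_id, executable in modules:
        ET.SubElement(mods, "module", {"id": mod_id, "executable": executable})
    conns = ET.SubElement(root, "connections")
    for source, sink in connections:
        ET.SubElement(conns, "connection", {"source": source, "sink": sink})
    return ET.tostring(root, encoding="unicode")


def execution_order(xml_text):
    """Parse the XML and return module ids in a dependency-respecting order."""
    root = ET.fromstring(xml_text)
    ids = [m.get("id") for m in root.iter("module")]
    # Map each module to the set of upstream modules it still waits on.
    pending = {i: set() for i in ids}
    for c in root.iter("connection"):
        pending[c.get("sink")].add(c.get("source"))
    order = []
    while pending:
        # Modules with no unfinished upstream dependencies can run now.
        ready = [i for i in ids if i in pending and not pending[i]]
        if not ready:
            raise ValueError("workflow contains a cycle")
        for i in ready:
            order.append(i)
            del pending[i]
        for deps in pending.values():
            deps.difference_update(ready)
    return order


if __name__ == "__main__":
    # Hypothetical three-step structural MRI protocol.
    xml_text = build_workflow(
        "toy_smri_protocol",
        [("volume_stats", "stats_tool"),
         ("register", "reg_tool"),
         ("skull_strip", "strip_tool")],
        [("skull_strip", "register"), ("register", "volume_stats")],
    )
    print(xml_text)
    print(execution_order(xml_text))
```

Because the whole protocol is a single text document, it can be exchanged between clients and servers and re-parsed on any platform; the ordering step is a plain topological sort and stands in for whatever scheduling the real environment performs.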


Similar articles

Applications of the pipeline environment for visual informatics and genomics computations.

BMC Bioinformatics. 2011 Jul 26;12:304. doi: 10.1186/1471-2105-12-304.

Neuroimaging study designs, computational analyses and data provenance using the LONI pipeline.

PLoS One. 2010 Sep 28;5(9):e13070. doi: 10.1371/journal.pone.0013070.

High-throughput neuroimaging-genetics computational infrastructure.

Front Neuroinform. 2014 Apr 23;8:41. doi: 10.3389/fninf.2014.00041. eCollection 2014.

Neuroimaging PheWAS (Phenome-Wide Association Study): A Free Cloud-Computing Platform for Big-Data, Brain-Wide Imaging Association Studies.

Neuroinformatics. 2021 Apr;19(2):285-303. doi: 10.1007/s12021-020-09486-4.

DolphinNext: a distributed data processing platform for high throughput genomics.

BMC Genomics. 2020 Apr 19;21(1):310. doi: 10.1186/s12864-020-6714-x.

Efficient, Distributed and Interactive Neuroimaging Data Analysis Using the LONI Pipeline.

Front Neuroinform. 2009 Jul 20;3:22. doi: 10.3389/neuro.11.022.2009. eCollection 2009.

Genomics Virtual Laboratory: A Practical Bioinformatics Workbench for the Cloud.

PLoS One. 2015 Oct 26;10(10):e0140829. doi: 10.1371/journal.pone.0140829. eCollection 2015.

qPortal: A platform for data-driven biomedical research.

PLoS One. 2018 Jan 19;13(1):e0191603. doi: 10.1371/journal.pone.0191603. eCollection 2018.

The Dockstore: enhancing a community platform for sharing reproducible and accessible computational protocols.

Nucleic Acids Res. 2021 Jul 2;49(W1):W624-W632. doi: 10.1093/nar/gkab346.

Cited by

Bridging the Brain and Data Sciences.

Big Data. 2021 Jun;9(3):153-187. doi: 10.1089/big.2020.0065. Epub 2020 Nov 18.

Understanding and detecting defects in healthcare administration data: Toward higher data quality to better support healthcare operations and decisions.

J Am Med Inform Assoc. 2020 Mar 1;27(3):386-395. doi: 10.1093/jamia/ocz201.

Role of brain imaging in disorders of brain-gut interaction: a Rome Working Team Report.

Gut. 2019 Sep;68(9):1701-1715. doi: 10.1136/gutjnl-2019-318308. Epub 2019 Jun 7.

Big data sharing and analysis to advance research in post-traumatic epilepsy.

Neurobiol Dis. 2019 Mar;123:127-136. doi: 10.1016/j.nbd.2018.05.026. Epub 2018 Jun 1.

Combined Diffusion Tensor and Magnetic Resonance Spectroscopic Imaging Methodology for Automated Regional Brain Analysis: Application in a Normal Pediatric Population.

Dev Neurosci. 2017;39(5):413-429. doi: 10.1159/000475545. Epub 2017 Jun 27.

Predictive Big Data Analytics: A Study of Parkinson's Disease Using Large, Complex, Heterogeneous, Incongruent, Multi-Source and Incomplete Observations.

PLoS One. 2016 Aug 5;11(8):e0157077. doi: 10.1371/journal.pone.0157077. eCollection 2016.

Volume and Value of Big Healthcare Data.

J Med Stat Inform. 2016;4. doi: 10.7243/2053-7662-4-3.

Sharing big biomedical data.

J Big Data. 2015;2. doi: 10.1186/s40537-015-0016-1. Epub 2015 Jun 27.

SOCR data dashboard: an integrated big data archive mashing medicare, labor, census and econometric information.

J Big Data. 2015;2. doi: 10.1186/s40537-015-0018-z.

The MAPP research network: design, patient characterization and operations.

BMC Urol. 2014 Aug 1;14:58. doi: 10.1186/1471-2490-14-58.

