NeuroPigPen: A Scalable Toolkit for Processing Electrophysiological Signal Data in Neuroscience Applications Using Apache Pig.

Author information

Sahoo Satya S, Wei Annan, Valdez Joshua, Wang Li, Zonjy Bilal, Tatsuoka Curtis, Loparo Kenneth A, Lhatoo Samden D

Affiliations

Division of Medical Informatics, School of Medicine, Case Western Reserve University, Cleveland, OH, USA; Electrical Engineering and Computer Science Department, School of Engineering, Case Western Reserve University, Cleveland, OH, USA.

Electrical Engineering and Computer Science Department, School of Engineering, Case Western Reserve University, Cleveland, OH, USA.

Publication information

Front Neuroinform. 2016 Jun 6;10:18. doi: 10.3389/fninf.2016.00018. eCollection 2016.

DOI:10.3389/fninf.2016.00018

PMID:27375472

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4895075/

Abstract

Recent advances in neurological imaging and sensing technologies have led to a rapid increase in the volume, rate of generation, and variety of neuroscience data. This "neuroscience Big Data" represents a significant opportunity for the biomedical research community to design experiments using data that cover longer timescales, carry a large number of attributes, and form data sets of statistically significant size. The results from these new data-driven research techniques can advance our understanding of complex neurological disorders, help model the long-term effects of brain injuries, and provide new insights into the dynamics of brain networks. However, many existing neuroinformatics data processing and analysis tools were not built to manage large volumes of data, which makes it difficult for researchers to use the available data effectively. We introduce a new toolkit called NeuroPigPen, developed using Apache Hadoop and the Pig data flow language, to address the challenges posed by large-scale electrophysiological signal data. NeuroPigPen is a modular toolkit that can process large volumes of electrophysiological signal data, such as electroencephalogram (EEG), electrocardiogram (ECG), and blood oxygen level (SpO2) recordings, using a new distributed storage model called Cloudwave Signal Format (CSF) that supports easy partitioning and storage of signal data on commodity hardware. NeuroPigPen was developed with three design principles: (a) scalability: the ability to efficiently process increasing volumes of data; (b) adaptability: the toolkit can be deployed across different computing configurations; and (c) ease of programming: the toolkit can be used to compose multi-step data processing pipelines using high-level programming constructs. The NeuroPigPen toolkit was evaluated using 750 GB of electrophysiological signal data over a variety of Hadoop cluster configurations ranging from 3 to 30 data nodes. The evaluation results demonstrate that the toolkit is highly scalable and adaptable, which makes it suitable as a data processing toolkit for neuroscience applications. As part of the ongoing extension of NeuroPigPen, we are developing new modules that support statistical functions for analyzing signal data in brain connectivity research. In addition, the toolkit is being extended to allow integration with scientific workflow systems. NeuroPigPen is released under the BSD license at: https://sites.google.com/a/case.edu/neuropigpen/.
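
The abstract does not describe the CSF layout, so the following is only a minimal Python sketch of the general idea behind "easy partitioning" of signal data: one channel is cut into fixed-duration segments that each carry enough metadata to be processed independently on a different node. The function name `partition_signal` and the field names (`channel`, `sampling_rate_hz`, `start_seconds`, `samples`) are hypothetical illustrations and are not taken from the CSF specification or the NeuroPigPen code.

```python
from typing import Dict, List


def partition_signal(
    samples: List[float],
    sampling_rate_hz: int,
    segment_seconds: int,
    channel: str,
) -> List[Dict]:
    """Split one channel into fixed-duration, self-describing segments.

    Hypothetical layout for illustration only; not the actual CSF schema.
    """
    if sampling_rate_hz <= 0 or segment_seconds <= 0:
        raise ValueError("sampling_rate_hz and segment_seconds must be positive")

    samples_per_segment = sampling_rate_hz * segment_seconds
    segments = []
    for start in range(0, len(samples), samples_per_segment):
        segments.append({
            "channel": channel,
            "sampling_rate_hz": sampling_rate_hz,
            # Offset of this segment from the start of the recording.
            "start_seconds": start / sampling_rate_hz,
            # The final segment may be shorter than the others.
            "samples": samples[start:start + samples_per_segment],
        })
    return segments


if __name__ == "__main__":
    # 1000 synthetic samples at 200 Hz, cut into 2-second segments.
    eeg = [float(i % 7) for i in range(1000)]
    parts = partition_signal(eeg, sampling_rate_hz=200, segment_seconds=2, channel="EEG-C3")
    print(len(parts), parts[-1]["start_seconds"], len(parts[-1]["samples"]))
```

Because every segment records its own channel, sampling rate, and time offset, a distributed job can in principle process segments in any order and reassemble results afterwards; how CSF and the Pig functions actually achieve this is described in the full paper, not here.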
