Enabling big geoscience data analytics with a cloud-based, MapReduce-enabled and service-oriented workflow framework.

Authors

Li Zhenlong, Yang Chaowei, Jin Baoxuan, Yu Manzhu, Liu Kai, Sun Min, Zhan Matthew

Affiliations

NSF Spatiotemporal Innovation Center, George Mason University, Fairfax, VA, United States of America.

NSF Spatiotemporal Innovation Center, George Mason University, Fairfax, VA, United States of America; Yunnan Provincial Geomatics Center, Yunnan Bureau of Surveying, Mapping, and GeoInformation, Kunming, Yunnan, China.

Publication information

PLoS One. 2015 Mar 5;10(3):e0116781. doi: 10.1371/journal.pone.0116781. eCollection 2015.

DOI:10.1371/journal.pone.0116781
PMID:25742012
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4351198/
Abstract

Geoscience observations and model simulations are generating vast amounts of multi-dimensional data. Effectively analyzing these data is essential for geoscience studies. However, the tasks are challenging for geoscientists because processing the massive amount of data is both computing- and data-intensive, and data analytics requires complex procedures and multiple tools. To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics. The framework leverages cloud computing, MapReduce, and Service-Oriented Architecture (SOA). Specifically, HBase is adopted for storing and managing big geoscience data across distributed computers. A MapReduce-based algorithm framework is developed to support parallel processing of geoscience data. A service-oriented workflow architecture is built to support on-demand complex data analytics in the cloud environment. A proof-of-concept prototype tests the performance of the framework. Results show that this framework significantly improves the efficiency of big geoscience data analytics by reducing data processing time and simplifying data analytical procedures for geoscientists.
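
The MapReduce pattern named in the abstract can be illustrated with a minimal, single-process sketch. It is not the authors' implementation and uses neither Hadoop nor HBase. The record layout (variable, time step, latitude, longitude, value), the sample values, and the function names `map_record`, `shuffle`, `reduce_mean`, and `run_job` are all assumptions made for illustration. The sketch computes the spatial mean of each variable at each time step, a typical aggregation over gridded geoscience data.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical records: (variable, time_step, lat, lon, value).
# In a real deployment these rows would be read from a distributed store.
RECORDS = [
    ("temperature", "2000-01", 10.0, 20.0, 280.0),
    ("temperature", "2000-01", 10.0, 22.0, 284.0),
    ("temperature", "2000-02", 10.0, 20.0, 282.0),
    ("precipitation", "2000-01", 10.0, 20.0, 3.0),
    ("precipitation", "2000-01", 10.0, 22.0, 5.0),
]


def map_record(record):
    """Map step: emit one ((variable, time_step), value) pair per grid cell."""
    variable, time_step, _lat, _lon, value = record
    yield (variable, time_step), value


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_mean(key, values):
    """Reduce step: collapse the values of one key into their mean."""
    return key, fmean(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence over an iterable of records."""
    pairs = (pair for record in records for pair in map_record(record))
    return dict(reduce_mean(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    for key, mean in sorted(run_job(RECORDS).items()):
        print(key, mean)
```

Because each map call reads a single record and each reduce call reads a single key's values, both steps could in principle be distributed across machines, which is the property a MapReduce framework exploits.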

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e81/4351198/cbe7f7117bbb/pone.0116781.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e81/4351198/c9a3c10a77b2/pone.0116781.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e81/4351198/04c0c2367428/pone.0116781.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e81/4351198/f17a516423ae/pone.0116781.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e81/4351198/db6b51adc630/pone.0116781.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e81/4351198/2fa929dcb68a/pone.0116781.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e81/4351198/9c8bcd366a6d/pone.0116781.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e81/4351198/303b1c6fa94d/pone.0116781.g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e81/4351198/6dd84587f051/pone.0116781.g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e81/4351198/c542ebde4451/pone.0116781.g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e81/4351198/fde51111afd8/pone.0116781.g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e81/4351198/5afff479b2a2/pone.0116781.g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e81/4351198/492a92e5d9ad/pone.0116781.g013.jpg

Similar articles

1
Enabling big geoscience data analytics with a cloud-based, MapReduce-enabled and service-oriented workflow framework.
PLoS One. 2015 Mar 5;10(3):e0116781. doi: 10.1371/journal.pone.0116781. eCollection 2015.
2
CloudDOE: a user-friendly tool for deploying Hadoop clouds and analyzing high-throughput sequencing data with MapReduce.
PLoS One. 2014 Jun 4;9(6):e98146. doi: 10.1371/journal.pone.0098146. eCollection 2014.
3
An Optimized IoT-enabled Big Data Analytics Architecture for Edge-Cloud Computing.
IEEE Internet Things J. 2023 Mar;10(5):3995-4005. doi: 10.1109/jiot.2022.3157552. Epub 2022 Mar 14.
4
Applications of the MapReduce programming framework to clinical big data analysis: current landscape and future trends.
BioData Min. 2014 Oct 29;7:22. doi: 10.1186/1756-0381-7-22. eCollection 2014.
5
Closha: bioinformatics workflow system for the analysis of massive sequencing data.
BMC Bioinformatics. 2018 Feb 19;19(Suppl 1):43. doi: 10.1186/s12859-018-2019-3.
6
TORNADO: Intermediate Results Orchestration based Service-Oriented Data Curation Framework for Intelligent Video Big Data Analytics in the Cloud.
Sensors (Basel). 2020 Jun 24;20(12):3581. doi: 10.3390/s20123581.
7
Towards an efficient and Energy-Aware mobile big health data architecture.
Comput Methods Programs Biomed. 2018 Nov;166:137-154. doi: 10.1016/j.cmpb.2018.10.008. Epub 2018 Oct 4.
8
Implementation of a Big Data Accessing and Processing Platform for Medical Records in Cloud.
J Med Syst. 2017 Aug 18;41(10):149. doi: 10.1007/s10916-017-0777-5.
9
Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment.
BMC Syst Biol. 2014 Jan 16;8:5. doi: 10.1186/1752-0509-8-5.
10
Parallel MapReduce: Maximizing Cloud Resource Utilization and Performance Improvement Using Parallel Execution Strategies.
Biomed Res Int. 2018 Oct 17;2018:7501042. doi: 10.1155/2018/7501042. eCollection 2018.

Cited by

1
Inference of Large-scale Time-delayed Gene Regulatory Network with Parallel MapReduce Cloud Platform.
Sci Rep. 2018 Dec 12;8(1):17787. doi: 10.1038/s41598-018-36180-y.
2
Evolutionary Analyses of Hanwoo (Korean Cattle)-Specific Single-Nucleotide Polymorphisms and Genes Using Whole-Genome Resequencing Data of a Hanwoo Population.
Mol Cells. 2016 Sep;39(9):692-8. doi: 10.14348/molcells.2016.0148. Epub 2016 Sep 9.
3
Design and Implementation of an Interactive Web-Based Near Real-Time Forest Monitoring System.
PLoS One. 2016 Mar 31;11(3):e0150935. doi: 10.1371/journal.pone.0150935. eCollection 2016.
4
A Geospatial Information Grid Framework for Geological Survey.
PLoS One. 2015 Dec 28;10(12):e0145312. doi: 10.1371/journal.pone.0145312. eCollection 2015.

References

1
Physicians' perception about electronic medical record system in Makkah Region, Saudi Arabia.
Avicenna J Med. 2015 Jan-Mar;5(1):1-5. doi: 10.4103/2231-0770.148499.
2
The emergence of spatial cyberinfrastructure.
Proc Natl Acad Sci U S A. 2011 Apr 5;108(14):5488-91. doi: 10.1073/pnas.1103051108.
3
Using spatial principles to optimize distributed computing for enabling the physical science discoveries.
Proc Natl Acad Sci U S A. 2011 Apr 5;108(14):5498-503. doi: 10.1073/pnas.0909315108. Epub 2011 Mar 28.
4
Quantification of modelling uncertainties in a large ensemble of climate change simulations.
Nature. 2004 Aug 12;430(7001):768-72. doi: 10.1038/nature02771.
5
Taverna: a tool for the composition and enactment of bioinformatics workflows.
Bioinformatics. 2004 Nov 22;20(17):3045-54. doi: 10.1093/bioinformatics/bth361. Epub 2004 Jun 16.