
PARSUC: A Parallel Subsampling-Based Method for Clustering Remote Sensing Big Data.

Authors

Xia Huiyu, Huang Wei, Li Ning, Zhou Jianzhong, Zhang Dongying

Affiliations

Yangtze River Waterway Bureau, Nanjing 210011, China.

School of Hydropower and Information Engineering, Huazhong University of Science and Technology, Wuhan 430074, China.

Publication information

Sensors (Basel). 2019 Aug 5;19(15):3438. doi: 10.3390/s19153438.

DOI:10.3390/s19153438
PMID:31387335
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6696378/
Abstract

Remote sensing big data (RSBD) is generally characterized by huge volumes, diversity, and high dimensionality. Mining hidden information from RSBD for different applications imposes significant computational challenges. Clustering is an important data mining technique widely used in processing and analyzing remote sensing imagery. However, conventional clustering algorithms are designed for relatively small datasets. When applied to RSBD problems, they are generally too slow or inefficient for practical use. In this paper, we propose a parallel subsampling-based clustering (PARSUC) method for improving the performance of RSBD clustering in terms of both efficiency and accuracy. PARSUC leverages a novel subsampling-based data partitioning (SubDP) method to realize three-step parallel clustering. This addresses a notable performance bottleneck of existing parallel clustering algorithms: they must perform numerous repeated calculations to obtain a reasonable result. Furthermore, we propose a centroid filtering algorithm (CFA) to eliminate subsampling errors and to guarantee the accuracy of the clustering results. PARSUC was implemented on a Hadoop platform using the MapReduce parallel model. Experiments conducted on massive remote sensing images of different sizes showed that PARSUC (1) provided much better accuracy than conventional remote sensing clustering algorithms in handling larger image data; (2) achieved notable scalability as more computing nodes were added; and (3) spent much less time than the existing parallel clustering algorithm in handling RSBD.
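
The abstract names the building blocks (subsampling-based partitioning, parallel clustering of the partitions, and centroid filtering) but does not spell out their details. The following is a minimal single-machine sketch of that general idea, not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`kmeans`, `filter_centroids`, `parsuc_sketch`), the reading of the three steps as subsample, cluster, and merge, the use of k-means with farthest-point initialization as the base clusterer, the median-distance rule standing in for CFA, and a thread pool standing in for MapReduce on Hadoop.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def nearest(points, centroids):
    """Index of the closest centroid for every point."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, seed=0, iters=20):
    """Lloyd's k-means with farthest-point initialization (assumed base clusterer)."""
    rng = np.random.default_rng(seed)
    chosen = [points[rng.integers(len(points))]]
    for _ in range(k - 1):
        # Pick the point farthest from all centroids chosen so far.
        gaps = np.linalg.norm(
            points[:, None, :] - np.array(chosen)[None, :, :], axis=2
        ).min(axis=1)
        chosen.append(points[gaps.argmax()])
    centroids = np.array(chosen, dtype=float)
    for _ in range(iters):
        labels = nearest(points, centroids)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old centroid if a cluster went empty
                centroids[j] = members.mean(axis=0)
    return centroids


def filter_centroids(candidates, k, tol=3.0):
    """Hypothetical stand-in for centroid filtering: group the local centroids,
    drop those far from their group's median, and average the rest."""
    groups = kmeans(candidates, k)
    labels = nearest(candidates, groups)
    merged = []
    for j in range(k):
        members = candidates[labels == j]
        if len(members) == 0:
            merged.append(groups[j])
            continue
        center = np.median(members, axis=0)
        dist = np.linalg.norm(members - center, axis=1)
        keep = dist <= tol * (np.median(dist) + 1e-12)
        merged.append(members[keep].mean(axis=0))
    return np.array(merged)


def parsuc_sketch(pixels, k, n_parts=4, part_size=500, seed=0):
    """Subsample, cluster the subsamples in parallel, then merge and filter."""
    rng = np.random.default_rng(seed)
    # Step 1 (assumed): draw independent random subsamples of the pixels.
    parts = [
        pixels[rng.choice(len(pixels), size=part_size, replace=False)]
        for _ in range(n_parts)
    ]
    # Step 2 (assumed): cluster each subsample independently, like a map phase.
    with ThreadPoolExecutor() as pool:
        local = list(pool.map(lambda part: kmeans(part, k, seed), parts))
    # Step 3 (assumed): merge and filter local centroids, like a reduce phase.
    centroids = filter_centroids(np.vstack(local), k)
    # Label every pixel once with the merged centroids.
    return centroids, nearest(pixels, centroids)
```

Calling `parsuc_sketch(pixels, k=3)` on an `(n, d)` float array of pixel feature vectors returns the merged centroids and one label per pixel. The point of the design is that the expensive iterative clustering only ever touches small subsamples, while the full dataset is scanned once for the final labeling.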

