SAX and Random Projection Algorithms for the Motif Discovery of Orbital Asteroid Resonance Using Big Data Platforms.

Affiliations

Department of Computer Science Education, Universitas Pendidikan Indonesia, Bandung 40154, Indonesia.

Department of Physics Education, Universitas Pendidikan Indonesia, Bandung 40154, Indonesia.

Publication information

Sensors (Basel). 2022 Jul 6;22(14):5071. doi: 10.3390/s22145071.

DOI: 10.3390/s22145071
PMID: 35890751
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9322561/
Abstract

Big data now arises in many fields, including astronomy. One example of a large astronomical dataset is the time series of numerically integrated asteroid orbital elements, spanning millions to billions of years. Mean motion resonance (MMR) data for an asteroid, for instance, are used to determine how long the asteroid was in resonance with a particular planet. This research therefore designs a computational model that obtains the mean motion resonance quickly and effectively by modifying the Symbolic Aggregate Approximation (SAX) algorithm and the random projection motif discovery algorithm and implementing them on big data platforms (Apache Hadoop and Apache Spark). The model has five steps: (i) saving the data to the Hadoop Distributed File System (HDFS); (ii) importing the files into Resilient Distributed Datasets (RDDs); (iii) preprocessing the data; (iv) performing motif discovery by executing a User-Defined Function (UDF) program; and (v) gathering the UDF results into HDFS and a .csv file. The big data platform reduced computation time substantially relative to the standalone method. Measured against the SwiftVis software, the proposed computational model achieved an average accuracy of 83%.
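The abstract names SAX and random projection but does not show how they fit together. The following is a minimal single-machine sketch of the textbook forms of both techniques, not the authors' modified versions and not their Spark code. All function names, the four-letter alphabet, and the `min_gap` parameter are assumptions made for illustration. SAX z-normalizes a window, averages it into a few segments, and maps each segment mean to a letter using equiprobable Gaussian breakpoints; random projection then repeatedly compares SAX words on a random subset of positions and counts how often each pair of windows collides.

```python
import random
from collections import defaultdict
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Rescale to zero mean and unit variance; a constant series maps to zeros."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each of `segments` chunks."""
    n = len(series)
    if segments > n:
        raise ValueError("segments must not exceed the series length")
    bounds = [i * n // segments for i in range(segments + 1)]
    return [mean(series[bounds[i]:bounds[i + 1]]) for i in range(segments)]


def sax_word(series, segments, alphabet="abcd"):
    """Map a numeric window to a SAX word using equiprobable Gaussian breakpoints."""
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    reduced = paa(znormalize(series), segments)
    # The letter index is the number of breakpoints lying below the segment mean.
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in reduced)


def sliding_sax(series, window, segments, alphabet="abcd"):
    """SAX word for every length-`window` subsequence of the series."""
    return [sax_word(series[i:i + window], segments, alphabet)
            for i in range(len(series) - window + 1)]


def random_projection(words, mask_size, iterations, min_gap=1, seed=0):
    """Count collisions between pairs of SAX words under random position masks.

    Pairs whose start indices differ by less than `min_gap` are skipped so that
    overlapping windows (trivial matches) can be excluded by the caller.
    """
    if not words:
        return {}
    rng = random.Random(seed)
    collisions = defaultdict(int)
    for _ in range(iterations):
        mask = rng.sample(range(len(words[0])), mask_size)
        buckets = defaultdict(list)
        for idx, word in enumerate(words):
            buckets[tuple(word[p] for p in mask)].append(idx)
        for members in buckets.values():
            for a in range(len(members)):
                for b in range(a + 1, len(members)):
                    if members[b] - members[a] >= min_gap:
                        collisions[(members[a], members[b])] += 1
    return dict(collisions)
```

Pairs with the highest collision counts are the motif candidates, which would then be verified against the raw series. Because each asteroid's time series can be processed independently in this formulation, a function of this kind is a plausible fit for the per-record UDF step the abstract describes, although the paper's actual partitioning scheme is not given here.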
