Predictive analytics technique based on hybrid sampling to manage unbalanced data in smart cities.

Author information

Chahal Ayushi, Gulia Preeti, Gill Nasib Singh, Yahya Mohammad, Haq Mohd Anul, Aleisa Mohammed, Alenizi Abdullah, Khan Arfat Ahmad, Shukla Piyush Kumar

Affiliations

Department of Computer Science & Applications, Maharshi Dayanand University, Rohtak, Haryana, India.

Oakland University, USA.

Publication information

Heliyon. 2024 Nov 12;10(24):e39275. doi: 10.1016/j.heliyon.2024.e39275. eCollection 2024 Dec 30.

Abstract

A smart city is considered smart because it can make decisions on its own. Artificial intelligence requires large amounts of data from the physical world to make correct decisions. IoT sensor devices collect data from their surroundings, and this data is then used for predictive analytics. The collected data may be balanced or imbalanced, and using imbalanced data for decision-making without any pre-processing may lead to seriously flawed results. This paper proposes a novel predictive analytics technique to manage imbalanced data. A pipeline is designed using Principal Component Analysis (PCA), a hybrid sampling method, and a Machine Learning (ML) prediction method. SMOTE + ENN (Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbours), a hybrid data balancing method, is used to bring the imbalanced data to a balanced state. An ML method is then applied to form clusters and make predictions over the dataset. A large Smart City IoT dataset with 405,184 records is used in this study, and the proposed technique is used to predict the presence of a person in the vicinity of IoT devices. The approach is evaluated using accuracy, precision, recall, F1-score, and the Area Under the Receiver Operating Characteristic (ROC) curve (AUC). For cluster 0, the proposed technique obtains accuracy, precision, recall, F1-score, and AUC of 0.79, 1.0, 0.79, 0.87, and 0.88, respectively; for cluster 1, the corresponding values are 0.86, 0.99, 0.86, 0.92, and 0.92. In view of these encouraging results, the proposed technique may prove to be a good choice for supporting decision-making in various real-world application domains.
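
The abstract names SMOTE + ENN as the hybrid balancing step but does not describe it further. As a rough illustration only (not the authors' code, which the abstract does not show), the sketch below implements the textbook version in plain Python for a two-class problem. SMOTE creates synthetic minority samples on the line segment between a minority sample and one of its k nearest minority neighbours until both classes have the same size; ENN then drops every sample whose label disagrees with the majority label of its k nearest neighbours (Wilson's majority-vote rule). The function names `nearest`, `smote`, `enn`, `smote_enn` and the defaults `k_smote=5`, `k_enn=3`, `seed=0` are hypothetical choices for this sketch; a production pipeline would more likely use a library such as imbalanced-learn's `SMOTEENN`.

```python
import math
import random
from collections import Counter


def nearest(point, rows, k, skip=None):
    """Indices of the k rows closest to `point` (Euclidean), excluding index `skip`."""
    ranked = sorted(
        (math.dist(point, row), i) for i, row in enumerate(rows) if i != skip
    )
    return [i for _, i in ranked[:k]]


def smote(minority, n_new, k=5, rng=None):
    """SMOTE: create n_new synthetic samples, each on the segment between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(nearest(minority[i], minority, k, skip=i))
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic


def enn(X, y, k=3):
    """ENN (majority-vote rule): keep a sample only if its label matches the most
    common label among its k nearest neighbours."""
    kept_X, kept_y = [], []
    for i, (x, label) in enumerate(zip(X, y)):
        votes = Counter(y[j] for j in nearest(x, X, k, skip=i))
        if votes.most_common(1)[0][0] == label:
            kept_X.append(x)
            kept_y.append(label)
    return kept_X, kept_y


def smote_enn(X, y, k_smote=5, k_enn=3, seed=0):
    """SMOTE + ENN for a two-class label list: oversample the minority class up to
    the majority count, then clean the combined set with ENN."""
    (_, n_major), (minor, n_minor) = Counter(y).most_common(2)
    minority = [x for x, label in zip(X, y) if label == minor]
    new = smote(minority, n_major - n_minor, k=k_smote, rng=random.Random(seed))
    return enn(list(X) + new, list(y) + [minor] * len(new), k=k_enn)


if __name__ == "__main__":
    # Toy data: 20 majority points near the origin, 4 minority points near (5, 5).
    X = [[i * 0.1, (i % 5) * 0.1] for i in range(20)]
    X += [[5.0, 5.0], [5.2, 5.1], [4.9, 5.3], [5.1, 4.8]]
    y = [0] * 20 + [1] * 4
    X_bal, y_bal = smote_enn(X, y)
    print(Counter(y_bal))  # → Counter({0: 20, 1: 20})
```

Because the two toy clusters are well separated, ENN removes nothing here and the result is exactly balanced; on overlapping real sensor data, ENN would typically also prune noisy samples near the class boundary, so the final class counts need not be equal.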

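
The abstract reports accuracy, precision, recall, F1-score, and AUC per cluster without restating their definitions. The sketch below uses the standard binary definitions in plain Python; the function names and the confusion-matrix counts in the usage lines are hypothetical, and `roc_auc` uses the pairwise (Mann-Whitney) formulation of the area under the ROC curve rather than integrating a plotted curve. As a consistency check, the F1 formula reproduces the reported cluster 1 value from its reported precision and recall: 2 × 0.99 × 0.86 / (0.99 + 0.86) ≈ 0.92.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def roc_auc(pos_scores, neg_scores):
    """ROC AUC as the probability that a randomly chosen positive scores higher
    than a randomly chosen negative (ties count as one half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Cluster 1 in the abstract: precision 0.99, recall 0.86.
    print(round(f1_from(0.99, 0.86), 2))  # → 0.92
    # Hypothetical counts, for illustration only.
    print(binary_metrics(tp=8, fp=2, fn=2, tn=8))
    # Hypothetical classifier scores for positive and negative samples.
    print(roc_auc([0.9, 0.8], [0.1, 0.8]))  # → 0.875
```

The pairwise form of AUC is quadratic in the number of samples, which is fine for illustration; for a dataset of roughly 400,000 records a sort-based implementation (such as a library routine) would be the practical choice.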
