Chahal Ayushi, Gulia Preeti, Gill Nasib Singh, Yahya Mohammad, Haq Mohd Anul, Aleisa Mohammed, Alenizi Abdullah, Khan Arfat Ahmad, Shukla Piyush Kumar
Department of Computer Science & Applications, Maharshi Dayanand University, Rohtak, Haryana, India.
Oakland University, USA.
Heliyon. 2024 Nov 12;10(24):e39275. doi: 10.1016/j.heliyon.2024.e39275. eCollection 2024 Dec 30.
A smart city is considered smart because it can make decisions autonomously. The artificial intelligence behind those decisions needs large amounts of data from the physical world to decide correctly. IoT sensor devices collect this data from their surroundings, and it is then used for predictive analytics. The collected data may be balanced or imbalanced, and imbalanced data used for decision-making without any pre-processing may lead to damaging results. This paper proposes a novel predictive analytics technique for handling imbalanced data. A pipeline is designed using Principal Component Analysis (PCA), a hybrid sampling method, and a Machine Learning (ML) prediction method. SMOTE + ENN, a hybrid data balancing method, is used to bring the imbalanced data to a balanced state. The ML method is applied to form clusters and make predictions over the dataset. A large Smart City IoT dataset with 405,184 records is used in this study. The proposed technique is used to predict the presence of a person in the vicinity of IoT devices. The approach is evaluated using accuracy, precision, recall, F1-score, and the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve. The accuracy, precision, recall, F1-score, and AUC obtained with the proposed technique are 0.79, 1.0, 0.79, 0.87, and 0.88 for cluster 0 and 0.86, 0.99, 0.86, 0.92, and 0.92 for cluster 1, respectively. In view of these encouraging results, the proposed technique may be a good choice to support decision-making in different real-world application domains.
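The hybrid balancing step named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the toy two-feature data, the labels (0 = no person, 1 = person present), the helper names `nearest`, `smote`, and `enn`, and the choice of k = 3 are all assumptions made for illustration, and the PCA and prediction stages of the pipeline are omitted. SMOTE is shown as interpolation between minority-class neighbours, and ENN as a majority-vote variant that drops samples whose label disagrees with most of their nearest neighbours.

```python
import math
import random
from collections import Counter

# A sample is a (features, label) pair; features is a tuple of floats.


def nearest(point, pool, k):
    """Return the k samples in `pool` closest to `point` (Euclidean distance)."""
    others = [p for p in pool if p is not point]
    return sorted(others, key=lambda p: math.dist(point[0], p[0]))[:k]


def smote(samples, minority, k=3, rng=None):
    """Oversample the minority class of a binary dataset until both classes match.

    Each synthetic point lies on the segment between a minority sample and
    one of its k nearest minority neighbours.
    """
    rng = rng or random.Random(0)
    minority_pts = [s for s in samples if s[1] == minority]
    deficit = len(samples) - 2 * len(minority_pts)  # majority minus minority
    synthetic = []
    for _ in range(max(deficit, 0)):
        base = rng.choice(minority_pts)
        neighbour = rng.choice(nearest(base, minority_pts, k))
        gap = rng.random()  # interpolation factor in [0, 1)
        feats = tuple(b + gap * (n - b) for b, n in zip(base[0], neighbour[0]))
        synthetic.append((feats, minority))
    return samples + synthetic


def enn(samples, k=3):
    """Edited Nearest Neighbours (majority-vote variant).

    Keeps a sample only if most of its k nearest neighbours share its label.
    """
    kept = []
    for s in samples:
        votes = Counter(n[1] for n in nearest(s, samples, k))
        if votes.most_common(1)[0][0] == s[1]:
            kept.append(s)
    return kept


# Hypothetical toy data: 40 "absent" readings and 8 "present" readings.
rng = random.Random(42)
absent = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(40)]
present = [((rng.gauss(3, 1), rng.gauss(3, 1)), 1) for _ in range(8)]
data = absent + present

balanced = smote(data, minority=1, k=3, rng=rng)  # oversample first
cleaned = enn(balanced, k=3)                      # then remove noisy samples

print("original:", Counter(label for _, label in data))
print("after SMOTE:", Counter(label for _, label in balanced))
print("after ENN:", Counter(label for _, label in cleaned))
```

Oversampling before cleaning is the usual order for this hybrid: SMOTE first equalises the class counts, and ENN then removes samples, synthetic or original, that sit in regions dominated by the other class. In practice a library implementation would normally be preferred over this sketch, with the resampling applied to the training split only.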