Hazrin Nur Alyaa, Chong Kai Lun, Huang Yuk Feng, Ahmed Ali Najah, Ng Jing Lin, Koo Chai Hoon, Tan Kok Weng, Sherif Mohsen, El-Shafie Ahmed
Department of Civil Engineering, Lee Kong Chian Faculty of Engineering and Science, Universiti Tunku Abdul Rahman, Jalan Sg. Long, Bandar Sg. Long, 43000, Kajang, Selangor, Malaysia.
Faculty of Engineering & Quantity Surveying, INTI International University (INTI-IU), Persiaran Perdana BBN, Putra Nilai, Nilai, 71800, Negeri Sembilan, Malaysia.
Heliyon. 2023 Aug 23;9(9):e19426. doi: 10.1016/j.heliyon.2023.e19426. eCollection 2023 Sep.
Because machine learning (ML) algorithms behave differently from one another, this study applied six well-established ML algorithms to predict sea level on a daily basis. Data compiled from 1985 to 2018 were used to train and test the models. Assessing the regression algorithms showed that each location had its own best-performing model. In Peninsular Malaysia, the interactions linear regression model performed best at Pulau Langkawi (RMSE = 19.066), the Matern 5/2 Gaussian process regression model at Geting (RMSE = 49.891), and the trilayered artificial neural network at Pulau Pinang (RMSE = 20.026), while the linear regression model performed best at Sandakan in Sabah, East Malaysia (RMSE = 14.054). At each study area, the other metrics, such as MAE and R-squared, were also at their best values for the selected model, further substantiating the RMSE results. These metrics also revealed that, although sea level was the sole input parameter, results were markedly better with a 7-day lag, regardless of the model used. Notably, lag variables shorter than 7 days could degrade a model's accuracy in representing ground reality. The study emphasizes the importance of thorough training and testing of ML models so that decision-makers can rely on them when developing mitigation actions for sea level rise driven by climate change.
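The lag-based setup described in the abstract can be illustrated with a minimal sketch. It builds lagged features from a single sea-level series, fits an ordinary least-squares linear model on a chronological training split, and reports RMSE, MAE and R-squared on the held-out part. Everything here is an assumption made for illustration: the function names (`make_lagged`, `fit_linear`, `predict_linear`, `evaluate_lag`) are hypothetical, the series is synthetic rather than the study's tide-gauge data, the 80/20 split is arbitrary, and plain least squares only stands in for the simplest of the six models; it is not the authors' implementation.

```python
import numpy as np


def make_lagged(series, n_lags):
    """Return (X, y) where row i of X holds the n_lags values preceding y[i].

    Column 0 is the 1-day lag, column 1 the 2-day lag, and so on.
    """
    series = np.asarray(series, dtype=float)
    n = len(series)
    X = np.column_stack([series[n_lags - k : n - k] for k in range(1, n_lags + 1)])
    y = series[n_lags:]
    return X, y


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [intercept, w1, ..., wn]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Apply the fitted coefficients to a lag matrix."""
    return coef[0] + X @ coef[1:]


def evaluate_lag(series, n_lags, train_frac=0.8):
    """Fit on the first train_frac of samples, score on the remainder."""
    X, y = make_lagged(series, n_lags)
    split = int(len(y) * train_frac)  # chronological split, no shuffling
    coef = fit_linear(X[:split], y[:split])
    y_test = y[split:]
    err = y_test - predict_linear(coef, X[split:])
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_test - y_test.mean()) ** 2)
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
        "r2": float(1.0 - ss_res / ss_tot),
    }


# Synthetic daily series (NOT the study's data): two periodic components plus noise.
rng = np.random.default_rng(0)
t = np.arange(3000)
synthetic_level = (
    2000.0
    + 150.0 * np.sin(2 * np.pi * t / 14.8)
    + 60.0 * np.sin(2 * np.pi * t / 365.25)
    + rng.normal(0.0, 20.0, t.size)
)

# Compare a few lag lengths, mirroring the abstract's comparison against a 7-day lag.
results = {lag: evaluate_lag(synthetic_level, lag) for lag in (1, 3, 7)}
for lag, metrics in results.items():
    print(f"lag={lag}: RMSE={metrics['rmse']:.3f} "
          f"MAE={metrics['mae']:.3f} R2={metrics['r2']:.3f}")
```

The split is chronological rather than random so that the model is never scored on days that precede its training data. Whether longer lags help on a given series depends on the data; the numbers printed for this synthetic series say nothing about the four stations in the study.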