Brykov Michail Nikolaevich, Petryshynets Ivan, Pruncu Catalin Iulian, Efremenko Vasily Georgievich, Pimenov Danil Yurievich, Giasin Khaled, Sylenko Serhii Anatolievich, Wojciechowski Szymon
Zaporizhzhia Polytechnic National University, Zaporizhzhia, 69063 Ukraine.
Institute of Materials Research, Slovak Academy of Sciences, Kosice, 04001 Slovakia.
Sensors (Basel). 2020 Jul 29;20(15):4228. doi: 10.3390/s20154228.
This article discusses machine learning modelling using a dataset provided by the LANL (Los Alamos National Laboratory) earthquake prediction competition hosted by Kaggle. The data were obtained from a laboratory stick-slip friction experiment that mimics real earthquakes. Digitized acoustic signals were recorded against the time to failure of a granular layer compressed between steel plates. In this work, machine learning was employed to develop models that could predict earthquakes. The aim is to highlight the importance and potential applicability of machine learning in seismology. The XGBoost algorithm was used for modelling, combined with 6-fold cross-validation and the mean absolute error (MAE) metric for estimating model quality. Backward feature elimination was applied first, followed by forward feature construction, to find the best combination of features. The advantage of this feature engineering method is that it finds the best subset from a relatively large set of features in a relatively short time. It was confirmed that a proper combination of statistical characteristics describing the acoustic data can be used for effective prediction of time to failure. Statistical features based on the autocorrelation of the acoustic data can further improve model quality. A total of 48 statistical features were considered. The best subset contained 10 features. Its corresponding MAE was 1.913 s, which was stable to the third decimal place. The presented results can be used to develop artificial intelligence algorithms devoted to earthquake prediction.
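The feature-selection loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: a 1-nearest-neighbour predictor stands in for XGBoost so the example needs only the standard library, the data are synthetic, only three illustrative features are used instead of 48, and only the backward elimination step is shown. All function names (`autocorr`, `extract_features`, `cv_mae`, `backward_elimination`) are hypothetical.

```python
import random
import statistics


def autocorr(signal, lag=1):
    """Sample autocorrelation of a sequence at the given lag."""
    mean = statistics.fmean(signal)
    var = sum((x - mean) ** 2 for x in signal)
    if var == 0:
        return 0.0
    cov = sum((signal[i] - mean) * (signal[i + lag] - mean)
              for i in range(len(signal) - lag))
    return cov / var


def extract_features(segment):
    """Three illustrative statistics: mean, standard deviation, lag-1 autocorrelation."""
    return [statistics.fmean(segment),
            statistics.pstdev(segment),
            autocorr(segment, 1)]


def mae(y_true, y_pred):
    """Mean absolute error."""
    return statistics.fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def cv_mae(X, y, cols, k=6):
    """k-fold cross-validated MAE of a 1-NN stand-in model using columns `cols`."""
    n = len(X)
    fold_errors = []
    for fold in range(k):
        test_idx = list(range(fold, n, k))  # interleaved folds
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        preds = []
        for i in test_idx:
            # Predict with the target of the closest training sample.
            nearest = min(train_idx,
                          key=lambda j: sum((X[j][c] - X[i][c]) ** 2 for c in cols))
            preds.append(y[nearest])
        fold_errors.append(mae([y[i] for i in test_idx], preds))
    return statistics.fmean(fold_errors)


def backward_elimination(cols, score):
    """Greedily drop one feature at a time while the score (lower is better) does not worsen."""
    current = list(cols)
    best = score(current)
    improved = True
    while improved and len(current) > 1:
        improved = False
        for c in current:
            trial = [d for d in current if d != c]
            s = score(trial)
            if s <= best:
                current, best, improved = trial, s, True
                break
    return current, best


# Synthetic data (an assumption for illustration only): the signal amplitude
# grows as the time to failure shrinks.
rng = random.Random(0)
targets = [rng.uniform(0.0, 10.0) for _ in range(90)]
segments = [[rng.gauss(0.0, 1.0 + (10.0 - t)) for _ in range(100)] for t in targets]
X = [extract_features(s) for s in segments]

all_cols = [0, 1, 2]
baseline = cv_mae(X, targets, all_cols)
selected, best = backward_elimination(all_cols, lambda cols: cv_mae(X, targets, cols))
print("selected columns:", selected, "CV MAE:", round(best, 3), "baseline:", round(baseline, 3))
```

Because the scoring function is passed in as a callable, the same loop could in principle score each candidate subset with a gradient-boosted model instead of the 1-NN stand-in. A forward construction pass would mirror this loop, adding back one feature at a time.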