Geophysics Group, Los Alamos National Laboratory, Los Alamos, NM 87545.
Proc Natl Acad Sci U S A. 2021 Feb 2;118(5). doi: 10.1073/pnas.2011362118.
Earthquake prediction, the long-sought holy grail of earthquake science, continues to confound Earth scientists. Could we make advances by crowdsourcing, drawing from the vast knowledge and creativity of the machine learning (ML) community? We used Google's ML competition platform, Kaggle, to engage the worldwide ML community with a competition to develop and improve data analysis approaches on a forecasting problem that uses laboratory earthquake data. The competitors were tasked with predicting, for successive laboratory quake events, the time remaining before the next earthquake, based on only a small portion of the laboratory seismic data. The more than 4,500 participating teams created and shared more than 400 computer programs in openly accessible notebooks. Complementing the now well-known features of seismic data that map to fault criticality in the laboratory, the winning teams employed unexpected strategies based on rescaling failure times as a fraction of the seismic cycle and comparing the input distributions of the training and testing data. In addition to yielding scientific insights into fault processes in the laboratory and their relation to the evolution of the statistical properties of the associated seismic data, the competition serves as a pedagogical tool for teaching ML in geophysics. The approach may provide a model for other competitions in the geosciences or other domains of study to help engage the ML community on problems of significance.
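The ideas named above (a statistical feature of the seismic signal, rescaling failure times as a fraction of the seismic cycle, and comparing training and testing input distributions) can be illustrated with a minimal Python sketch. It is our own illustration under stated assumptions, not code from the competition or from the winning teams: the function names `segment_variance`, `cycle_fraction`, and `ks_statistic` are hypothetical; signal variance is assumed as a stand-in for the fault-criticality features; time-to-failure labels are assumed positive and to jump upward right after each quake; and the two-sample Kolmogorov-Smirnov statistic is just one possible way to compare distributions.

```python
import statistics
from bisect import bisect_right


def segment_variance(signal, seg_len):
    """Variance of each non-overlapping segment of a 1-D signal.

    Assumption: variance stands in for the fault-criticality features.
    Trailing samples that do not fill a whole segment are dropped.
    """
    return [
        statistics.pvariance(signal[i:i + seg_len])
        for i in range(0, len(signal) - seg_len + 1, seg_len)
    ]


def cycle_fraction(time_to_failure):
    """Rescale time-to-failure labels to the fraction of the current cycle remaining.

    Assumptions: labels are positive, decrease within a cycle, and jump upward
    right after each quake. The first label of a cycle is taken as that cycle's
    length, so each cycle maps onto a scale running from 1 toward 0.
    """
    fractions = []
    cycle_length = time_to_failure[0]
    prev = time_to_failure[0]
    for t in time_to_failure:
        if t > prev:  # label jumped up: a new cycle has started
            cycle_length = t
        fractions.append(t / cycle_length)
        prev = t
    return fractions


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs.

    Assumption: used here as one simple way to compare the training and
    testing distributions of a feature.
    """
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:  # the maximum gap occurs at one of the observed points
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        gap = max(gap, abs(fa - fb))
    return gap
```

For example, `cycle_fraction([3.0, 2.0, 1.0, 0.5, 4.0, 2.0, 1.0])` maps two cycles of different lengths onto the same 1-to-0 scale, giving roughly `[1.0, 0.667, 0.333, 0.167, 1.0, 0.5, 0.25]`. A KS statistic near 0 suggests that training and testing features are similarly distributed; `scipy.stats.ks_2samp` computes the same statistic together with a p-value.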