Sun Y Qiang, Hassanzadeh Pedram, Zand Mohsen, Chattopadhyay Ashesh, Weare Jonathan, Abbot Dorian S
Department of the Geophysical Sciences, University of Chicago, Chicago, IL 60637.
Committee on Computational and Applied Mathematics, Division of the Physical Sciences, University of Chicago, Chicago, IL 60637.
Proc Natl Acad Sci U S A. 2025 May 27;122(21):e2420914122. doi: 10.1073/pnas.2420914122. Epub 2025 May 20.
Predicting gray swan weather extremes, which are possible but so rare that they are absent from the training dataset, is a major concern for AI weather models and long-term climate emulators. An important open question is whether AI models can extrapolate from weaker weather events present in the training set to stronger, unseen weather extremes. To test this, we train independent versions of the AI weather model FourCastNet on the 1979-2015 ERA5 dataset with all data, or with Category 3-5 tropical cyclones (TCs) removed, either globally or only over the North Atlantic or Western Pacific basin. We then test these versions of FourCastNet on 2018-2023 Category 5 TCs (gray swans). All versions yield similar accuracy for global weather, but the version trained with Category 3-5 TCs removed globally cannot accurately forecast Category 5 TCs, indicating that these models cannot extrapolate from weaker storms. The versions trained without Category 3-5 TCs in one basin show some skill forecasting Category 5 TCs in that basin, suggesting that FourCastNet can generalize across tropical basins. This is encouraging and surprising because regional information is implicitly encoded in inputs. Given that current state-of-the-art AI weather and climate models have similar learning strategies, we expect our findings to apply to other models. Other types of weather extremes need to be similarly investigated. Our work demonstrates that novel learning strategies are needed for AI models to reliably provide early warning or estimated statistics for the rarest, most impactful TCs, and, possibly, other weather extremes.
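The abstract does not say how the Category 3-5 TCs are removed from the training data. As a rough illustration only, the sketch below assumes that removal means dropping every training time step at which a best-track record reports a storm of Category 3 or stronger, optionally restricted to one basin. The `TrackPoint` record, the basin codes "NA" and "WP", and the function names are hypothetical and do not come from the paper; only the Saffir-Simpson wind thresholds (in knots) are standard.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

# Saffir-Simpson lower bounds in knots (1-minute sustained wind).
CATEGORY_MIN_KT = {1: 64, 2: 83, 3: 96, 4: 113, 5: 137}


def saffir_simpson_category(wind_kt: float) -> int:
    """Return 0 below hurricane strength, otherwise the category 1-5."""
    return max(
        (cat for cat, lower in CATEGORY_MIN_KT.items() if wind_kt >= lower),
        default=0,
    )


@dataclass(frozen=True)
class TrackPoint:
    """Hypothetical best-track record aligned to a training time step."""
    time: str       # label of the training time step
    basin: str      # assumed basin code, e.g. "NA" or "WP"
    wind_kt: float  # maximum sustained wind in knots


def filter_training_times(
    all_times: Iterable[str],
    track: Iterable[TrackPoint],
    min_category: int = 3,
    basins: Optional[Set[str]] = None,
) -> List[str]:
    """Keep time steps with no storm at or above min_category.

    basins=None mimics removal everywhere; a set of basin codes mimics
    removal in selected basins only (an assumption of this sketch).
    """
    excluded = {
        p.time
        for p in track
        if saffir_simpson_category(p.wind_kt) >= min_category
        and (basins is None or p.basin in basins)
    }
    return [t for t in all_times if t not in excluded]
```

For example, with one Category 3 record in "NA" and one Category 5 record in "WP", calling `filter_training_times(times, track)` drops both time steps, while `filter_training_times(times, track, basins={"NA"})` keeps the Western Pacific one, which loosely mirrors the global and single-basin training variants described above.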