Wang Jiayi, Wong Raymond K W, Jun Mikyoung, Schumacher Courtney, Saravanan R, Sun Chunmei
Department of Statistics, Texas A&M University.
Department of Mathematics, University of Houston.
Environ Res Commun. 2021 Nov;3(11). doi: 10.1088/2515-7620/ac371f. Epub 2021 Nov 17.
Predicting rain from large-scale environmental variables remains a challenging problem for climate models, and it is unclear how well numerical methods can predict the true characteristics of rainfall without smaller (storm) scale information. This study explores the ability of three statistical and machine learning methods to predict 3-hourly rain occurrence and intensity at 0.5° resolution over the tropical Pacific Ocean, using rain observations from the Global Precipitation Measurement (GPM) satellite radar and large-scale environmental profiles of temperature and moisture from the MERRA-2 reanalysis. We also separated the rain into different types (deep convective, stratiform, and shallow convective) because their varying kinematic and thermodynamic structures might respond to the large-scale environment in different ways. Our expectation was that the popular machine learning methods (i.e., the neural network and random forest) would outperform a standard statistical method (a generalized linear model) because of their more flexible structures, especially in predicting the highly skewed distribution of rain rates for each rain type. However, none of the methods clearly distinguishes itself from the others, and each method still predicts rain too often and does not fully capture the high end of the rain rate distributions, both of which are common problems in climate models. One implication of this study is that machine learning tools must be carefully assessed and are not necessarily applicable to solving all big data problems. Another implication is that traditional climate model approaches are not sufficient to predict extreme rain events and that other avenues need to be pursued.
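The two shortcomings named in the abstract, predicting rain too often and missing the high end of the rain rate distribution, can each be summarized with a simple diagnostic. The following is a minimal sketch, not the evaluation used in the study: the function names, the wet threshold of 0, the 0.9 quantile, and the toy rain rates are all assumptions made for illustration.

```python
from typing import Sequence


def frequency_bias(pred_wet: Sequence[bool], obs_wet: Sequence[bool]) -> float:
    """Ratio of predicted to observed rain occurrences.

    A value above 1 means the model predicts rain more often than observed.
    """
    n_obs = sum(obs_wet)
    if n_obs == 0:
        raise ValueError("no observed rain events")
    return sum(pred_wet) / n_obs


def quantile(values: Sequence[float], q: float) -> float:
    """Empirical quantile using linear interpolation between order statistics."""
    if not values:
        raise ValueError("empty sample")
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def upper_tail_ratio(pred: Sequence[float], obs: Sequence[float], q: float = 0.9) -> float:
    """Predicted over observed rain rate at quantile q.

    A value below 1 means the model underestimates heavy rain.
    """
    return quantile(pred, q) / quantile(obs, q)


# Hypothetical toy rain rates (mm/h) for eight grid cells; not data from the study.
obs_rate = [0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 2.0, 12.0]
pred_rate = [0.0, 0.0, 0.2, 0.4, 0.6, 0.9, 1.5, 4.0]

# Occurrence: a cell counts as wet if its rain rate exceeds 0 (assumed threshold).
bias = frequency_bias([r > 0 for r in pred_rate], [r > 0 for r in obs_rate])

# Intensity: compare the upper tail of the wet-cell rain rates only.
tail = upper_tail_ratio([r for r in pred_rate if r > 0], [r for r in obs_rate if r > 0])

print(f"frequency bias = {bias:.2f}, upper-tail ratio = {tail:.2f}")
```

In this toy case the frequency bias is above 1 and the upper-tail ratio is below 1, which is the pattern the abstract reports for all three methods. Separating occurrence from intensity mirrors the abstract's distinction between rain occurrence and rain rate, although the study's own scoring may differ.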