Shuryak Igor
Center for Radiological Research, Columbia University Irving Medical Center, 630 West 168th Street, VC-11-234/5, New York, NY, 10032, USA.
J Environ Radioact. 2022 Jan;241:106772. doi: 10.1016/j.jenvrad.2021.106772. Epub 2021 Nov 9.
Radioactive contamination of terrestrial plants was extensively investigated and quantitatively modeled after the Fukushima nuclear power plant accident. This phenomenon, which is important for ecosystem functioning and protection of human health, is influenced by multiple factors, including plant species, time after the accident, and climate. Machine learning algorithms such as random forests (RF) have a record of strong performance on large multi-dimensional data sets, but, to our knowledge, combined data on post-Fukushima plant contamination with radionuclides have not yet been subjected to a machine learning analysis. Here we performed such an analysis on two large published data sets: (1) Cs activity concentrations in four common Japanese forest tree species, and (2) plant/soil Cs concentration ratios in multiple perennial plant species. The goal was to show the usefulness of machine learning for identifying and quantifying the main trends of Cs contamination in terrestrial plants. Each data set was split randomly into training and testing parts. RF was fitted and tuned on the training parts, and its performance was assessed on the testing parts by three metrics: coefficient of determination (R²), root mean squared error, and mean absolute error. Synthetic noise variables and the Boruta algorithm were used in a customized procedure to identify the most important predictor variables, defined as those that consistently outperformed random noise. Good agreement between observations and RF predictions (e.g. R² ~ 0.9 on testing data) was obtained on both data sets. The effects of the most important predictors (e.g. time after the accident, Cs land contamination level, and plant species) and the interactions between them were quantified by partial dependence plots. These machine learning analyses of large data collections can complement previous modeling efforts and clarify the patterns of Cs contamination of plants after the Fukushima accident.
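The evaluation step described in the abstract (a random train/test split followed by three error metrics on the held-out part) can be illustrated with a minimal sketch. It uses only the Python standard library and the textbook definitions of R², root mean squared error, and mean absolute error. The abstract does not state the software, the split fraction, or the random seed used in the study, so the function names, the 25% test fraction, and the toy numbers below are assumptions for illustration only, and no random forest is fitted here.

```python
import math
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Randomly split rows into (train, test).

    The 25% test fraction and the seed are illustrative assumptions,
    not values reported in the abstract.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean squared error."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error."""
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)


if __name__ == "__main__":
    # Toy held-out observations and model predictions (made-up numbers).
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.1, 1.9, 3.2, 3.8]
    print("R^2 :", r_squared(observed, predicted))
    print("RMSE:", rmse(observed, predicted))
    print("MAE :", mae(observed, predicted))
```

In a full analysis the `predicted` values would come from a model fitted on the training part only, so that all three metrics describe performance on data the model has not seen.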