German Federal Office for Radiation Protection, Unit Radioecology, Neuherberg, Germany.
German Federal Office for Radiation Protection, Unit NORM and Radon, Berlin, Germany.
J Environ Radioact. 2023 Dec;270:107309. doi: 10.1016/j.jenvrad.2023.107309. Epub 2023 Oct 12.
A German dataset of soil-plant transfer factors for radiocaesium, including many co-variables, was analysed and prepared for the application of the Random Forest (RF) algorithm using the R libraries 'party' and 'caret'. An RF predictive model for the soil-plant transfer factor was created based on 10 co-variables. These include, for example, taxonomic plant family, plant part, soil type and the exchangeable potassium concentration in the soil. The RF model results were compared with the results of two (semi-)mechanistic models. Of the more than 3000 entries in the original dataset, only about 1200 could be used, because they formed the largest complete subset with the largest number of available co-variables. The resulting RF predictive model reproduces the experimental observations better than the two (semi-)mechanistic models, which rely on many assumptions and fixed parameter values. Model performance was quantified using the root mean square error (RMSE) and the mean absolute error (MAE). The RF model was able to reproduce the variability of the data, which spans up to 6 orders of magnitude. The categorical co-predictors, especially taxonomic plant family and plant part, have a greater influence than the numerical co-predictors, such as pH and exchangeable soil potassium concentration. This feasibility study shows that RF is a promising tool for obtaining predictive models for transfer factors. However, building a widely applicable predictive model requires a dataset that contains at least several thousand entries with transfer factors and the most important co-variables, and that covers a large parameter space.
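The data-preparation and evaluation steps described in the abstract can be sketched as follows. This is a minimal Python illustration, not the authors' R workflow with 'party' and 'caret': it keeps only complete records (listwise deletion, mirroring the reduction from more than 3000 to about 1200 entries) and computes RMSE and MAE. Two assumptions are made here: the column names in `COVARIABLES` are hypothetical and cover only five of the 10 co-variables, and errors are computed on log10-transformed transfer factors, a common choice when values span several orders of magnitude but not something the abstract states.

```python
import math
from typing import Sequence

# Hypothetical column names for illustration; the study used 10 co-variables,
# only some of which are named in the abstract.
COVARIABLES = ("plant_family", "plant_part", "soil_type", "pH", "k_exch")


def complete_cases(records: list[dict]) -> list[dict]:
    """Listwise deletion: keep records where the transfer factor and every co-variable are present."""
    required = ("tf",) + COVARIABLES
    return [r for r in records if all(r.get(key) is not None for key in required)]


def _check(obs: Sequence[float], pred: Sequence[float]) -> None:
    """Reject empty or mismatched inputs before computing an error metric."""
    if len(obs) != len(pred) or len(obs) == 0:
        raise ValueError("obs and pred must be non-empty and of equal length")


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error: sqrt(mean((obs - pred)^2))."""
    _check(obs, pred)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error: mean(|obs - pred|)."""
    _check(obs, pred)
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def log10_all(values: Sequence[float]) -> list[float]:
    """Assumption: evaluate on log10(TF) because transfer factors span orders of magnitude."""
    return [math.log10(v) for v in values]


if __name__ == "__main__":
    # Toy records invented for illustration only (not data from the study).
    records = [
        {"tf": 0.01, "plant_family": "Poaceae", "plant_part": "grain",
         "soil_type": "sand", "pH": 5.5, "k_exch": 0.2},
        {"tf": 0.1, "plant_family": "Poaceae", "plant_part": "straw",
         "soil_type": None, "pH": 6.0, "k_exch": 0.3},  # dropped: soil type missing
        {"tf": 1.0, "plant_family": "Brassicaceae", "plant_part": "leaf",
         "soil_type": "loam", "pH": 6.8, "k_exch": 0.1},
    ]
    usable = complete_cases(records)
    observed = log10_all([r["tf"] for r in usable])
    predicted = log10_all([0.05, 0.5])  # hypothetical model predictions
    print(len(usable), rmse(observed, predicted), mae(observed, predicted))
```

Listwise deletion trades sample size for completeness: a record missing any single co-variable is discarded, so requiring more co-variables can shrink the usable dataset considerably.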