Center for Environmental Remediation, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, 100101, China; University of Chinese Academy of Sciences, Beijing, 100049, China.
Environ Pollut. 2020 Jun;261:114211. doi: 10.1016/j.envpol.2020.114211. Epub 2020 Feb 19.
The relationship between the cadmium (Cd) concentration in rice grains and that in the soil in which the rice is grown is highly uncertain, owing to the influence of soil properties, rice varieties, and other undetermined factors. In this study, we introduce the probability of exceeding a threshold to characterize this uncertainty and then build a probabilistic forewarning model. A number of associated factors are used as additional parameters to improve model performance. Because the physicochemical properties of the soil and its Cd concentration (soil Cd) do not follow a normal distribution and are not independent of each other, a discriminative algorithm, represented by logistic regression (LR), performed better than generative algorithms such as the naive Bayes and quadratic discriminant analysis models. The LR-based model performed 0.5% better in the univariate case (soil Cd only) and 4.1% better in the multivariate case (soil properties used as additional factors) (p < 0.01). The probabilities predicted by the LR-based model were positively correlated with the true exceedance rate (R = 0.949, p < 0.01) over an exceedance threshold range of 0.1-0.4 mg kg⁻¹, with a mean deviation of 5.75%. A sensitivity analysis showed that the effect of soil properties on the exceedance probability weakens as the Cd concentration in rice grains increases. When the threshold is below 0.15 mg kg⁻¹, soil pH strongly influences the exceedance probability. As the threshold increases, the influence of pH on the exceedance probability is gradually superseded. By quantifying the uncertainty in the relationship between Cd concentrations in rice grains and soil, the discriminative algorithm-based probabilistic forecasting model offers a new way to assess Cd pollution in rice grown in contaminated paddy fields.
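As a rough illustration of the exceedance-probability idea described in the abstract, the following minimal Python sketch fits a logistic regression to synthetic data and returns the probability that grain Cd exceeds a threshold. It is not the authors' model or data: the data-generating formula, the 0.2 mg kg⁻¹ threshold, the choice of soil Cd and pH as the only predictors, and all function names are assumptions made for this example.

```python
import math
import random

THRESHOLD = 0.2  # mg kg^-1; hypothetical grain Cd limit chosen for illustration


def make_synthetic_samples(n, seed=0):
    """Generate (soil_cd, ph, exceeds) rows from an ASSUMED relationship.

    The coefficients below are invented for this sketch; they only encode the
    qualitative idea that grain Cd rises with soil Cd and falls with soil pH.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        soil_cd = rng.uniform(0.1, 2.0)  # mg kg^-1, skewed on the log scale
        ph = rng.uniform(4.5, 8.0)
        log_grain = (-1.5 + 0.8 * math.log(soil_cd)
                     - 0.5 * (ph - 6.0) + rng.gauss(0.0, 0.5))
        exceeds = 1 if math.exp(log_grain) > THRESHOLD else 0
        rows.append((soil_cd, ph, exceeds))
    return rows


def features(soil_cd, ph):
    """Intercept, log soil Cd, and pH centred at 6 (an arbitrary choice)."""
    return (1.0, math.log(soil_cd), ph - 6.0)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, lr=0.5, epochs=1000):
    """Fit weights by batch gradient descent on the mean log-loss."""
    w = [0.0, 0.0, 0.0]
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for soil_cd, ph, y in rows:
            x = features(soil_cd, ph)
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j in range(3):
                grad[j] += err * x[j]
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def exceedance_probability(w, soil_cd, ph):
    """Predicted probability that grain Cd exceeds THRESHOLD."""
    x = features(soil_cd, ph)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


samples = make_synthetic_samples(400)
weights = fit_logistic(samples)
p_acid = exceedance_probability(weights, soil_cd=0.6, ph=5.0)
p_neutral = exceedance_probability(weights, soil_cd=0.6, ph=7.0)
print(f"P(exceed) at pH 5.0: {p_acid:.2f}; at pH 7.0: {p_neutral:.2f}")
```

Because logistic regression models the class probability directly, it needs no assumption that the predictors are normally distributed or mutually independent, which is the property the abstract cites in favour of a discriminative algorithm. In this sketch the same soil Cd value yields a higher exceedance probability in the more acidic soil only because the synthetic data were constructed that way.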