Lalor G C, Zhang C
International Centre for Environmental and Nuclear Sciences, University of the West Indies, Kingston, Jamaica.
Sci Total Environ. 2001 Dec 17;281(1-3):99-109. doi: 10.1016/s0048-9697(01)00839-7.
In this study, outliers are classified into three types: (1) range outliers; (2) spatial outliers; and (3) relationship outliers, defined as observations that fall outside the values expected from correlations within the dataset. Three multivariate methods, principal component analysis (PCA), multiple regression analysis (MRA) and an autoassociation neural network (AutoNN), are applied to a dataset comprising 203 samples of rare earth element (REE) concentrations in soils of Jamaica, which shows the expected good correlations between the elements. PCA is shown to be effective in detecting high-value range outliers, while AutoNN and MRA are effective in detecting relationship outliers. A backpropagation neural network was used to predict the 'expected values' of the outliers. Four obvious relationship outliers with unexpectedly low Sm concentrations were selected as an example for remediation. The predicted Sm values were confirmed on remeasurement. Neural network methods, which are model-free and effective at solving non-linear relationship problems, appear to provide an automated and effective way to carry out quality control of environmental databases.
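The idea of a relationship outlier can be illustrated with a minimal sketch of the regression route (MRA) mentioned in the abstract. This is not the authors' implementation: the data are synthetic stand-ins rather than the Jamaican soil measurements, the function name `relationship_outliers`, the column layout, the robust z-score and the threshold of 5 are all assumptions made for illustration, and a plain least-squares fit is used in place of the backpropagation network to produce the 'expected values'.

```python
import numpy as np

def relationship_outliers(X, target, threshold=5.0):
    """Flag rows whose value in column `target` departs from the value
    predicted by least-squares regression on the remaining columns.
    (Hypothetical helper; the threshold is an arbitrary illustrative choice.)"""
    y = X[:, target]
    others = np.delete(X, target, axis=1)
    A = np.column_stack([np.ones(len(X)), others])   # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    predicted = A @ coef
    residuals = y - predicted
    # Robust scale from the median absolute deviation, so that the
    # outliers themselves do not inflate the yardstick.
    centre = np.median(residuals)
    scale = 1.4826 * np.median(np.abs(residuals - centre))
    z = (residuals - centre) / scale
    return np.flatnonzero(np.abs(z) > threshold), predicted

# Synthetic stand-in data: four strongly correlated "elements" on a log
# scale, driven by one shared factor plus small independent noise.
rng = np.random.default_rng(0)
n_samples = 203
factor = rng.normal(0.0, 0.5, size=n_samples)
loadings = np.array([1.0, 0.9, 1.1, 0.8])
offsets = np.array([1.5, 1.0, 0.5, 0.2])
X = offsets + np.outer(factor, loadings) + rng.normal(0.0, 0.05, size=(n_samples, 4))

# Treat column 2 as "Sm" and push four values unexpectedly low.
sm = 2
injected = np.array([10, 57, 120, 188])
true_values = X[injected, sm].copy()
X[injected, sm] -= 1.0

flagged, predicted = relationship_outliers(X, sm)
print("flagged rows:", flagged)
print("predicted :", np.round(predicted[injected], 2))
print("original  :", np.round(true_values, 2))
```

The corrupted values sit well inside the overall range of the column, so a simple range check would be unlikely to catch them; they stand out only because they break the correlation with the other columns. The regression prediction for those rows plays the role of the 'expected value' that would then be checked by remeasurement.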