Department of Chemistry, Idaho State University, Pocatello, Idaho 83209, United States.
Anal Chem. 2017 May 2;89(9):5087-5094. doi: 10.1021/acs.analchem.7b00637. Epub 2017 Apr 13.
Sample outlier detection is imperative before calculating a multivariate calibration model. Outliers, especially in high-dimensional space, can be difficult to detect. The outlier measures Hotelling's T-squared, Q-residuals, and Studentized residuals are standard in analytical chemistry with spectroscopic data. However, these and other merits are tuning-parameter dependent and sensitive to the outliers themselves, i.e., the measures are susceptible to swamping and masking. Additionally, different samples are often flagged as outliers depending on the outlier measure used. Sum of ranking differences (SRD) is a new generic fusion tool that can simultaneously evaluate multiple outlier measures across windows of tuning parameter values, thereby simplifying outlier detection and providing improved detection. This paper presents SRD for detecting multiple outliers despite the effects of masking and swamping. Both spectral outliers (x-outliers) and analyte outliers (y-outliers) can be detected separately or in tandem with SRD using the respective merits. Unique to SRD are fusion verification processes to confirm samples flagged as outliers. The SRD process also allows for sample masking checks. Several new outlier detection measures are also presented and used with SRD. These measures include atypical uses of Procrustes analysis and extended inverted signal correction (EISC). The methodologies are demonstrated on two near-infrared (NIR) data sets.
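The core SRD calculation referred to in the abstract can be illustrated with a minimal sketch of the generic procedure: rank the values in each column, rank a reference column the same way, and sum the absolute rank differences. The abstract does not state how the paper arranges samples, outlier measures, and tuning-parameter windows in the input matrix, nor which reference it uses, so the layout here (columns are the items being compared, rows are the quantities they are ranked over), the default row-wise median reference, and the function names `rank_with_ties` and `sum_of_ranking_differences` are all assumptions made for illustration.

```python
import numpy as np


def rank_with_ties(values):
    """Return 1-based ranks; tied values share their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    # Replace the ranks of tied values by the mean rank of the tie group.
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def sum_of_ranking_differences(matrix, reference=None):
    """Generic SRD: one score per column of `matrix`.

    Assumed layout (not taken from the paper): each column is an item to
    be compared, each row is a quantity the items are ranked over.
    `reference` defaults to the row-wise median, which is one common
    choice and an assumption here.
    """
    matrix = np.asarray(matrix, dtype=float)
    if reference is None:
        reference = np.median(matrix, axis=1)
    ref_ranks = rank_with_ties(reference)
    scores = [
        np.abs(rank_with_ties(matrix[:, j]) - ref_ranks).sum()
        for j in range(matrix.shape[1])
    ]
    return np.array(scores)


# Toy data: columns 0 and 1 order the rows like the reference,
# column 2 orders them in reverse.
toy = np.array([
    [1.0, 2.0, 4.0],
    [2.0, 4.0, 3.0],
    [3.0, 6.0, 2.0],
    [4.0, 8.0, 1.0],
])
toy_reference = np.array([1.0, 2.0, 3.0, 4.0])
toy_srd = sum_of_ranking_differences(toy, reference=toy_reference)
```

A column whose ranking agrees with the reference gets an SRD of zero, and larger values indicate stronger disagreement. How SRD values are then turned into outlier flags, and the verification and masking checks the abstract mentions, are not covered by this sketch.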