Chang Kyeol, Lee Junghye, Jun Chi-Hyuck, Chung Hoeil
Department of Chemistry and Research Institute for Convergence of Basic Sciences, Hanyang University, Haengdang-dong, Seongdong-gu, Seoul 133-791, Republic of Korea.
Department of Industrial and Management Engineering, Pohang University of Science and Technology (POSTECH), 77 Cheongam-Ro, Pohang 790-784, Republic of Korea.
Talanta. 2018 Feb 1;178:348-354. doi: 10.1016/j.talanta.2017.09.039. Epub 2017 Sep 21.
The interleaved Incremental Association Markov Blanket (inter-IAMB) algorithm is described herein as a feature selection method for the near-infrared (NIR) spectroscopic analysis of several sample types (diesel, gasoline, and etchant solutions). Although the Markov blanket (MB) has been proven to be the minimal optimal set of features (variables) that preserves the original target distribution, the variables selected by the existing IAMB algorithm can be redundant and/or misleading because the IAMB requires an unnecessarily large amount of learning data to identify the MB. The inter-IAMB can overcome this drawback by interleaving the grow phase with the shrink phase, which keeps the MB as small as possible by immediately eliminating invalid candidates. In this report, a likelihood-ratio (LR)-based conditional independence test, able to handle spectroscopic data that typically comprise a large number of continuous variables measured on a small number of samples, was uniquely embedded in the inter-IAMB and its utility was evaluated. The variables selected by the inter-IAMB from heavily overlapped, feature-indistinct NIR spectra were used to determine the corresponding sample properties. For comparison, the properties were also determined using the IAMB-selected variables and the full set of variables. The inter-IAMB selected variables more effectively than the IAMB and thereby improved the accuracy of the property determinations, even though fewer variables were used. The proposed LR-embedded inter-IAMB could serve as a feature selection method for vibrational spectroscopic analysis, especially when the spectral features lack specificity and overlap extensively.
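The interleaving of grow and shrink phases described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the form of the LR test, so the sketch assumes a Gaussian likelihood-ratio test between nested linear regressions (1 degree of freedom), uses the LR statistic itself as the association measure in the grow phase, and introduces the hypothetical names `lr_ci_test`, `inter_iamb`, and `alpha` purely for illustration.

```python
import math
import numpy as np


def lr_ci_test(X, y, j, cond):
    """Assumed Gaussian LR test of 'y independent of X[:, j] given X[:, cond]'.

    Compares a linear model of y on the conditioning set with the same
    model plus column j. Returns (LR statistic, p-value).
    """
    n = len(y)
    base = np.column_stack([np.ones(n)] + [X[:, k] for k in cond])
    full = np.column_stack([base, X[:, j]])
    if full.shape[1] >= n:
        # Too few samples to fit the larger model: treat as independent.
        return 0.0, 1.0
    rss0 = np.sum((y - base @ np.linalg.lstsq(base, y, rcond=None)[0]) ** 2)
    rss1 = np.sum((y - full @ np.linalg.lstsq(full, y, rcond=None)[0]) ** 2)
    stat = max(n * math.log(max(rss0, 1e-300) / max(rss1, 1e-300)), 0.0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return stat, math.erfc(math.sqrt(stat / 2.0))


def inter_iamb(X, y, alpha=0.01):
    """Sketch of inter-IAMB: each grow step is followed by a shrink pass."""
    p = X.shape[1]
    mb = []
    for _ in range(2 * p):  # hard cap guards against add/remove cycles
        candidates = [j for j in range(p) if j not in mb]
        if not candidates:
            break
        # Grow: pick the candidate most associated with y given the current MB.
        results = {j: lr_ci_test(X, y, j, mb) for j in candidates}
        best = max(results, key=lambda j: results[j][0])
        if results[best][1] >= alpha:
            break  # no remaining candidate is conditionally dependent
        mb.append(best)
        # Shrink (interleaved): drop members that are now conditionally independent.
        for k in list(mb):
            rest = [m for m in mb if m != k]
            if lr_ci_test(X, y, k, rest)[1] >= alpha:
                mb.remove(k)
        if best not in mb:
            break  # the new candidate was rejected immediately; stop
    return sorted(mb)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 6))  # synthetic stand-in for spectral variables
    y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.normal(size=200)
    print(inter_iamb(X, y, alpha=1e-3))
```

Running the shrink pass after every addition, rather than once at the end as in plain IAMB, keeps the conditioning set small, which matters when the number of samples is far below the number of spectral variables. Real NIR data would also need attention to collinearity and to the choice of `alpha`, neither of which this sketch addresses.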