Department of Medical Oncology, The First Hospital of China Medical University, No. 210, Baita Street, Hunnan District, Shenyang 110001, China.
Shenyang Medical College, Huanghe North Street 146, Shenyang 110034, China.
Anal Methods. 2023 Mar 9;15(10):1286-1296. doi: 10.1039/d2ay02072f.
In most near-infrared studies, the near-infrared spectra (NIRS) are processed with mathematical algorithms. However, these algorithms often select a large number of variables and latent variables, so over-fitting has become very common. Such a large number of variables also makes it impossible to extract "chemical information" directly from the NIRS. To build robust and interpretable mathematical models, the non-dominated sorting genetic algorithm II-competitive adaptive reweighted sampling (NSGAII-CARS) algorithm was proposed to identify the functional groups that are influential for quantitative analysis. In this research, data on a mixture of two amino acids (AAs), NH(CH)COOH and HOOC(NH)CH(CH)COOH, were used to illustrate the algorithm. First, NSGAII was used to locate the distinct characteristic spectral regions of the two amino acids from extreme points. Second, based on the absolute values of the regression coefficients, [(CH) + 2(CH)] and [2(CH)], in the wavenumber range from 6165 to 5683 cm⁻¹, were identified as the functional groups influential for quantitative analysis. Finally, CARS was combined with NSGAII to find the specific fingerprint points for determining the two AAs. Compared with previous results, the NSGAII-CARS algorithm not only identified the influential functional groups for quantification but also matched the quantitative performance of the full spectrum using only 6 points for HOOC(NH)CH(CH)COOH and 18 points for NH(CH)COOH. The results provide a general algorithm for the quantitative analysis of NIRS acquired from binary or ternary mixtures. The MATLAB code for the NSGAII-CARS algorithm is available at https://github.com/Mark1988NK/NSGAII-CARS-Algorithm.git.
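The abstract states what NSGAII is used for (locating the characteristic spectral regions of each amino acid) but not the objective functions it optimises. The Python sketch below shows only the step the algorithm is named for, fast non-dominated sorting into Pareto fronts; crowding distance, crossover and mutation are omitted. The names `dominates`, `fast_non_dominated_sort` and `regions`, and the idea of scoring each candidate region by two errors (one per amino acid, both minimised), are hypothetical illustrations, not the authors' MATLAB implementation.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if candidate a Pareto-dominates b (every objective minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(objs: Sequence[Sequence[float]]) -> List[List[int]]:
    """Split candidate indices into Pareto fronts, as in NSGA-II.

    Front 0 holds the non-dominated candidates; each later front holds the
    candidates dominated only by members of earlier fronts.
    """
    n = len(objs)
    dominated = [[] for _ in range(n)]   # candidates that p dominates
    count = [0] * n                      # how many candidates dominate p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            if dominates(objs[p], objs[q]):
                dominated[p].append(q)
            elif dominates(objs[q], objs[p]):
                count[p] += 1
        if count[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front = []
        for p in fronts[i]:
            for q in dominated[p]:
                count[q] -= 1            # discount p's domination of q
                if count[q] == 0:
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]                   # drop the trailing empty front


# Hypothetical objectives per candidate region: (error for AA 1, error for AA 2)
regions = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0), (4.0, 4.0)]
print(fast_non_dominated_sort(regions))  # [[0, 1, 3], [2], [4]]
```

Regions 0, 1 and 3 form the first front because none is better than another on both objectives; an evolutionary loop would favour such fronts when breeding the next generation.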
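The final CARS step, together with the ranking of variables by the absolute values of regression coefficients, can be sketched as follows. This is a generic Python rendering of competitive adaptive reweighted sampling as it is usually described: Monte Carlo sampling of calibration samples, PLS coefficient weights |b|, an exponentially decreasing function (EDF) that forces out low-weight variables, adaptive reweighted sampling (ARS) of the survivors, and selection of the subset with the lowest cross-validated RMSE. It is not the authors' MATLAB code. The function names `pls1_coef`, `rmsecv` and `cars`, the defaults (50 runs, 3 PLS components, 80 % sampling, 5-fold CV) and the synthetic data are assumptions for illustration only.

```python
import numpy as np


def pls1_coef(X, y, n_comp):
    """PLS1 regression (NIPALS); returns coefficients B and intercept b0."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean               # centred residuals
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        norm = np.linalg.norm(w)
        if norm < 1e-12:                          # nothing left to explain
            break
        w = w / norm                              # weight vector
        t = Xr @ w                                # scores
        tt = t @ t
        p, q = Xr.T @ t / tt, yr @ t / tt         # X and y loadings
        Xr, yr = Xr - np.outer(t, p), yr - q * t  # deflation
        W.append(w); P.append(p); Q.append(q)
    if not W:
        return np.zeros(X.shape[1]), y_mean
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    B = W @ np.linalg.solve(P.T @ W, Q)           # coefficients in X space
    return B, y_mean - x_mean @ B


def rmsecv(X, y, n_comp, n_folds, rng):
    """Root-mean-square error of k-fold cross-validation for PLS1."""
    n = len(y)
    sq_err = 0.0
    for test in np.array_split(rng.permutation(n), n_folds):
        train = np.setdiff1d(np.arange(n), test)
        B, b0 = pls1_coef(X[train], y[train], n_comp)
        sq_err += np.sum((X[test] @ B + b0 - y[test]) ** 2)
    return np.sqrt(sq_err / n)


def cars(X, y, n_runs=50, n_comp=3, sample_frac=0.8, n_folds=5, seed=0):
    """CARS-style selection; returns (variable indices, their RMSECV)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # EDF r_i = a*exp(-k*i): keeps all p variables at run 1 and 2 at the last run
    a = (p / 2) ** (1 / (n_runs - 1))
    k = np.log(p / 2) / (n_runs - 1)
    kept = np.arange(p)
    best_subset, best_rmse = kept, np.inf
    for i in range(1, n_runs + 1):
        # 1) Monte Carlo sampling: fit PLS on a random subset of the samples
        cal = rng.choice(n, size=int(sample_frac * n), replace=False)
        B, _ = pls1_coef(X[np.ix_(cal, kept)], y[cal], min(n_comp, len(kept)))
        weights = np.abs(B) / np.abs(B).sum()     # normalised |b| per variable
        # 2) EDF: forcibly keep only the variables with the largest |b|
        n_keep = min(len(kept), max(2, int(round(p * a * np.exp(-k * i)))))
        top = np.argsort(weights)[::-1][:n_keep]
        # 3) ARS: resample survivors with probability proportional to weight
        probs = weights[top] / weights[top].sum()
        kept = np.unique(rng.choice(kept[top], size=n_keep, replace=True, p=probs))
        # 4) Score this subset and remember the lowest-RMSECV one
        score = rmsecv(X[:, kept], y, min(n_comp, len(kept)), n_folds, rng)
        if score < best_rmse:
            best_subset, best_rmse = kept, score
    return best_subset, best_rmse


# Synthetic stand-in for spectra: only columns 3, 10 and 25 carry signal
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(60, 50))
y = 2 * X[:, 3] + X[:, 10] - X[:, 25] + 0.1 * data_rng.normal(size=60)
subset, err = cars(X, y)
print(subset, round(float(err), 3))
```

The EDF makes variable elimination fast in early runs and gentle in later ones, while ARS lets variables with large |b| compete for survival; keeping the lowest-RMSECV subset is what allows a handful of wavenumber points to stand in for the full spectrum.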