Lavine Barry K, Vora Mehul N
Department of Chemistry, Oklahoma State University, Stillwater, OK 74078-3071, USA.
J Chromatogr A. 2005 Nov 25;1096(1-2):69-75. doi: 10.1016/j.chroma.2005.06.049. Epub 2005 Jul 11.
Gas chromatography and pattern recognition methods were used to develop a potential method for differentiating European honeybees from Africanized honeybees. The test data consisted of 237 gas chromatograms of hydrocarbon extracts obtained from the wax glands, cuticle, and exocrine glands of European and Africanized honeybees. Each gas chromatogram contained 65 peaks corresponding to a set of standardized retention time windows. A genetic algorithm (GA) for pattern recognition was used to identify features in the gas chromatograms characteristic of the genotype. The pattern recognition GA searched for features in the chromatograms that optimized the separation of the European and Africanized honeybees in a plot of the two or three largest principal components of the data. Because the largest principal components capture the bulk of the variance in the data, the peaks identified by the pattern recognition GA primarily contained information about differences between gas chromatograms of European and Africanized honeybees. The principal component analysis routine embedded in the fitness function of the pattern recognition GA acted as an information filter, significantly reducing the size of the search space because it restricted the search to feature sets whose principal component plots showed clustering on the basis of the bees' genotype. In addition, during training the algorithm focused on those classes and/or samples that were difficult to classify, using a form of boosting: samples that are consistently classified correctly are weighted less heavily than samples that are difficult to classify. Over time, the algorithm learns its optimal parameters in a manner similar to a neural network. The pattern recognition GA integrates aspects of artificial intelligence and evolutionary computation to yield a "smart" one-pass procedure for feature selection and classification.
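The idea of a fitness function that scores a feature subset by how well the classes cluster in a principal component plot, combined with boosting-style sample weights, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how clustering is scored, so the k-nearest-neighbour hit rate used here is an assumption, as are the weight-update rule, the GA settings, and all function names (`pc_scores`, `knn_hits`, `run_ga`). The data are synthetic and stand in for the chromatograms only for illustration.

```python
import numpy as np

def pc_scores(X, n_components=2):
    """Project mean-centered data onto its largest principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def knn_hits(mask, X, y, k=3):
    """Per-sample fraction of the k nearest neighbours in the PC plot
    that share the sample's class label (assumed clustering score)."""
    scores = pc_scores(X[:, mask])
    d = np.linalg.norm(scores[:, None, :] - scores[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]
    return (y[nn] == y[:, None]).mean(axis=1)

def run_ga(X, y, pop_size=30, generations=20, p_mut=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    weights = np.full(n_samples, 1.0 / n_samples)  # sample weights, sum to 1
    pop = rng.random((pop_size, n_features)) < 0.3  # boolean feature masks
    for _ in range(generations):
        # Repair masks with fewer than two features (a 2-D PC plot needs two).
        for m in pop:
            if m.sum() < 2:
                m[rng.choice(n_features, 2, replace=False)] = True
        hits = np.array([knn_hits(m, X, y) for m in pop])  # (pop, samples)
        fit = hits @ weights  # weighted hit rate of each feature subset
        order = np.argsort(fit)[::-1]
        pop, hits = pop[order], hits[order]
        # Boosting-like update (assumed form): samples the better half of
        # the population still misclassifies gain weight.
        difficulty = 1.0 - hits[: pop_size // 2].mean(axis=0) + 1e-3
        weights = 0.5 * weights + 0.5 * difficulty / difficulty.sum()
        # Keep the better half; refill with one-point crossover and mutation.
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.choice(len(parents), 2, replace=False)]
            cut = rng.integers(1, n_features)
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n_features) < p_mut  # flip bits at random
            children.append(child)
        pop = np.vstack([parents, np.array(children)])
    best = pop[0]  # top-ranked mask from the last evaluated generation
    return best, float(knn_hits(best, X, y).mean()), weights

# Synthetic stand-in data: 60 samples, 20 "peaks", 3 of them informative.
data_rng = np.random.default_rng(1)
y = np.repeat([0, 1], 30)
X = data_rng.normal(size=(60, 20))
X[y == 1, :3] += 3.0
best_mask, accuracy, weights = run_ga(X, y)
print("selected peaks:", best_mask.nonzero()[0], "hit rate:", round(accuracy, 2))
```

Because the principal components are recomputed for every candidate subset, a mask dominated by uninformative peaks produces a plot with no class structure and scores poorly, which is the filtering effect the abstract describes.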