Hervás C, Algar J A, Silva M
Department of Computer Science, University of Córdoba, Spain.
J Chem Inf Comput Sci. 2000 May-Jun;40(3):724-31. doi: 10.1021/ci9901284.
The joint use of genetic algorithms and pruning computational neural networks is shown to be an effective means of selecting the number of inputs required to correct for temperature variations in kinetic-based determinations. The genetic algorithm uses a pruning procedure based on Bayesian regularization and is highly efficient as a feature selector; it generalizes well without the need for a validation set. The fitness function is defined as the sum of two subfunctions: one controls the learning ability of the network and the other its complexity. The training, pruning, and generalization processes were first tested with simulated data to obtain preliminary information for the subsequent work with real data. The performance of the proposed method was assessed by applying it to the determination of the amino acid L-glycine by its classical spectrophotometric reaction with ninhydrin. A straightforward network topology including temperature as an input (40+T:2:1, with 19 connections after the pruning process) was used to estimate the L-glycine concentration from kinetic curves affected by temperature variations over the range 60-75 °C, using kinetic data acquired up to only 1.5 half-lives. The trained network estimates this concentration with a standard error of prediction for the testing set of ca. 8%, which is much smaller than that provided by a classical parametric method such as nonlinear regression (even when kinetic data acquired over longer half-lives are used). Finally, a kinetic interpretation of the pruning process is provided to better demonstrate its potential for kinetic analysis.
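The two-part fitness function mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the form of either subfunction, so everything specific here is an assumption made for illustration: the error term is a mean squared error, the complexity term is the fraction of surviving connections scaled by a hypothetical `penalty` weight, lower values are treated as better, hidden units use tanh, and bias terms are left out of the connection count. Under that last assumption an unpruned 40+T:2:1 network has 41 x 2 + 2 x 1 = 84 connections, against the 19 reported after pruning.

```python
import math

# Topology from the abstract: 40 kinetic readings plus temperature T, 2 hidden units, 1 output.
N_INPUTS = 41
N_HIDDEN = 2
N_OUTPUTS = 1


def predict(x, w_in, w_out, mask_in, mask_out):
    """Forward pass of a masked 41:2:1 network (tanh hidden units, linear output; assumed)."""
    hidden = []
    for j in range(N_HIDDEN):
        # Only connections whose mask entry is 1 (not pruned) contribute.
        s = sum(x[i] * w_in[j][i] for i in range(N_INPUTS) if mask_in[j][i])
        hidden.append(math.tanh(s))
    return sum(hidden[j] * w_out[j] for j in range(N_HIDDEN) if mask_out[j])


def count_connections(mask_in, mask_out):
    """Number of connections that survive pruning (biases not counted; assumed)."""
    return sum(sum(row) for row in mask_in) + sum(mask_out)


def fitness(samples, w_in, w_out, mask_in, mask_out, penalty=0.1):
    """Sum of an error term and a complexity term; lower is better in this sketch."""
    sq_err = [(predict(x, w_in, w_out, mask_in, mask_out) - y) ** 2 for x, y in samples]
    error_term = sum(sq_err) / len(sq_err)            # learning ability (assumed: MSE)
    total = N_INPUTS * N_HIDDEN + N_HIDDEN * N_OUTPUTS
    complexity_term = penalty * count_connections(mask_in, mask_out) / total
    return error_term + complexity_term
```

In a genetic algorithm of this kind, each individual would carry a connection mask (and possibly the weights), and selection would favour individuals with a lower `fitness`. The value of `penalty` sets how strongly sparse networks are preferred over accurate ones and is a free choice in this sketch, not a value taken from the paper.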