Hall L Mark, Hill Dennis W, Menikarachchi Lochana C, Chen Ming-Hui, Hall Lowell H, Grant David F
Hall Associates Consulting, Quincy, MA, USA.
Bioanalysis. 2015;7(8):939-55. doi: 10.4155/bio.15.1.
Artificial neural networks (ANNs) are extensively used to model 'omics' data. Different modeling methodologies and combinations of adjustable parameters influence model performance and complicate model optimization.
We evaluated optimization of four ANN modeling parameters (learning rate annealing, stopping criteria, data split method, network architecture) using retention index (RI) data for 390 compounds. Models were assessed by independent validation (I-Val) using newly measured RI values for 1492 compounds.
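Two of the parameters named above, learning rate annealing and the stopping criterion, can be illustrated with a minimal sketch. It is not the authors' implementation: it fits a one-variable linear model rather than an ANN, and the function names, the exponential decay schedule, the patience rule, and the toy data are all assumptions made for illustration.

```python
import random


def make_toy_data(n_train=60, n_valid=20, seed=0):
    """Hypothetical data: y = 3x + 1 plus small Gaussian noise."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_train + n_valid):
        x = rng.uniform(-1.0, 1.0)
        pairs.append((x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.1)))
    return pairs[:n_train], pairs[n_train:]


def train_with_annealing(train, valid, lr0=0.3, decay=0.99,
                         patience=20, max_epochs=2000):
    """Gradient descent with an annealed learning rate and early stopping.

    The learning rate is multiplied by `decay` after every epoch.
    Training stops once the validation error has not improved for
    `patience` epochs, and the weights with the lowest validation
    error are returned.
    """
    w, b, lr = 0.0, 0.0, lr0
    best_mse, best_w, best_b, best_epoch = float("inf"), w, b, 0
    for epoch in range(1, max_epochs + 1):
        # Full-batch gradient of the mean squared error on the training set.
        grad_w = grad_b = 0.0
        for x, y in train:
            err = (w * x + b) - y
            grad_w += 2.0 * err * x / len(train)
            grad_b += 2.0 * err / len(train)
        w -= lr * grad_w
        b -= lr * grad_b
        lr *= decay  # annealing step (assumed exponential schedule)

        # Monitor the validation set, not the training set, for stopping.
        val_mse = sum(((w * x + b) - y) ** 2 for x, y in valid) / len(valid)
        if val_mse < best_mse:
            best_mse, best_w, best_b, best_epoch = val_mse, w, b, epoch
        elif epoch - best_epoch >= patience:
            break
    return {"w": best_w, "b": best_b, "val_mse": best_mse, "epoch": best_epoch}
```

Calling `train_with_annealing(*make_toy_data())` returns the weights recorded at the epoch with the lowest validation error, which is the selection rule the sketch assumes.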
The best model had an I-Val standard error of 55 RI units and was built using a Ward's clustering data split and a minimally nonlinear network architecture. Using validation set statistics, rather than test set statistics, for stopping and final model selection gave better independent validation performance.
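A data split based on Ward's clustering might look like the sketch below. The abstract does not describe how clusters were mapped to training, validation, and test sets, so the per-cluster proportional assignment, the split fractions, and the function names here are assumptions. The clustering itself is a naive pure-Python version of Ward's minimum-variance criterion, suitable only for small data sets.

```python
import random


def ward_clusters(points, n_clusters):
    """Naive agglomerative clustering using Ward's criterion.

    At each step the two clusters whose merge gives the smallest increase
    in within-cluster sum of squares are joined:
        cost = n_i * n_j / (n_i + n_j) * ||c_i - c_j||^2
    """
    # Each cluster is (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (mi, ci), (mj, cj) = clusters[i], clusters[j]
                d2 = sum((a - b) ** 2 for a, b in zip(ci, cj))
                cost = len(mi) * len(mj) / (len(mi) + len(mj)) * d2
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (mi, ci), (mj, cj) = clusters[i], clusters[j]
        n = len(mi) + len(mj)
        centroid = [(len(mi) * a + len(mj) * b) / n for a, b in zip(ci, cj)]
        clusters[i] = (mi + mj, centroid)
        del clusters[j]
    return [sorted(members) for members, _ in clusters]


def cluster_split(clusters, valid_frac=0.2, test_frac=0.2, seed=0):
    """Draw validation and test members from every cluster (assumed scheme).

    Sampling within each cluster keeps all three subsets spread across
    the same regions of descriptor space.
    """
    rng = random.Random(seed)
    train, valid, test = [], [], []
    for members in clusters:
        shuffled = members[:]
        rng.shuffle(shuffled)
        n_valid = round(len(shuffled) * valid_frac)
        n_test = round(len(shuffled) * test_frac)
        valid += shuffled[:n_valid]
        test += shuffled[n_valid:n_valid + n_test]
        train += shuffled[n_valid + n_test:]
    return train, valid, test
```

In practice the points would be molecular descriptor vectors for the compounds; a library routine such as SciPy's hierarchical clustering would replace the cubic-time loop for data sets of realistic size.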