Wang Zuyi, Wang Yue, Xuan Jianhua, Dong Yibin, Bakay Marina, Feng Yuanjian, Clarke Robert, Hoffman Eric P
Center for Genetic Medicine, Children's National Medical Center, Washington, DC 20010, USA.
Bioinformatics. 2006 Mar 15;22(6):755-61. doi: 10.1093/bioinformatics/btk036. Epub 2006 Jan 10.
Multilayer perceptrons (MLP) are among the widely used and effective machine learning methods currently applied to diagnostic classification based on high-dimensional genomic data. Because the dimensionality of existing genomic data often exceeds the available sample size by orders of magnitude, MLP performance may degrade owing to the curse of dimensionality and over-fitting, and may not provide acceptable prediction accuracy.
Based on Fisher linear discriminant analysis, we designed and implemented an optimization scheme for a two-layer MLP that optimizes both the initialization of the MLP parameters and the MLP architecture. The optimized MLP consistently demonstrated its ability to ease the curse of dimensionality in large microarray datasets. Compared with a conventional MLP using random initialization, it yielded significant improvements in major performance measures, including Bayes classification accuracy, convergence properties and area under the receiver operating characteristic curve (A(z)).
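The general idea of seeding an MLP from a Fisher discriminant can be illustrated with a minimal sketch. This is not the authors' scheme, whose details are not given in the abstract. It assumes a two-class problem, a ridge term to keep the within-class scatter invertible when features outnumber samples, and a tanh hidden layer whose first unit starts on the Fisher decision boundary. The function names (`make_data`, `fisher_direction`, `init_mlp`, `predict`) and all parameter values are hypothetical, and the number of hidden units is fixed by hand rather than selected.

```python
import numpy as np


def make_data(n_per_class, n_features, shift, seed=0):
    """Synthetic two-class Gaussian data (illustrative only, not microarray data)."""
    rng = np.random.default_rng(seed)
    X0 = rng.normal(0.0, 1.0, size=(n_per_class, n_features))
    X1 = rng.normal(0.0, 1.0, size=(n_per_class, n_features))
    X1[:, :5] += shift  # only the first five features carry class signal
    X = np.vstack([X0, X1])
    y = np.repeat([0, 1], n_per_class)
    return X, y


def fisher_direction(X, y, ridge=1e-3):
    """Two-class Fisher direction w ~ (S_w + ridge*I)^-1 (m1 - m0), unit norm."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter matrix.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # Assumption: ridge regularization, since S_w is singular when p > n.
    p = X.shape[1]
    Sw += ridge * np.trace(Sw) / p * np.eye(p)
    w = np.linalg.solve(Sw, m1 - m0)
    w /= np.linalg.norm(w)
    b = -w @ (m0 + m1) / 2.0  # threshold at the midpoint of the class means
    return w, b


def init_mlp(X, y, n_hidden=3, scale=0.01, seed=0):
    """Small random weights everywhere, except one hidden unit seeded by Fisher LDA."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, scale, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    w, b = fisher_direction(X, y)
    W1[:, 0], b1[0] = w, b
    W2 = rng.normal(0.0, scale, size=n_hidden)
    W2[0] = 1.0  # output initially follows the Fisher-seeded unit
    return W1, b1, W2, 0.0


def predict(params, X):
    """Forward pass of the two-layer network; returns class labels 0/1."""
    W1, b1, W2, b2 = params
    h = np.tanh(X @ W1 + b1)
    return (h @ W2 + b2 > 0).astype(int)


if __name__ == "__main__":
    X, y = make_data(n_per_class=20, n_features=50, shift=2.0)
    params = init_mlp(X, y)
    print("training accuracy before any gradient step:",
          (predict(params, X) == y).mean())
```

Starting from such an initialization, ordinary backpropagation would refine the weights from a point that already separates the classes, rather than from a random one.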
Supplementary information is available at http://www.cbil.ece.vt.edu/publications.htm