Arana-Daniel Nancy, Gallegos Alberto A, López-Franco Carlos, Alanís Alma Y, Morales Jacob, López-Franco Adriana
Centro Universitario de Ciencias Exactas e Ingenierías, Universidad de Guadalajara, Guadalajara, Jalisco, México.
Evol Bioinform Online. 2016 Dec 4;12:285-302. doi: 10.4137/EBO.S40912. eCollection 2016.
With the increasing power of computers, the amount of data that can be processed in a short time has grown exponentially, as has the importance of classifying large-scale data efficiently. Support vector machines have shown good results in classifying large amounts of high-dimensional data, such as data generated by protein structure prediction, spam recognition, medical diagnosis, optical character recognition, and text classification. Most state-of-the-art approaches to large-scale learning use traditional optimization methods, such as quadratic programming or gradient descent, which leaves the use of evolutionary algorithms for training support vector machines an area still to be explored. The present paper proposes a simple-to-implement approach based on evolutionary algorithms and the Kernel-Adatron for solving large-scale classification problems, focusing on protein structure prediction. The functional properties of proteins depend on their three-dimensional structures. Knowing the structures of proteins is crucial for biology and can lead to improvements in areas such as medicine, agriculture, and biofuels.