Sánchez R, Argáez M, Guillén P
University of Texas at El Paso, TX 79968, USA.
Annu Int Conf IEEE Eng Med Biol Soc. 2011;2011:3362-6. doi: 10.1109/IEMBS.2011.6090911.
The development of cancer diagnosis models and cancer discovery from DNA microarray data are of great interest in bioinformatics and medicine. In pattern recognition and machine learning, a classification problem refers to finding an algorithm that assigns a given input to one of several categories. Many natural signals are sparse or compressible in the sense that they have short representations when expressed in a suitable basis. Motivated by recent successful algorithmic developments for sparse signal recovery, we apply the selective nature of sparse representation to perform this classification. To find such a sparse representation, we implement an ℓ1-minimization algorithm. This methodology overcomes the lack of robustness with respect to outliers. In contrast to other classification algorithms, no model selection dependency is involved. The minimization algorithm is a convex-relaxation-type method that has been proven to recover sparse signals efficiently. To study its performance, the proposed method is applied to six tumor gene expression datasets and numerically compared with various support vector machine (SVM) methods. The numerical results show that the proposed ℓ1-minimization algorithm performs at least comparably to, and often better than, SVMs.
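The abstract does not give the exact formulation, so the following is only a minimal sketch of how classification by sparse representation is often set up, not the authors' implementation. It assumes the training samples form the columns of a dictionary `A`, that the ℓ1 problem is the equality-constrained form min ‖x‖₁ subject to Ax = y (solved here as a linear program with SciPy's `linprog`), and that the class is chosen by the smallest class-wise reconstruction residual. The function names `l1_minimize` and `src_classify` and the toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def l1_minimize(A, y):
    """Solve min ||x||_1 subject to A x = y as a linear program."""
    n = A.shape[1]
    # Split x = u - v with u, v >= 0, so that ||x||_1 = sum(u) + sum(v).
    cost = np.ones(2 * n)
    A_eq = np.hstack([A, -A])
    result = linprog(cost, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not result.success:
        raise ValueError("l1 problem could not be solved: " + result.message)
    return result.x[:n] - result.x[n:]


def src_classify(A, labels, y):
    """Assign y to the class whose training columns best reconstruct it.

    Assumed decision rule: keep only the coefficients belonging to one
    class, rebuild y from them, and pick the class with the smallest residual.
    """
    x = l1_minimize(A, y)
    classes = np.unique(labels)
    residuals = [
        np.linalg.norm(y - A[:, labels == c] @ x[labels == c]) for c in classes
    ]
    return classes[int(np.argmin(residuals))]


# Toy data (synthetic, for illustration only): 6 "genes" x 8 training samples.
# Class 0 samples express only genes 0-2, class 1 samples only genes 3-5.
rng = np.random.default_rng(0)
A = np.zeros((6, 8))
A[:3, :4] = rng.uniform(0.5, 1.5, size=(3, 4))
A[3:, 4:] = rng.uniform(0.5, 1.5, size=(3, 4))
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# A new sample built as a mixture of two class-1 training samples.
y = 0.7 * A[:, 5] + 0.3 * A[:, 6]
predicted = src_classify(A, labels, y)
print("predicted class:", predicted)
```

On real microarray data the number of genes far exceeds the number of samples, so the equality constraint Ax = y is generally infeasible; a dimensionality-reduction step or a noise-tolerant constraint would be needed, and which of these the paper uses is not stated in the abstract.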