Advanced Modelling and Applied Computing Laboratory, Department of Mathematics, Run Run Shaw Building, The University of Hong Kong, Pokfulam Road, Hong Kong.
Comput Math Methods Med. 2012;2012:205025. doi: 10.1155/2012/205025. Epub 2012 Aug 7.
High-dimensional bioinformatics data sets pose an excellent and challenging research problem in machine learning. In particular, gene expression data generated by DNA microarrays are high-dimensional and carry a significant level of noise. Supervised kernel learning with a Support Vector Machine (SVM) classifier has been applied successfully in biomedical diagnosis, for example in discriminating between different kinds of tumor tissue. The correlation kernel has recently been applied to classification problems with SVMs. In this paper, we develop a novel and parsimonious positive semidefinite kernel. Experiments show that the proposed kernel performs better than the usual correlation kernel. In addition, we propose a new kernel based on the correlation matrix that incorporates techniques for dealing with indefinite kernels. The resulting kernel is shown to be positive semidefinite, and it outperforms the two kernels mentioned above. We then apply the proposed method to several cancer data sets to discriminate between different tumor tissues, providing information for the diagnosis of disease. Numerical experiments indicate that our method outperforms existing methods such as the decision tree method and the KNN method.
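As a rough illustration of the two ingredients named in the abstract, the sketch below computes a Pearson-correlation Gram matrix between samples and applies one common repair for an indefinite symmetric matrix, namely clipping negative eigenvalues to zero. The abstract does not state which construction or which indefinite-kernel technique the paper uses, so the function names `correlation_kernel` and `clip_to_psd`, the choice of eigenvalue clipping, and the random data are all assumptions made for illustration only.

```python
import numpy as np

def correlation_kernel(X):
    """Pearson correlation between samples (rows of X).

    Assumes no row of X is constant; a constant row has zero norm
    after centering and would cause a division by zero.
    """
    Xc = X - X.mean(axis=1, keepdims=True)            # center each sample
    Xn = Xc / np.linalg.norm(Xc, axis=1, keepdims=True)  # unit-normalize
    return Xn @ Xn.T

def clip_to_psd(K):
    """Project a symmetric matrix onto the PSD cone by zeroing
    its negative eigenvalues (an assumed, generic repair)."""
    K = (K + K.T) / 2                                  # enforce symmetry
    w, V = np.linalg.eigh(K)
    w = np.clip(w, 0.0, None)
    return (V * w) @ V.T                               # V diag(w) V^T

# Hypothetical data: 6 samples with 50 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 50))
K = correlation_kernel(X)

# A small indefinite similarity matrix (eigenvalues 3 and -1).
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
A_psd = clip_to_psd(A)
```

A matrix produced this way could then be handed to an SVM implementation that accepts a precomputed Gram matrix, for example scikit-learn's `SVC(kernel="precomputed")`.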