Kolali Khormuji Morteza, Bazrafkan Mehrnoosh
Islamic Azad University, Science and Research Branch, Bushehr, Iran.
Med Biol Eng Comput. 2016 Jun;54(6):869-76. doi: 10.1007/s11517-015-1382-8. Epub 2015 Sep 4.
High-dimensional genomic and proteomic data play an important role in many areas of medicine and molecular biology, including disease prognosis, diagnosis, and prevention. Classifying such data is challenging because of the curse of dimensionality, noise, and redundancy. Recently, researchers have applied sparse representation (SR) techniques to high-dimensional biological data, notably to classify cancer patients from gene expression datasets. A common problem with SR-based biological data classification methods is that they cannot exploit the topological (geometrical) structure of the data: they map the data into a sparse feature space without preserving the local structure of the data points. In this paper, we propose a novel SR-based cancer classification algorithm for gene expression data that takes the geometrical information of all the data into account. Specifically, we incorporate the locally linear embedding (LLE) algorithm into the sparse coding framework, which preserves the geometrical structure of the data. For performance comparison, we applied our algorithm to six tumor gene expression datasets and show that the proposed method achieves higher classification accuracy than state-of-the-art SR-based tumor classification algorithms.
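The abstract does not state the objective function, so the following is only a minimal sketch of one plausible way to combine locally linear embedding with sparse coding. It assumes that the LLE reconstruction weights W, computed in the original gene expression space, are reused as a penalty gamma * tr(S^T M S) with M = (I - W)^T (I - W) on the sparse codes S, and that the resulting problem is solved by proximal gradient descent (ISTA). The function names, the random dictionary D, and the parameters k, lam, and gamma are hypothetical and are not taken from the paper.

```python
import numpy as np


def lle_weights(X, k=3, reg=1e-3):
    """LLE reconstruction weights. Rows of X are samples; each row of W sums to 1."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)              # a point is never its own neighbor
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]          # indices of the k nearest neighbors
        Z = X[nbrs] - X[i]                    # neighbors centered on x_i
        G = Z @ Z.T                           # local Gram matrix (k x k)
        G += reg * (np.trace(G) + 1e-12) * np.eye(k)   # regularize for stability
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs] = w / w.sum()              # enforce the sum-to-one constraint
    return W


def soft_threshold(A, t):
    """Proximal operator of t * ||A||_1."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)


def objective(X, D, S, M, lam, gamma):
    """||X - S D||_F^2 + lam * ||S||_1 + gamma * tr(S^T M S)."""
    fit = np.sum((X - S @ D) ** 2)
    return fit + lam * np.abs(S).sum() + gamma * np.trace(S.T @ M @ S)


def lle_sparse_codes(X, D, W, lam=0.1, gamma=0.1, n_iter=200):
    """ISTA on the assumed LLE-regularized sparse coding objective.

    X: (n_samples, n_genes), D: (n_atoms, n_genes). Returns codes S and M.
    """
    n = X.shape[0]
    I_W = np.eye(n) - W
    M = I_W.T @ I_W                           # symmetric positive semidefinite
    # Upper bound on the Lipschitz constant of the smooth part's gradient.
    L = 2.0 * (np.linalg.norm(D, 2) ** 2 + gamma * np.linalg.norm(M, 2))
    S = np.zeros((n, D.shape[0]))
    for _ in range(n_iter):
        grad = 2.0 * (S @ D - X) @ D.T + 2.0 * gamma * (M @ S)
        S = soft_threshold(S - grad / L, lam / L)
    return S, M


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(12, 20))             # synthetic stand-in for expression data
    D = rng.normal(size=(8, 20))              # hypothetical random dictionary
    D /= np.linalg.norm(D, axis=1, keepdims=True)
    W = lle_weights(X, k=3)
    S, M = lle_sparse_codes(X, D, W)
    print("nonzero fraction of codes:", np.mean(S != 0))
```

In an SR-based classifier the dictionary would typically be built from labeled training samples, and a test sample would be assigned to the class whose atoms give the smallest reconstruction residual; that step is omitted here because the abstract does not describe it.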