Xie Haozhe, Li Jie, Zhang Qiaosheng, Wang Yadong
School of Computer Science and Technology, Harbin Institute of Technology, No. 92 Xidazhi Street, Harbin 150001, China.
Comput Biol Chem. 2016 Dec;65:165-172. doi: 10.1016/j.compbiolchem.2016.09.010. Epub 2016 Sep 21.
Random Projection (RP) has been widely applied because it maps high-dimensional features into a low-dimensional space quickly enough to support real-time analysis of massive data. The rapid growth of large genomics datasets makes fast dimensionality reduction an urgent need. However, RP usually yields lower classification accuracy than other dimensionality reduction methods. We attempt to improve the classification accuracy of RP by combining it with other dimensionality reduction methods: Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Feature Selection (FS). We compared the classification accuracy and running time of the different combinations on three microarray datasets and one simulated dataset. On the BC-TCGA dataset, FS followed by RP improved classification accuracy by 14.77% over RP alone. On the same dataset, LDA followed by RP also helped RP yield a more discriminative subspace, increasing classification accuracy by 13.65%. FS followed by RP outperformed the other combinations in classification accuracy on most of the datasets.
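The "FS followed by RP" combination described in the abstract can be illustrated with a minimal sketch. The abstract does not say which feature selection criterion, projection matrix, or classifier the authors used, so everything here is an assumption made for illustration: the synthetic data generator `make_data`, the mean-difference score in `select_features`, the Gaussian matrix in `random_projection`, and all parameter values are hypothetical and are not the authors' method.

```python
import random
import statistics


def make_data(n=60, d=200, informative=5, shift=3.0, seed=0):
    """Synthetic two-class data (assumption): only the first
    `informative` features differ between the classes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for j in range(informative):
            row[j] += shift * label
        X.append(row)
        y.append(label)
    return X, y


def select_features(X, y, m):
    """Keep the m features with the largest score
    |class mean difference| / (sum of class standard deviations).
    This criterion is an illustrative choice, not taken from the paper."""
    d = len(X[0])
    scores = []
    for j in range(d):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        spread = statistics.pstdev(a) + statistics.pstdev(b) + 1e-12
        scores.append(abs(statistics.fmean(a) - statistics.fmean(b)) / spread)
    top = sorted(range(d), key=lambda j: scores[j], reverse=True)[:m]
    return sorted(top)


def random_projection(X, k, seed=0):
    """Project each row of X to k dimensions using a Gaussian random
    matrix whose entries are drawn from N(0, 1/k)."""
    rng = random.Random(seed)
    d = len(X[0])
    R = [[rng.gauss(0.0, 1.0) / k ** 0.5 for _ in range(k)] for _ in range(d)]
    return [[sum(row[i] * R[i][c] for i in range(d)) for c in range(k)]
            for row in X]


def fs_then_rp(X, y, m, k, seed=0):
    """FS followed by RP: select m features, then project them to k dims."""
    keep = select_features(X, y, m)
    X_sel = [[row[j] for j in keep] for row in X]
    return keep, random_projection(X_sel, k, seed)


X, y = make_data()
keep, Z = fs_then_rp(X, y, m=20, k=5)
print(len(Z), len(Z[0]), keep[:5])
```

In a real evaluation the feature scores would need to be computed on the training split only; scoring on all samples, as this sketch does for brevity, leaks label information into the test set and inflates accuracy.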