IEEE Trans Nanobioscience. 2019 Jul;18(3):482-489. doi: 10.1109/TNB.2019.2917814. Epub 2019 May 20.
Machine learning is becoming a powerful tool for cancer diagnosis and prognosis through classification of high-dimensional molecular data. However, extracting classification features from high-dimensional datasets remains a challenging problem. Principal component analysis (PCA) is a widely used method for dimensionality reduction, but PCA and most PCA-based feature extraction methods are known to be sensitive to noise, which may reduce the accuracy of the subsequent classification. To address this problem, we propose a robust fuzzy PCA with interval type-2 (IT-2) fuzzy membership functions for feature extraction. We tested the performance of three widely used classifiers on features extracted by the proposed approaches and by other feature extraction methods: PCA-based methods (i.e., conventional PCA and fuzzy PCA), linear discriminant analysis (LDA), and support vector machine recursive feature elimination (SVM-RFE). The proposed feature extraction approaches showed better performance on cancer transcriptome and proteome datasets.
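The noise sensitivity of PCA mentioned above, and the general idea of down-weighting suspect samples with fuzzy memberships, can be illustrated with a minimal sketch. This is not the IT-2 fuzzy PCA of the paper: the membership formula, the median-based center and scale, the toy data, and the function names `first_pc` and `memberships` are all assumptions chosen for illustration, and the memberships here are ordinary (type-1) values.

```python
import math
from statistics import median


def first_pc(points, weights, iters=200):
    """Leading principal direction of weighted samples, via power iteration."""
    total = sum(weights)
    dim = len(points[0])
    # Weighted mean and weighted covariance matrix.
    mean = [sum(w * p[j] for p, w in zip(points, weights)) / total
            for j in range(dim)]
    cov = [[sum(w * (p[a] - mean[a]) * (p[b] - mean[b])
                for p, w in zip(points, weights)) / total
            for b in range(dim)] for a in range(dim)]
    # Power iteration converges to the eigenvector of the largest eigenvalue.
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        nv = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in nv))
        v = [x / norm for x in nv]
    return v


def memberships(points):
    """Hypothetical type-1 membership: near 1 close to the median center,
    near 0 for samples far from it (an assumed form, not the paper's)."""
    dim = len(points[0])
    center = tuple(median(p[j] for p in points) for j in range(dim))
    dists = [math.dist(p, center) for p in points]
    scale = median(dists)
    return [1.0 / (1.0 + (d / scale) ** 2) for d in dists]


# Toy data: seven samples spread along the x-axis plus one noisy outlier.
data = [(-3, 0.1), (-2, -0.1), (-1, 0.05), (0, 0.0),
        (1, -0.05), (2, 0.1), (3, -0.1), (0.5, 15.0)]

plain_pc = first_pc(data, [1.0] * len(data))   # conventional PCA
robust_pc = first_pc(data, memberships(data))  # membership-weighted PCA

print("conventional PCA direction:", plain_pc)
print("weighted PCA direction:    ", robust_pc)
```

In this toy setting the single outlier pulls the conventional first component toward the y-axis, while the weighted variant stays close to the x-axis along which most samples vary. In a full pipeline, the resulting projections would then be passed to a classifier.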