IEEE/ACM Trans Comput Biol Bioinform. 2022 Jul-Aug;19(4):2420-2430. doi: 10.1109/TCBB.2021.3065054. Epub 2022 Aug 8.
Extracting genes involved in cancer lesions from gene expression data is critical for cancer research and drug development. Feature selection has therefore attracted much attention in bioinformatics. Principal Component Analysis (PCA) is a widely used method for learning low-dimensional representations, and several variants of PCA have been proposed to improve its robustness and sparsity. However, existing methods ignore the high-order relationships among the data. In this paper, a new model named Robust Principal Component Analysis via Hypergraph Regularization (HRPCA) is proposed. Specifically, HRPCA uses the L2,1-norm to reduce the effect of outliers and to make the data sufficiently row-sparse. Hypergraph regularization is introduced to capture the complex relationships among the data. In this way, important information hidden in the data is mined, and the method ensures the accuracy of the resulting relationship information. Extensive experiments on multi-view biological data demonstrate the feasibility and effectiveness of the proposed approach.
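The two ingredients named in the abstract, the L2,1-norm and a hypergraph regularizer, can be illustrated with a minimal NumPy sketch. The abstract does not give the construction, so everything specific here is an assumption: the function names `l21_norm` and `knn_hypergraph_laplacian`, the parameter `k`, the k-nearest-neighbour hyperedges with unit weights, and the commonly used normalized hypergraph Laplacian form are illustrative choices, not necessarily those of HRPCA.

```python
import numpy as np


def l21_norm(M):
    """L2,1-norm: sum of the Euclidean norms of the rows of M.

    Penalizing this quantity pushes whole rows toward zero (row sparsity).
    """
    return float(np.sqrt((M ** 2).sum(axis=1)).sum())


def knn_hypergraph_laplacian(X, k=3):
    """Normalized hypergraph Laplacian for samples in the rows of X.

    Assumed construction (not taken from the abstract): one hyperedge per
    sample, containing that sample and its k nearest neighbours, with unit
    hyperedge weights.
    """
    n = X.shape[0]
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    np.fill_diagonal(d2, -np.inf)  # each sample always joins its own hyperedge

    H = np.zeros((n, n))  # incidence matrix: rows = vertices, cols = hyperedges
    for e in range(n):
        members = np.argsort(d2[e])[: k + 1]
        H[members, e] = 1.0

    w = np.ones(n)       # hyperedge weights (assumed uniform)
    dv = H @ w           # vertex degrees
    de = H.sum(axis=0)   # hyperedge degrees (k + 1 for every hyperedge)

    dv_inv_sqrt = 1.0 / np.sqrt(dv)
    theta = (dv_inv_sqrt[:, None] * H) @ np.diag(w / de) @ (H.T * dv_inv_sqrt[None, :])
    return np.eye(n) - theta
```

Under these assumptions, a regularization term of the form `np.trace(Q.T @ L @ Q)` is small when samples that share hyperedges receive similar low-dimensional representations in the rows of a hypothetical embedding `Q`, which is one way a model can take high-order relationships into account.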