School of Computer Science and Engineering, Northeastern University, Shenyang, China.
Key Laboratory of Intelligent Computing in Medical Image (MIIC), Northeastern University, Ministry of Education, Shenyang, China.
BMC Bioinformatics. 2022 Jul 26;23(1):303. doi: 10.1186/s12859-022-04848-y.
The discovery of critical biomarkers is important for clinical diagnosis and for drug research and development. Researchers usually obtain biomarkers from microarray data, which suffer from the curse of dimensionality. Feature selection in machine learning is commonly used to address this problem. However, most methods do not fully consider the dependencies between features, especially the real pathway relationships among genes.
Experimental results show that the proposed method outperforms classical algorithms and state-of-the-art methods in both the number of selected features and classification accuracy, and that the selected features are of greater significance.
This paper proposes a feature selection method based on a graph neural network (GNN). The method builds graph-structured data from the actual dependencies between features and from the Pearson correlation coefficients between them. The GNN's information propagation and aggregation operations are then applied to fuse node information over this graph. Redundant features are grouped by spectral clustering. Finally, a feature ranking aggregation model that combines eight feature evaluation methods is applied to each resulting cluster to select features from it.
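The graph-construction step can be sketched as follows. This is a minimal illustration under assumptions the abstract does not state: features are the columns of a samples-by-features matrix `X`, known pathway dependencies arrive as a hypothetical list of index pairs `pathway_edges` and receive weight 1.0, and any other pair is linked with weight |r| only when its absolute Pearson correlation reaches a hypothetical `threshold`. The names `pearson` and `build_feature_graph` are illustrative, not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return 0.0  # constant feature: correlation undefined, treated as 0 here
    return sxy / math.sqrt(sxx * syy)


def build_feature_graph(X, pathway_edges, threshold=0.9):
    """Return {(i, j): weight} with i < j over feature (column) indices.

    Hypothetical rule: known pathway edges get weight 1.0; every other pair
    is connected with weight |r| only when |r| >= threshold.
    """
    n_features = len(X[0])
    columns = [[row[f] for row in X] for f in range(n_features)]
    known = {(min(i, j), max(i, j)) for i, j in pathway_edges}
    graph = {}
    for i in range(n_features):
        for j in range(i + 1, n_features):
            if (i, j) in known:
                graph[(i, j)] = 1.0
            else:
                r = abs(pearson(columns[i], columns[j]))
                if r >= threshold:
                    graph[(i, j)] = r
    return graph


# Four samples (rows) by four features (columns); pathway edge given as (3, 2).
X = [[1, 2, 4, 1], [2, 4, 1, 0], [3, 6, 3, 1], [4, 8, 2, 0]]
print(build_feature_graph(X, pathway_edges=[(3, 2)]))
```

In this toy input, features 0 and 1 are linked by correlation alone, while features 2 and 3 (|r| of about 0.89, just under the threshold) are linked only because of the assumed pathway edge. Keeping curated edges regardless of sample correlation is one way to let prior knowledge survive small, noisy samples; how the paper actually weights the two sources is not specified in the abstract.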
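The propagation-and-aggregation step can be illustrated with a single untrained, GCN-style layer. In this sketch (an assumption, not the paper's architecture) each feature node starts from its vector of expression values across samples, and each layer replaces a node's vector with the weighted mean of itself (self-loop weight 1) and its neighbors, using the feature graph's edge weights. A trained GNN would add learnable weight matrices and nonlinearities; `propagate` is a hypothetical name.

```python
def propagate(h, weights, layers=1):
    """Weighted mean-aggregation message passing over an undirected graph.

    h:       list of node feature vectors (one per graph node)
    weights: {(i, j): w} undirected edge weights
    Each layer computes h_i' = (h_i + sum_j w_ij * h_j) / (1 + sum_j w_ij).
    """
    n = len(h)
    neighbors = [[] for _ in range(n)]
    for (i, j), w in weights.items():
        neighbors[i].append((j, w))
        neighbors[j].append((i, w))
    for _ in range(layers):
        new_h = []
        for i in range(n):
            total = list(h[i])  # self-loop contribution with weight 1
            norm = 1.0
            for j, w in neighbors[i]:
                total = [a + w * b for a, b in zip(total, h[j])]
                norm += w
            new_h.append([a / norm for a in total])
        h = new_h
    return h


# Nodes 0 and 1 share an edge; node 2 is isolated and keeps its own vector.
print(propagate([[1.0, 0.0], [3.0, 2.0], [5.0, 5.0]], {(0, 1): 1.0}))
```

After one layer, connected nodes 0 and 1 both become `[2.0, 1.0]`, which is the smoothing effect that makes redundant, strongly linked features look alike before clustering.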
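Spectral clustering of the feature graph can be sketched in its simplest form: a two-way split by the sign of the Fiedler vector, i.e. the eigenvector of the second-smallest eigenvalue of the unnormalized Laplacian L = D - W. The abstract gives neither the number of clusters nor the Laplacian variant, so this sketch assumes k = 2 and the unnormalized Laplacian; larger k is usually handled by embedding nodes with several eigenvectors and running k-means, or by splitting recursively. To stay dependency-free it finds the Fiedler vector by power iteration on cI - L with the constant eigenvector projected out; the function names are hypothetical.

```python
import math


def laplacian(weights, n):
    """Unnormalized Laplacian L = D - W of an undirected weighted graph."""
    L = [[0.0] * n for _ in range(n)]
    for (i, j), w in weights.items():
        L[i][j] -= w
        L[j][i] -= w
        L[i][i] += w
        L[j][j] += w
    return L


def fiedler_vector(L, iters=500):
    """Eigenvector of the second-smallest eigenvalue of L, via power iteration."""
    n = len(L)
    # Gershgorin bound: every eigenvalue of L is at most 2 * max degree, so
    # M = cI - L is positive semidefinite and its top eigenvector on the
    # subspace orthogonal to the all-ones vector is the Fiedler vector.
    c = 2.0 * max(L[i][i] for i in range(n))
    v = [float(i) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant (eigenvalue 0) component
        w = [c * v[i] - sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no edges or a single node: nothing to split
        v = [x / norm for x in w]
    return v


def spectral_bipartition(weights, n):
    """Label each node 0 or 1 by the sign of its Fiedler-vector entry."""
    f = fiedler_vector(laplacian(weights, n))
    return [0 if x < 0 else 1 for x in f]


# Two triangles of strongly linked features joined by one weak edge (2, 3).
edges = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
         (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0, (2, 3): 0.1}
print(spectral_bipartition(edges, 6))
```

The weak bridge is cut and each triangle of mutually redundant features ends up in its own cluster. A random start vector is more robust in general, since a fixed start can, for some graphs, be orthogonal to the Fiedler vector.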
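The per-cluster ranking-aggregation step can be sketched with mean-rank (Borda-style) aggregation. The abstract names neither the eight evaluation methods nor the aggregation rule, so this sketch assumes each evaluator has already produced one score per feature (higher is better), converts each score list to ranks, averages the ranks across evaluators, and keeps the best-ranked `per_cluster` features from every cluster. The evaluator keys `eval_a` and `eval_b` below are placeholders standing in for real evaluation methods.

```python
def to_ranks(scores):
    """Convert scores to ranks: rank 1 = highest score (ties keep index order)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


def aggregate_ranks(score_table):
    """Mean rank of each feature across all evaluators in score_table."""
    rank_lists = [to_ranks(s) for s in score_table.values()]
    n = len(rank_lists[0])
    return [sum(r[i] for r in rank_lists) / len(rank_lists) for i in range(n)]


def select_per_cluster(score_table, labels, per_cluster=1):
    """Keep the per_cluster best mean-ranked features from each cluster."""
    mean_rank = aggregate_ranks(score_table)
    selected = []
    for cluster in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == cluster]
        members.sort(key=lambda i: mean_rank[i])
        selected.extend(members[:per_cluster])
    return sorted(selected)


# Placeholder evaluator outputs for four features split into two clusters.
scores = {"eval_a": [0.9, 0.1, 0.7, 0.5], "eval_b": [0.8, 0.2, 0.6, 0.4]}
print(select_per_cluster(scores, labels=[0, 0, 1, 1]))
```

In this sketch, selecting from every cluster rather than taking the global top features is what limits redundancy: features that land in the same cluster compete for the same slot, so a strong but redundant feature cannot crowd out a representative of another cluster.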
The proposed method can effectively remove redundant features. Its output is highly stable and achieves high classification accuracy, so the method has the potential to identify candidate biomarkers.