Oh Euiyoung, Lee Hyunju
Gwangju Institute of Science and Technology, School of Electrical Engineering and Computer Science, Gwangju, 61005, South Korea.
Gwangju Institute of Science and Technology, Artificial Intelligence Graduate School, Gwangju, 61005, South Korea.
Sci Rep. 2024 Jul 6;14(1):15582. doi: 10.1038/s41598-024-66061-6.
Selecting relevant feature subsets is essential for machine learning applications. Among feature selection techniques, the knockoff filter procedure offers a unique framework that controls the false discovery rate (FDR). However, deep neural network architectures used within the knockoff filter framework need higher detection power. Using the knockoff filter framework, we present a Deep neural network with PaIrwise connected layers integrated with stochastic Gates (DeepPIG) as a feature selection model. On synthetic data, DeepPIG exhibited better detection power than the baseline and recent models such as Deep feature selection using Paired-Input Nonlinear Knockoffs (DeepPINK), Stochastic Gates (STG), and SHapley Additive exPlanations (SHAP) while not violating the preselected FDR level, especially when the feature signals were weak. In real-world data analyses, including prognosis prediction for certain cancers and classification tasks using microbiome and single-cell datasets, the features selected by DeepPIG yielded better classification performance than those selected by the baseline model. In conclusion, DeepPIG is a robust feature selection approach even when the signals of features are weak. Source code is available at https://github.com/DMCB-GIST/DeepPIG.
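As background on how a knockoff filter turns per-feature statistics into a selection at a preselected FDR level, the following is a minimal sketch of the standard knockoff+ thresholding rule. It is not taken from the DeepPIG source code, and the abstract does not state which thresholding variant DeepPIG uses. The function names and the example statistics are hypothetical. Each statistic `w[j]` is assumed to be large and positive when the original feature `j` appears more important than its knockoff copy.

```python
def knockoff_plus_threshold(w, q):
    """Return the knockoff+ threshold for statistics w at target FDR level q."""
    # Candidate thresholds are the distinct nonzero magnitudes of the statistics.
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)
        positives = sum(1 for x in w if x >= t)
        # Conservative estimate of the false discovery proportion at threshold t.
        if (1 + negatives) / max(1, positives) <= q:
            return t
    # No threshold meets the target level, so nothing is selected.
    return float("inf")


def select_features(w, q):
    """Return the indices of features whose statistic reaches the threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Hypothetical statistics, made up for illustration only.
w = [5.0, 4.0, 3.5, 3.0, 2.5, 2.0, -0.5, 0.3, -0.2, 0.1]
selected = select_features(w, q=0.2)
print(selected)
```

With these made-up statistics and a target level of 0.2, the smallest qualifying threshold is 2.0, so the six features with the largest positive statistics are selected. Weak signals produce statistics close to zero, which is why detection power at a fixed FDR level is the quantity the abstract compares across models.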