Quantitative and Computational Biology Department, University of Southern California, Los Angeles, CA 90089.
Data Sciences and Operations Department, Marshall School of Business, University of Southern California, Los Angeles, CA 90089.
Proc Natl Acad Sci U S A. 2021 Sep 7;118(36). doi: 10.1073/pnas.2104683118.
We propose DeepLINK, a deep learning-based knockoffs inference framework that guarantees false discovery rate (FDR) control in high-dimensional settings. DeepLINK applies to a broad class of covariate distributions described by possibly nonlinear latent factor models. It consists of two major parts: an autoencoder network that constructs the knockoff variables and a multilayer perceptron network that performs feature selection with FDR control. Extensive simulation studies show that DeepLINK achieves FDR control in feature selection while retaining both high selection power and high prediction accuracy. We also apply DeepLINK to three real data sets to demonstrate its practical utility.
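To make the selection step concrete, the following is a minimal sketch of the standard knockoff+ thresholding rule used in knockoff filters generally. The abstract does not state DeepLINK's exact selection rule, so this should be read as an illustration under assumptions rather than the authors' implementation. The function names `knockoff_plus_threshold` and `knockoff_select`, the vector `W` of knockoff statistics (large positive values suggest a true signal), and the target level `q` are all hypothetical names introduced here.

```python
import numpy as np


def knockoff_plus_threshold(W, q=0.2):
    """Return the knockoff+ threshold for statistics W at target FDR level q.

    Assumption: W[j] > 0 means the original feature j looked more important
    than its knockoff copy, and W[j] < 0 means the opposite.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes of W.
    candidates = np.sort(np.unique(np.abs(W[W != 0])))
    for t in candidates:
        # Conservative estimate of the false discovery proportion at t.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    # No threshold meets the target level: select nothing.
    return np.inf


def knockoff_select(W, q=0.2):
    """Return indices of features whose statistic reaches the threshold."""
    W = np.asarray(W, dtype=float)
    t = knockoff_plus_threshold(W, q)
    return np.flatnonzero(W >= t)


# Toy example with made-up statistics (not data from the paper).
W_demo = [5, 4, 3, 2.5, 2, 1.5, -1, 0.5, -0.2, 0.1]
selected_demo = knockoff_select(W_demo, q=0.2)
print(selected_demo)  # indices 0 through 5
```

In this toy example the smallest threshold whose estimated false discovery proportion is at most 0.2 is 1.5, so the six features with statistics of at least 1.5 are selected. In a pipeline of the kind the abstract describes, `W` would be derived from the trained multilayer perceptron by comparing the importance of each original feature with that of its autoencoder-generated knockoff.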