Ramprasad Pratik, Ren Jingchen, Pan Wei
Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, Minnesota, USA.
Genet Epidemiol. 2025 Jan;49(1):e22595. doi: 10.1002/gepi.22595. Epub 2024 Sep 30.
Transcriptome-wide association studies (TWAS) aim to uncover genotype-phenotype relationships through a two-stage procedure: predicting gene expression from genotypes using an expression quantitative trait locus (eQTL) data set, then testing the predicted expression for trait associations. Accurate gene expression prediction in stage 1 is crucial, as it directly impacts the power to identify associations in stage 2. Currently, stage 1 is primarily carried out with linear models such as elastic net regression, which cannot capture the nonlinear relationships inherent in biological systems. Deep learning methods have the potential to model such nonlinear effects, but have yet to demonstrably outperform linear methods at this task. To address this gap, we propose a new deep learning architecture to predict gene expression from genotypic variation across individuals. Our method combines a learnable input scaling layer with a convolutional encoder to capture nonlinear effects and higher-order interactions without compromising interpretability. We further extend this approach to allow parameter sharing across multiple networks, enabling us to use prior information on individual variants in the form of functional annotations. Evaluations on real-world genomic data show that our method consistently outperforms elastic net regression across a large set of heritable genes. Furthermore, leveraging functional annotations yielded a statistically significant improvement in our model's predictive performance, whereas elastic net regression failed to show equivalent gains when given the same information, suggesting that our method can capture nonlinear functional information beyond the capability of linear models.
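The two-stage TWAS procedure described in the abstract can be illustrated with a minimal sketch on simulated data. This is not the authors' method: the stage 1 model here is closed-form ridge regression, swapped in for elastic net (and for the proposed deep learning architecture) only to keep the example dependency-light. All names (`fit_ridge`, `association_z`), sample sizes, effect sizes, and the penalty value are hypothetical choices made for illustration.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression on centered data (stand-in for elastic net)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return beta, intercept

def association_z(pred_expr, trait):
    """z-statistic for the slope when regressing the trait on predicted expression."""
    x = pred_expr - pred_expr.mean()
    y = trait - trait.mean()
    slope = (x @ y) / (x @ x)
    resid = y - slope * x
    se = np.sqrt((resid @ resid) / (len(y) - 2) / (x @ x))
    return slope / se

rng = np.random.default_rng(0)
n_eqtl, n_gwas, n_snps = 300, 2000, 20  # assumed sizes, illustration only

# Assumed ground truth: three SNPs affect expression of one gene.
true_w = np.zeros(n_snps)
true_w[:3] = [0.6, -0.4, 0.3]

# Stage 1: learn genotype -> expression weights in the (simulated) eQTL data set.
G_eqtl = rng.binomial(2, 0.3, size=(n_eqtl, n_snps)).astype(float)
expr = G_eqtl @ true_w + rng.normal(0.0, 0.5, n_eqtl)
beta, b0 = fit_ridge(G_eqtl, expr, lam=5.0)

# Stage 2: predict expression in a separate cohort and test it against the trait.
G_gwas = rng.binomial(2, 0.3, size=(n_gwas, n_snps)).astype(float)
latent_expr = G_gwas @ true_w + rng.normal(0.0, 0.5, n_gwas)  # unobserved in practice
trait = 0.5 * latent_expr + rng.normal(0.0, 1.0, n_gwas)
pred_expr = G_gwas @ beta + b0
z = association_z(pred_expr, trait)

print(f"stage 2 association z-statistic: {z:.2f}")
```

In this setup, a noisier stage 1 fit (for example, a smaller eQTL sample) weakens the correlation between predicted and latent expression and therefore lowers the stage 2 test statistic, which is the dependence of stage 2 power on stage 1 accuracy that the abstract points to.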