Enhancing Gene Expression Predictions Using Deep Learning and Functional Annotations.

Authors

Ramprasad Pratik, Ren Jingchen, Pan Wei

Affiliation

Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, Minnesota, USA.

Publication information

Genet Epidemiol. 2025 Jan;49(1):e22595. doi: 10.1002/gepi.22595. Epub 2024 Sep 30.

DOI:10.1002/gepi.22595

PMID:39344923

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11656135/

Abstract

Transcriptome-wide association studies (TWAS) aim to uncover genotype-phenotype relationships through a two-stage procedure: predicting gene expression from genotypes using an expression quantitative trait locus (eQTL) data set, then testing the predicted expression for trait associations. Accurate gene expression prediction in stage 1 is crucial, as it directly impacts the power to identify associations in stage 2. Currently, the first stage of such studies is primarily conducted using linear models like elastic net regression, which fail to capture the nonlinear relationships inherent in biological systems. Deep learning methods have the potential to model such nonlinear effects, but have yet to demonstrably outperform linear methods at this task. To address this gap, we propose a new deep learning architecture to predict gene expression from genotypic variation across individuals. Our method utilizes a learnable input scaling layer in conjunction with a convolutional encoder to capture nonlinear effects and higher-order interactions without compromising on interpretability. We further augment this approach to allow for parameter sharing across multiple networks, enabling us to utilize prior information for individual variants in the form of functional annotations. Evaluations on real-world genomic data show that our method consistently outperforms elastic net regression across a large set of heritable genes. Furthermore, our model statistically significantly improved predictive performance by leveraging functional annotations, whereas elastic net regression failed to show equivalent gains when using the same information, suggesting that our method can capture nonlinear functional information beyond the capability of linear models.
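The two-stage TWAS procedure described in the abstract can be illustrated with a minimal sketch on simulated data. This is not the authors' method: ridge regression stands in for elastic net in stage 1 (to keep the example NumPy-only), and every name, sample size, and effect size below (`fit_ridge`, `association_t`, `simulate`, `lam`) is a hypothetical choice made for illustration.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Stage 1 stand-in: ridge regression of expression on genotypes.
    (Assumption: ridge replaces elastic net purely for simplicity.)"""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return beta, intercept

def association_t(pred_expr, trait):
    """Stage 2: t-statistic for the marginal association between
    predicted expression and the trait (simple linear regression)."""
    n = len(trait)
    r = np.corrcoef(pred_expr, trait)[0, 1]
    return r * np.sqrt((n - 2) / (1.0 - r ** 2))

def simulate(seed=0, n_eqtl=400, n_gwas=2000, p=20):
    """Simulate an eQTL panel and a separate GWAS cohort (toy values)."""
    rng = np.random.default_rng(seed)
    maf = rng.uniform(0.1, 0.5, size=p)            # minor allele frequencies
    G_eqtl = rng.binomial(2, maf, size=(n_eqtl, p)).astype(float)
    G_gwas = rng.binomial(2, maf, size=(n_gwas, p)).astype(float)
    w = np.zeros(p)
    w[:3] = [0.8, -0.6, 0.5]                       # three causal variants
    expr_eqtl = G_eqtl @ w + rng.normal(0.0, 1.0, n_eqtl)
    # Expression in the GWAS cohort is unobserved in practice;
    # it is generated here only to create the trait.
    expr_gwas = G_gwas @ w + rng.normal(0.0, 1.0, n_gwas)
    trait = 0.3 * expr_gwas + rng.normal(0.0, 1.0, n_gwas)
    return G_eqtl, expr_eqtl, G_gwas, trait

G_eqtl, expr_eqtl, G_gwas, trait = simulate()

# Stage 1: learn genotype -> expression weights on the eQTL data set.
beta, intercept = fit_ridge(G_eqtl, expr_eqtl, lam=5.0)

# Stage 2: impute expression in the GWAS cohort and test it against the trait.
pred_expr = G_gwas @ beta + intercept
t_stat = association_t(pred_expr, trait)
print(f"stage-2 t-statistic: {t_stat:.2f}")
```

Because stage 2 only ever sees predicted expression, any loss of accuracy in stage 1 weakens the correlation being tested, which is why the abstract ties prediction quality directly to association power.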

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/508f/11656135/32dd49dc18ca/GEPI-49-0-g009.jpg
