Comparative Analysis of Deep Learning Models for Predicting Causative Regulatory Variants.

Authors

Manzo Gaetano, Borkowski Kathryn, Ovcharenko Ivan

Affiliations

Computational Biology Branch, Division of Intramural Research, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD, USA.

Case Western Reserve University, 10900 Euclid Ave., Cleveland, OH, USA.

Publication information

bioRxiv. 2025 Jun 11:2025.05.19.654920. doi: 10.1101/2025.05.19.654920.

DOI:10.1101/2025.05.19.654920

PMID:40568119

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12190767/

Abstract

MOTIVATION

Genome-wide association studies (GWAS) have identified numerous noncoding variants associated with complex human diseases, disorders, and traits. However, resolving the uncertainty between GWAS association and causality remains a significant challenge. The small subset of noncoding GWAS variants with causative effects on gene regulatory elements can only be detected through accurate methods that assess the impact of DNA sequence variation on gene regulatory activity. Deep learning models, such as those based on Convolutional Neural Networks (CNNs) and transformers, have gained prominence in predicting the regulatory effects of genetic variants, particularly in enhancers, by learning patterns from genomic and epigenomic data. Despite their potential, selecting the most suitable model is hindered by the lack of standardized benchmarks, consistent training conditions, and performance evaluation criteria in existing reviews.

RESULTS

This study evaluates state-of-the-art deep learning models for predicting the effects of genetic variants on enhancer activity using nine datasets stemming from MPRA, raQTL, and eQTL experiments, profiling the regulatory impact of 54,859 SNPs across four human cell lines. The results reveal that CNN models, such as TREDNet and SEI, consistently outperform other architectures in predicting the regulatory impact of single-nucleotide polymorphisms (SNPs). However, hybrid CNN-transformer models, such as Borzoi, display superior performance in identifying causal SNPs within a linkage disequilibrium block. While fine-tuning enhances the performance of transformer-based models, it remains insufficient to surpass CNN and hybrid models when evaluated under optimized conditions.
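The abstract does not say how variant effects are scored or evaluated, so the following is only a minimal sketch under a common convention: score each SNP as the predicted activity of the alternate allele minus that of the reference allele, rank the SNPs in a linkage disequilibrium block by absolute score, and summarize classification quality with AUROC. The function `toy_activity`, the `GATA` motif, the example sequence, and the variant names are all hypothetical stand-ins, not any of the benchmarked models or datasets.

```python
from typing import Callable, List, Tuple


def toy_activity(seq: str) -> float:
    """Hypothetical stand-in for a trained model: counts a made-up motif."""
    motif = "GATA"
    return float(sum(seq[i:i + len(motif)] == motif
                     for i in range(len(seq) - len(motif) + 1)))


def variant_effect(ref_seq: str, pos: int, alt: str,
                   predict: Callable[[str], float]) -> float:
    """Predicted activity of the alternate allele minus the reference allele."""
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return predict(alt_seq) - predict(ref_seq)


def rank_ld_block(ref_seq: str, variants: List[Tuple[str, int, str]],
                  predict: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Order the variants of one block by absolute predicted effect, largest first."""
    scored = [(name, variant_effect(ref_seq, pos, alt, predict))
              for name, pos, alt in variants]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


def auroc(scores: List[float], labels: List[int]) -> float:
    """AUROC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative data only: (name, 0-based position, alternate base).
ref_seq = "CCGATACCTTTTCC"
block = [("snp_a", 3, "C"), ("snp_b", 9, "G"), ("snp_c", 12, "A")]
ranking = rank_ld_block(ref_seq, block, toy_activity)
print(ranking)
print(auroc([abs(score) for _, score in ranking], [1, 0, 0]))
```

Any sequence-to-activity predictor could be passed in place of `toy_activity`; the ranking and AUROC helpers only assume it maps a DNA string to a single number.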
