
Comparative Analysis of Deep Learning Models for Predicting Causative Regulatory Variants.

Author information

Manzo Gaetano, Borkowski Kathryn, Ovcharenko Ivan

Affiliations

Computational Biology Branch, Division of Intramural Research, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD, USA.

Case Western Reserve University, 10900 Euclid Ave., Cleveland, OH, USA.

Publication information

bioRxiv. 2025 Jun 11:2025.05.19.654920. doi: 10.1101/2025.05.19.654920.

Abstract

MOTIVATION

Genome-wide association studies (GWAS) have identified numerous noncoding variants associated with complex human diseases, disorders, and traits. However, resolving the uncertainty between GWAS association and causality remains a significant challenge. The small subset of noncoding GWAS variants with causative effects on gene regulatory elements can only be detected through accurate methods that assess the impact of DNA sequence variation on gene regulatory activity. Deep learning models, such as those based on Convolutional Neural Networks (CNNs) and transformers, have gained prominence in predicting the regulatory effects of genetic variants, particularly in enhancers, by learning patterns from genomic and epigenomic data. Despite their potential, selecting the most suitable model is hindered by the lack of standardized benchmarks, consistent training conditions, and performance evaluation criteria in existing reviews.

RESULTS

This study evaluates state-of-the-art deep learning models for predicting the effects of genetic variants on enhancer activity. It uses nine datasets from massively parallel reporter assay (MPRA), reporter assay quantitative trait locus (raQTL), and expression quantitative trait locus (eQTL) experiments, which together profile the regulatory impact of 54,859 single-nucleotide polymorphisms (SNPs) across four human cell lines. CNN models such as TREDNet and SEI consistently outperform other architectures in predicting the regulatory impact of SNPs. However, hybrid CNN-transformer models such as Borzoi display superior performance in identifying causal SNPs within a linkage disequilibrium (LD) block. Fine-tuning improves the performance of transformer-based models, but not enough to surpass CNN and hybrid models when all are evaluated under optimized conditions.
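The abstract does not say how each model's variant score is computed. A common approach for sequence-based models is to predict regulatory activity for the reference and alternate sequences and take the difference. The minimal sketch below illustrates that idea and one simple way to rank the SNPs in an LD block by predicted effect size. Everything in it is hypothetical: `SNP`, `apply_snp`, `variant_effect`, `rank_ld_block`, and especially `toy_model`, which only counts occurrences of a GATA motif and stands in for a real predictor such as TREDNet, SEI, or Borzoi. It is not the authors' evaluation pipeline.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical model interface: maps a DNA sequence to a predicted activity score.
Model = Callable[[str], float]


@dataclass
class SNP:
    snp_id: str
    pos: int  # 0-based position within the reference sequence window
    ref: str
    alt: str


def apply_snp(seq: str, snp: SNP) -> str:
    """Return the alternate-allele sequence, checking the reference allele first."""
    if seq[snp.pos] != snp.ref:
        raise ValueError(f"{snp.snp_id}: reference allele mismatch at {snp.pos}")
    return seq[:snp.pos] + snp.alt + seq[snp.pos + 1:]


def variant_effect(model: Model, seq: str, snp: SNP) -> float:
    """Delta score: predicted activity of the alt sequence minus the ref sequence."""
    return model(apply_snp(seq, snp)) - model(seq)


def rank_ld_block(model: Model, seq: str, snps: List[SNP]) -> List[Tuple[str, float]]:
    """Rank SNPs in one LD block by absolute predicted effect (largest first).

    The top-ranked SNP is the model's candidate causal variant; ties keep input order.
    """
    scored = [(s.snp_id, variant_effect(model, seq, s)) for s in snps]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


def toy_model(seq: str) -> float:
    """Hypothetical stand-in predictor: counts non-overlapping GATA motifs."""
    return float(seq.count("GATA"))


if __name__ == "__main__":
    reference = "AAGATAAGCTAT"
    block = [
        SNP("rs1", 3, "A", "C"),  # destroys the existing GATA motif
        SNP("rs2", 0, "A", "T"),  # motif-neutral change
        SNP("rs3", 8, "C", "A"),  # creates a second GATA motif
    ]
    for snp_id, delta in rank_ld_block(toy_model, reference, block):
        print(snp_id, delta)
```

Ranking by absolute delta treats gain-of-function and loss-of-function variants alike. A real benchmark would also need to handle strand, sequence-window size, and the cell type each model is predicting for.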
