Evaluating the informativeness of deep learning annotations for human complex diseases.

Affiliations

Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA.

Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA.

Publication information

Nat Commun. 2020 Sep 17;11(1):4703. doi: 10.1038/s41467-020-18515-4.

DOI:10.1038/s41467-020-18515-4

PMID:32943643

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7499261/

Abstract

Deep learning models have shown great promise in predicting regulatory effects from DNA sequence, but their informativeness for human complex diseases is not fully understood. Here, we evaluate genome-wide SNP annotations from two previous deep learning models, DeepSEA and Basenji, by applying stratified LD score regression to 41 diseases and traits (average N = 320K), conditioning on a broad set of coding, conserved and regulatory annotations. We aggregated annotations across all (respectively blood or brain) tissues/cell-types in meta-analyses across all (respectively 11 blood or 8 brain) traits. The annotations were highly enriched for disease heritability, but produced only limited conditionally significant results: non-tissue-specific and brain-specific Basenji-H3K4me3 for all traits and brain traits respectively. We conclude that deep learning models have yet to achieve their full potential to provide considerable unique information for complex disease, and that their conditional informativeness for disease cannot be inferred from their accuracy in predicting regulatory annotations.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0275/7499261/76b38ff8eac9/41467_2020_18515_Fig1_HTML.jpg

Similar articles

Evaluating the informativeness of deep learning annotations for human complex diseases.

Nat Commun. 2020 Sep 17;11(1):4703. doi: 10.1038/s41467-020-18515-4.

Improving the informativeness of Mendelian disease-derived pathogenicity scores for common disease.

Nat Commun. 2020 Dec 7;11(1):6258. doi: 10.1038/s41467-020-20087-2.

Disease Heritability Enrichment of Regulatory Elements Is Concentrated in Elements with Ancient Sequence Age and Conserved Function across Species.

Am J Hum Genet. 2019 Apr 4;104(4):611-624. doi: 10.1016/j.ajhg.2019.02.008. Epub 2019 Mar 21.

Genes with High Network Connectivity Are Enriched for Disease Heritability.

Am J Hum Genet. 2019 May 2;104(5):896-913. doi: 10.1016/j.ajhg.2019.03.020.

Annotations capturing cell type-specific TF binding explain a large fraction of disease heritability.

Hum Mol Genet. 2020 May 8;29(7):1057-1067. doi: 10.1093/hmg/ddz226.

Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection.

Nat Genet. 2017 Oct;49(10):1421-1427. doi: 10.1038/ng.3954. Epub 2017 Sep 11.

Functional architecture of low-frequency variants highlights strength of negative selection across coding and non-coding annotations.

Nat Genet. 2018 Nov;50(11):1600-1607. doi: 10.1038/s41588-018-0231-8. Epub 2018 Oct 8.

High-throughput inference of pairwise coalescence times identifies signals of selection and enriched disease heritability.

Nat Genet. 2018 Sep;50(9):1311-1317. doi: 10.1038/s41588-018-0177-x. Epub 2018 Aug 13.

Predicting regulatory variants using a dense epigenomic mapped CNN model elucidated the molecular basis of trait-tissue associations.

Nucleic Acids Res. 2021 Jan 11;49(1):53-66. doi: 10.1093/nar/gkaa1137.

Systematic tissue-specific functional annotation of the human genome highlights immune-related DNA elements for late-onset Alzheimer's disease.

PLoS Genet. 2017 Jul 24;13(7):e1006933. doi: 10.1371/journal.pgen.1006933. eCollection 2017 Jul.

Cited by

Perspective on recent developments and challenges in regulatory and systems genomics.

Bioinform Adv. 2025 May 9;5(1):vbaf106. doi: 10.1093/bioadv/vbaf106. eCollection 2025.

Multi-dimensional annotation of porcine variants using genomic and epigenomic features in pigs.

BMC Biol. 2025 Jul 1;23(1):188. doi: 10.1186/s12915-025-02279-8.

Integrative multi-omics QTL colocalization maps regulatory architecture in aging human brain.

medRxiv. 2025 May 6:2025.04.17.25326042. doi: 10.1101/2025.04.17.25326042.

Predicting gene expression from DNA sequence using deep learning models.

Nat Rev Genet. 2025 May 13. doi: 10.1038/s41576-025-00841-2.

Benchmarking DNA Sequence Models for Causal Regulatory Variant Prediction in Human Genetics.

bioRxiv. 2025 Mar 4:2025.02.11.637758. doi: 10.1101/2025.02.11.637758.

Iterative improvement of deep learning models using synthetic regulatory genomics.

bioRxiv. 2025 Feb 21:2025.02.04.636130. doi: 10.1101/2025.02.04.636130.

Deciphering the tissue-specific functional effect of Alzheimer risk SNPs with deep genome annotation.

BioData Min. 2024 Nov 13;17(1):50. doi: 10.1186/s13040-024-00400-1.

Current genomic deep learning models display decreased performance in cell type-specific accessible regions.

Genome Biol. 2024 Aug 1;25(1):202. doi: 10.1186/s13059-024-03335-2.

Cross-Species Prediction of Transcription Factor Binding by Adversarial Training of a Novel Nucleotide-Level Deep Neural Network.

Adv Sci (Weinh). 2024 Sep;11(36):e2405685. doi: 10.1002/advs.202405685. Epub 2024 Jul 30.

Current genomic deep learning models display decreased performance in cell type specific accessible regions.

bioRxiv. 2024 Jul 10:2024.07.05.602265. doi: 10.1101/2024.07.05.602265.

References

Cross-species regulatory sequence activity prediction.

PLoS Comput Biol. 2020 Jul 20;16(7):e1008050. doi: 10.1371/journal.pcbi.1008050. eCollection 2020 Jul.

Annotations capturing cell type-specific TF binding explain a large fraction of disease heritability.

Hum Mol Genet. 2020 May 8;29(7):1057-1067. doi: 10.1093/hmg/ddz226.

Functional disease architectures reveal unique biological role of transposable elements.

Nat Commun. 2019 Sep 6;10(1):4054. doi: 10.1038/s41467-019-11957-5.

Reconciling S-LDSC and LDAK functional enrichment estimates.

Nat Genet. 2019 Aug;51(8):1202-1204. doi: 10.1038/s41588-019-0464-1.

Whole-genome deep-learning analysis identifies contribution of noncoding mutations to autism risk.

Nat Genet. 2019 Jun;51(6):973-980. doi: 10.1038/s41588-019-0420-0. Epub 2019 May 27.

Deep learning: new computational modelling techniques for genomics.

Nat Rev Genet. 2019 Jul;20(7):389-403. doi: 10.1038/s41576-019-0122-6.

NCBoost classifies pathogenic non-coding variants in Mendelian diseases through supervised learning on purifying selection signals in humans.

Genome Biol. 2019 Feb 11;20(1):32. doi: 10.1186/s13059-019-1634-2.

The cis-Regulatory Atlas of the Mouse Immune System.

Cell. 2019 Feb 7;176(4):897-912.e20. doi: 10.1016/j.cell.2018.12.036. Epub 2019 Jan 24.

Predicting Splicing from Primary Sequence with Deep Learning.

Cell. 2019 Jan 24;176(3):535-548.e24. doi: 10.1016/j.cell.2018.12.015. Epub 2019 Jan 17.

Biological relevance of computationally predicted pathogenicity of noncoding variants.

Nat Commun. 2019 Jan 18;10(1):330. doi: 10.1038/s41467-018-08270-y.
