Prioritizing non-coding regions based on human genomic constraint and sequence context with deep learning.

Affiliations

Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK.

National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA.

Publication information

Nat Commun. 2021 Mar 8;12(1):1504. doi: 10.1038/s41467-021-21790-4.

DOI:10.1038/s41467-021-21790-4

PMID:33686085

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7940646/

Abstract

Elucidating functionality in non-coding regions is a key challenge in human genomics. It has been shown that intolerance to variation of coding and proximal non-coding sequence is a strong predictor of human disease relevance. Here, we integrate intolerance to variation, functional genomic annotations and primary genomic sequence to build JARVIS: a comprehensive deep learning model to prioritize non-coding regions, outperforming other human lineage-specific scores. Despite being agnostic to evolutionary conservation, JARVIS performs comparably or outperforms conservation-based scores in classifying pathogenic single-nucleotide and structural variants. In constructing JARVIS, we introduce the genome-wide residual variation intolerance score (gwRVIS), applying a sliding-window approach to whole genome sequencing data from 62,784 individuals. gwRVIS distinguishes Mendelian disease genes from more tolerant CCDS regions and highlights ultra-conserved non-coding elements as the most intolerant regions in the human genome. Both JARVIS and gwRVIS capture previously inaccessible human-lineage constraint information and will enhance our understanding of the non-coding genome.
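The sliding-window intolerance idea behind gwRVIS can be illustrated with a minimal sketch. It assumes the score follows the general residual variation intolerance recipe: count all variants and common variants in each window, regress the common count on the total count, and treat the standardized residual as the score, so that windows with fewer common variants than expected receive more negative (more intolerant) values. The function names, the window size, the step, and the `common_maf` threshold below are hypothetical choices for illustration and are not taken from the abstract; the plain standardization used here is also a simplification.

```python
import numpy as np


def window_counts(positions, allele_freqs, region_len,
                  window=3000, step=1000, common_maf=0.001):
    """Count all variants and common variants in each sliding window.

    window, step and common_maf are illustrative defaults (assumptions),
    not values stated in the abstract.
    """
    positions = np.asarray(positions)
    is_common = np.asarray(allele_freqs) >= common_maf
    starts = np.arange(0, region_len - window + 1, step)
    all_counts = np.empty(len(starts))
    common_counts = np.empty(len(starts))
    for i, start in enumerate(starts):
        in_window = (positions >= start) & (positions < start + window)
        all_counts[i] = in_window.sum()
        common_counts[i] = (in_window & is_common).sum()
    return starts, all_counts, common_counts


def residual_scores(all_counts, common_counts):
    """Regress common-variant counts on total counts across windows.

    Returns residuals divided by their standard deviation. Negative
    values mark windows with fewer common variants than the fit predicts.
    """
    all_counts = np.asarray(all_counts, dtype=float)
    common_counts = np.asarray(common_counts, dtype=float)
    slope, intercept = np.polyfit(all_counts, common_counts, deg=1)
    residuals = common_counts - (slope * all_counts + intercept)
    return residuals / residuals.std()


# Toy usage with made-up counts for four windows.
toy_scores = residual_scores([10, 20, 30, 40], [5, 10, 5, 20])
most_intolerant_window = int(np.argmin(toy_scores))
```

In this toy input the third window carries far fewer common variants than the fitted line predicts, so it receives the lowest score. A perfectly linear input would give a residual standard deviation of (near) zero, which this sketch does not guard against.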

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/59e3/7940646/5e09bf5cfa42/41467_2021_21790_Fig1_HTML.jpg
