Inferring mammalian tissue-specific regulatory conservation by predicting tissue-specific differences in open chromatin.

Affiliations

Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA.

Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA.

Publication information

BMC Genomics. 2022 Apr 11;23(1):291. doi: 10.1186/s12864-022-08450-7.

DOI:10.1186/s12864-022-08450-7

PMID:35410163

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8996547/

Abstract

BACKGROUND

Evolutionary conservation is an invaluable tool for inferring functional significance in the genome, including regions that are crucial across many species and those that have undergone convergent evolution. Computational methods to test for sequence conservation are dominated by algorithms that examine the ability of one or more nucleotides to align across large evolutionary distances. While these nucleotide alignment-based approaches have proven powerful for protein-coding genes and some non-coding elements, they fail to capture conservation of many enhancers, distal regulatory elements that control spatial and temporal patterns of gene expression. The function of enhancers is governed by a complex, often tissue- and cell type-specific code that links combinations of transcription factor binding sites and other regulation-related sequence patterns to regulatory activity. Thus, function of orthologous enhancer regions can be conserved across large evolutionary distances, even when nucleotide turnover is high.
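The last point, that function can persist while individual nucleotides turn over, can be illustrated with a toy comparison. The sketch below is not from the paper: the two sequences are invented, the function names are hypothetical, and the two motifs (an AP-1-like site and an E-box) stand in for whatever binding sites a real enhancer might carry. It contrasts position-by-position identity with simple motif content.

```python
# Toy illustration (assumption: sequences and motif choices are invented for this example).
MOTIFS = ("TGACTCA", "CACGTG")  # AP-1-like site and E-box, used here only as stand-ins

# Two hypothetical "orthologous" regions of equal length with the same motifs
# in a different order and with different flanking bases.
SEQ_A = "TGACTCA" + "GGGGG" + "CACGTG" + "AAAAAA"
SEQ_B = "CCCCCC" + "CACGTG" + "TTTTT" + "TGACTCA"


def positional_identity(a, b):
    """Fraction of positions carrying the same base in two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def motif_counts(seq, motifs=MOTIFS):
    """Count non-overlapping occurrences of each motif in the sequence."""
    return {motif: seq.count(motif) for motif in motifs}


if __name__ == "__main__":
    print("positional identity:", round(positional_identity(SEQ_A, SEQ_B), 2))
    print("motifs in A:", motif_counts(SEQ_A))
    print("motifs in B:", motif_counts(SEQ_B))
```

In this contrived case the position-wise identity is low while the motif content is identical, which is the kind of situation a nucleotide-level conservation score would miss. A real analysis would use genome alignments and learned sequence features rather than exact string matches.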

RESULTS

We present a new machine learning-based approach for evaluating enhancer conservation that leverages the combinatorial sequence code of enhancer activity rather than relying on the alignment of individual nucleotides. We first train a convolutional neural network model that can predict tissue-specific open chromatin, a proxy for enhancer activity, across mammals. Next, we apply that model to distinguish instances where the genome sequence would predict conserved function versus a loss of regulatory activity in that tissue. We present criteria for systematically evaluating model performance for this task and use them to demonstrate that our models accurately predict tissue-specific conservation and divergence in open chromatin between primate and rodent species, vastly out-performing leading nucleotide alignment-based approaches. We then apply our models to predict open chromatin at orthologs of brain and liver open chromatin regions across hundreds of mammals and find that brain enhancers associated with neuron activity have a stronger tendency than the general population to have predicted lineage-specific open chromatin.
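The two steps described above (score a sequence for open chromatin, then compare the score at an ortholog) can be sketched in a few lines. This is a minimal sketch under strong assumptions, not the authors' model: the abstract does not give the network architecture, training data, or decision thresholds, so the single hand-set filter, the bias of -4.5, the 0.5 cutoff, and every function name below are hypothetical. The "model" is one convolutional filter with global max pooling and a sigmoid, standing in for a trained convolutional neural network.

```python
import math

BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as rows of 4 floats; unknown bases (e.g. N) become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def motif_kernel(motif):
    """Hypothetical filter: +1 for the motif base at each position, -1 for any other base."""
    return [[1.0 if base == b else -1.0 for b in BASES] for base in motif]


def conv_max_score(encoded, kernel):
    """Slide the filter along the sequence and return the maximum activation."""
    width = len(kernel)
    if len(encoded) < width:
        raise ValueError("sequence is shorter than the filter")
    return max(
        sum(encoded[start + i][j] * kernel[i][j] for i in range(width) for j in range(4))
        for start in range(len(encoded) - width + 1)
    )


def predict_open(seq, kernel, bias=-4.5):
    """Toy stand-in for a trained model: sigmoid(max filter activation + bias)."""
    return 1.0 / (1.0 + math.exp(-(conv_max_score(one_hot(seq), kernel) + bias)))


def classify_ortholog(p_ref, p_orth, threshold=0.5):
    """Label an ortholog by comparing predicted open chromatin with the reference."""
    if p_ref < threshold:
        return "not open in reference"
    return "predicted conserved" if p_orth >= threshold else "predicted lost"


if __name__ == "__main__":
    kernel = motif_kernel("CACGTG")      # assumed motif, for illustration only
    reference = "GGGGGCACGTGAAAAAA"      # invented sequences
    orthologs = {"shifted motif": "TTCACGTGTTTTTTTTT", "disrupted motif": "GGGGGCATTTGAAAAAA"}
    p_ref = predict_open(reference, kernel)
    for name, seq in orthologs.items():
        print(name, classify_ortholog(p_ref, predict_open(seq, kernel)))
```

Because the score depends on whether the sequence pattern is present anywhere in the region, not on where each base aligns, the ortholog with a shifted motif is still called conserved. A trained network would combine many learned filters and would be evaluated against measured open chromatin, as the abstract describes.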

CONCLUSION

The framework presented here provides a mechanism to annotate tissue-specific regulatory function across hundreds of genomes and to study enhancer evolution using predicted regulatory differences rather than nucleotide-level conservation measurements.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/62a6/8996547/6bb7912ba975/12864_2022_8450_Fig1_HTML.jpg
