Prediction of gene regulatory enhancers across species reveals evolutionarily conserved sequence properties.

Affiliations

Department of Biological Sciences, Vanderbilt University, Nashville, TN, United States of America.

Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, United States of America.

Publication information

PLoS Comput Biol. 2018 Oct 4;14(10):e1006484. doi: 10.1371/journal.pcbi.1006484. eCollection 2018 Oct.

DOI: 10.1371/journal.pcbi.1006484

PMID: 30286077

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6191148/

Abstract

Genomic regions with gene regulatory enhancer activity turn over rapidly across mammals. In contrast, gene expression patterns and transcription factor (TF) binding preferences are largely conserved between mammalian species. Based on this conservation, we hypothesized that enhancers active in different mammals would exhibit conserved sequence patterns despite their different genomic locations. To investigate this hypothesis, we trained and tested two types of machine learning models to evaluate the extent to which sequence patterns predictive of enhancers in one species are also predictive of enhancers in other mammalian species. We trained support vector machine (SVM) and convolutional neural network (CNN) classifiers to distinguish enhancers defined by histone marks from the genomic background based on DNA sequence patterns in human, macaque, mouse, dog, cow, and opossum. The classifiers accurately identified many adult liver, developing limb, and developing brain enhancers, and the CNNs outperformed the SVMs. Furthermore, classifiers trained in one species and tested in another performed nearly as well as classifiers trained and tested on the same species. We observed similar cross-species conservation when applying the models to human and mouse enhancers validated in transgenic assays. This indicates that many short sequence patterns predictive of enhancers are largely conserved. The sequence patterns most predictive of enhancers in each species matched the binding motifs for a common set of TFs enriched for expression in relevant tissues, supporting the biological relevance of the learned features. Thus, despite the rapid change of active enhancer locations between mammals, cross-species enhancer prediction is often possible. Our results suggest that short sequence patterns encoding enhancer activity have been maintained across more than 180 million years of mammalian evolution.
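The train-in-one-species, test-in-another design described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' SVM or CNN: it swaps in a much simpler nearest-centroid linear scorer over canonical k-mer frequencies, and it runs on synthetic sequences rather than real enhancers. The function names, the implanted motif `GATAAG`, the choice of k = 6, and the base compositions of the two toy "species" are all assumptions made for illustration.

```python
import random
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string over A/C/G/T."""
    return seq.translate(COMPLEMENT)[::-1]

def canonical_kmers(k):
    """All k-mers, each merged with its reverse complement."""
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return sorted({min(km, revcomp(km)) for km in kmers})

def kmer_spectrum(seq, index, k):
    """Canonical k-mer frequencies for one sequence (non-ACGT k-mers skipped)."""
    vec = [0.0] * len(index)
    n = len(seq) - k + 1
    for i in range(n):
        km = seq[i:i + k]
        j = index.get(min(km, revcomp(km)))
        if j is not None:
            vec[j] += 1.0 / n
    return vec

def train_centroid(pos, neg):
    """Weights = mean positive spectrum minus mean negative spectrum."""
    dim = len(pos[0])
    mean = lambda rows, j: sum(r[j] for r in rows) / len(rows)
    return [mean(pos, j) - mean(neg, j) for j in range(dim)]

def score(w, x):
    """Linear score of one spectrum under the weight vector."""
    return sum(wi * xi for wi, xi in zip(w, x))

def auroc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

def toy_species(rng, base_weights, motif, n=50, length=200):
    """Synthetic data: 'enhancers' are background with implanted motif copies."""
    def background():
        return "".join(rng.choices("ACGT", weights=base_weights, k=length))
    def enhancer():
        s = list(background())
        for _ in range(3):
            start = rng.randrange(length - len(motif) + 1)
            s[start:start + len(motif)] = motif
        return "".join(s)
    return [enhancer() for _ in range(n)], [background() for _ in range(n)]

K = 6
INDEX = {km: j for j, km in enumerate(canonical_kmers(K))}
MOTIF = "GATAAG"  # arbitrary illustrative motif, not taken from the paper
rng = random.Random(0)

# Two toy "species" that share the motif but differ in base composition.
a_pos, a_neg = toy_species(rng, [0.25, 0.25, 0.25, 0.25], MOTIF)
b_pos, b_neg = toy_species(rng, [0.2, 0.3, 0.3, 0.2], MOTIF)

featurize = lambda seqs: [kmer_spectrum(s, INDEX, K) for s in seqs]

# Train on species A only, then score held-out species B.
w = train_centroid(featurize(a_pos), featurize(a_neg))
cross_auc = auroc([score(w, x) for x in featurize(b_pos)],
                  [score(w, x) for x in featurize(b_neg)])
print(f"train on A, test on B: auROC = {cross_auc:.2f}")
```

Because the toy positives differ from the background only by a shared motif, the weights learned in one "species" transfer to the other by construction. The sketch therefore shows the evaluation protocol only and says nothing about how well real enhancers transfer.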
