Deep learning can predict subgenome dominance in ancient but not in neo/synthetic polyploidized genomes.

Affiliation

State Key Laboratory of Vegetable Biobreeding, Key Laboratory of Biology and Genetic Improvement of Horticultural Crops of the Ministry of Agriculture and Rural Affairs, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, China.

Publication information

Plant J. 2024 Oct;120(1):174-186. doi: 10.1111/tpj.16979. Epub 2024 Aug 12.

Abstract

Deep learning offers new approaches to investigate the mechanisms underlying complex biological phenomena, such as subgenome dominance. Subgenome dominance refers to the dominant expression and/or biased fractionation of genes in one subgenome of allopolyploids, which has shaped the evolution of a large group of plants. However, the underlying cause of subgenome dominance remains elusive. Here, we adopt deep learning to construct two convolutional neural network (CNN) models, the binary expression model (BEM) and the homoeolog contrast model (HCM), which use DNA sequence and methylation sites to investigate the mechanism underlying subgenome dominance. We apply these CNN models to three representative polyploidization systems, Brassica, Gossypium, and Cucurbitaceae, each with available ancient and neo/synthetic polyploidized genomes. The BEM shows that the DNA sequence of the promoter region can accurately predict whether a gene is expressed or not. More importantly, the HCM shows that the DNA sequence of the promoter region predicts the dominant expression status between homoeologous gene pairs retained from ancient polyploidizations, thus predicting the subgenome dominance associated with these events. However, the HCM fails to predict gene expression dominance between new homoeologous gene pairs arising from neo/synthetic polyploidizations. These results are consistent across the three plant polyploidization systems, indicating the broad applicability of our models. Furthermore, the two models based on methylation sites produce similar results. These results show that subgenome dominance is associated with long-term sequence differentiation between the promoters of homoeologs, suggesting that subgenome expression dominance precedes, and is the driving force or even the determining factor for, sequence divergence between subgenomes following polyploidization.
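The abstract describes the BEM and HCM only by their inputs and outputs. The following is a minimal, hypothetical Python sketch of how a promoter-sequence CNN of this general kind is often wired: one-hot encoding of the promoter, a 1D convolution with ReLU, global max pooling, and a sigmoid output. The kernel size, filter count, random weights, function names, and the difference-of-features pairing used in `hcm_like_predict` are all assumptions for illustration, not the architecture reported in the paper; the methylation-site variants are not shown.

```python
import math
import random

# Hypothetical sketch of a promoter-sequence CNN; NOT the BEM/HCM from the paper.
# Kernel size, filter count, and random weights are illustrative assumptions.
BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as L rows of [A, C, G, T] indicators; N or other symbols -> all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d_relu(x, filters, biases):
    """Valid 1D convolution (each filter is K x 4) followed by ReLU.

    Returns a (L - K + 1) x n_filters feature map; assumes L >= K.
    """
    k = len(filters[0])
    fmap = []
    for i in range(len(x) - k + 1):
        row = []
        for f, b in zip(filters, biases):
            s = b + sum(f[j][c] * x[i + j][c] for j in range(k) for c in range(4))
            row.append(max(0.0, s))
        fmap.append(row)
    return fmap


def global_max_pool(fmap):
    """Keep the strongest activation of each filter anywhere in the promoter."""
    return [max(column) for column in zip(*fmap)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def promoter_features(seq, filters, biases):
    return global_max_pool(conv1d_relu(one_hot(seq), filters, biases))


def bem_like_predict(seq, filters, biases, w, w0):
    """Binary-expression-style output: P(gene is expressed | promoter sequence)."""
    h = promoter_features(seq, filters, biases)
    return sigmoid(w0 + sum(wi * hi for wi, hi in zip(w, h)))


def hcm_like_predict(seq_a, seq_b, filters, biases, w, w0=0.0):
    """Pairwise output: P(homoeolog A is the dominantly expressed copy).

    Assumption: both promoters share one feature extractor and the classifier
    sees the feature difference, so with w0 = 0 swapping A and B gives 1 - p.
    """
    ha = promoter_features(seq_a, filters, biases)
    hb = promoter_features(seq_b, filters, biases)
    return sigmoid(w0 + sum(wi * (a - b) for wi, a, b in zip(w, ha, hb)))


# Usage with random (untrained) weights; real models would be fit to expression data.
random.seed(0)
K, N_FILTERS = 6, 8
filters = [[[random.gauss(0.0, 0.5) for _ in range(4)] for _ in range(K)]
           for _ in range(N_FILTERS)]
biases = [0.0] * N_FILTERS
w = [random.gauss(0.0, 0.5) for _ in range(N_FILTERS)]

p_expr = bem_like_predict("TATAAAAGGCGCGCATCGTTAG", filters, biases, w, 0.0)
p_a_dom = hcm_like_predict("TATAAAAGGCGCGCATCG", "GGGCCCAATTTGACGTCA", filters, biases, w)
print(f"P(expressed) = {p_expr:.3f}; P(A dominant) = {p_a_dom:.3f}")
```

Because the weights are random, the printed probabilities carry no biological meaning; the sketch only shows the data flow. Global max pooling makes the output depend on whether a motif-like pattern occurs anywhere in the promoter rather than on its exact position; whether the published models use this pooling is not stated in the abstract.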
