Automated conserved non-coding sequence (CNS) discovery reveals differences in gene content and promoter evolution among grasses.

Affiliation

Department of Plant and Microbial Biology, University of California Berkeley, CA, USA.

Publication information

Front Plant Sci. 2013 Jul 2;4:170. doi: 10.3389/fpls.2013.00170. eCollection 2013.

Abstract

Conserved non-coding sequences (CNSs) are islands of non-coding sequence that, like protein-coding exons, show less sequence divergence between related species than functionless DNA. Several CNSs have been demonstrated experimentally to function as cis-regulatory regions. However, the specific functions of most CNSs remain unknown. Previous searches for CNSs in plants have either anchored on exons and only identified nearby sequences or required years of painstaking manual annotation. Here we present an open source tool that can accurately identify CNSs between any two related species with sequenced genomes, including both those immediately adjacent to exons and distal sequences separated by >12 kb of non-coding sequence. We have used this tool to characterize new motifs, associate CNSs with additional functions, and identify previously undetected genes encoding RNA and protein in the genomes of five grass species. We provide a list of 15,363 orthologous CNSs conserved across all grasses tested. We were also able to identify regulatory sequences present in the common ancestor of grasses that have been lost in one or more extant grass lineages. Lists of orthologous gene pairs and associated CNSs are provided for reference inbred lines of Arabidopsis, japonica rice, foxtail millet, sorghum, Brachypodium, and maize.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4158/3708275/1a43653b4fb0/fpls-04-00170-g0001.jpg
