Department of Agronomy and Horticulture, Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE 68588, USA; Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China.
Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA.
Mol Plant. 2017 Jul 5;10(7):990-999. doi: 10.1016/j.molp.2017.05.010. Epub 2017 Jun 6.
One method for identifying noncoding regulatory regions of a genome is to quantify rates of divergence between related species, as functional sequence will generally diverge more slowly. Most alignment-based approaches to identifying these conserved noncoding sequences (CNSs) have required relatively large minimum sequence lengths (≥15 bp) compared with the average length of known transcription factor binding sites. To circumvent this constraint, STAG-CNS was developed to simultaneously integrate data from the promoters of conserved orthologous genes in three or more species. Using data from up to six grass species made it possible to identify conserved sequences as short as 9 bp with a false discovery rate ≤0.05. These CNSs exhibit greater overlap with open chromatin regions identified using DNase I hypersensitivity assays, and are enriched in the promoters of genes involved in transcriptional regulation. STAG-CNS was further employed to characterize the loss of conserved noncoding sequences associated with retained duplicate genes from the ancient maize polyploidy. Genes with fewer retained CNSs show lower overall expression, although this bias is more apparent in samples of complex organ systems containing many cell types, suggesting that CNS loss may correspond to a reduced number of expression contexts rather than lower expression levels across the entire ancestral expression domain.
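As a rough illustration of why adding species permits shorter conserved sequences, the sketch below finds every exact 9 bp substring shared by the promoters of orthologous genes from several species. It is a naive set-intersection example and not the STAG-CNS algorithm; the function name `shared_kmers`, the species labels, and the toy sequences are all hypothetical, and no false discovery rate is estimated.

```python
def shared_kmers(promoters, k=9):
    """Return the set of k-mers found in every promoter sequence.

    `promoters` maps a (hypothetical) species label to a promoter
    sequence. Only exact, same-strand matches are considered.
    """
    kmer_sets = []
    for seq in promoters.values():
        seq = seq.upper()
        # Collect every overlapping window of length k in this sequence.
        kmer_sets.append({seq[i:i + k] for i in range(len(seq) - k + 1)})
    # A k-mer is "conserved" here only if it occurs in all species.
    return set.intersection(*kmer_sets) if kmer_sets else set()


# Toy, made-up promoter fragments for three species (illustration only).
promoters = {
    "species_a": "TTGACGTGGCATTACCGA",
    "species_b": "CCACGTGGCATTAGGT",
    "species_c": "GGGACGTGGCATTTTT",
}

print(sorted(shared_kmers(promoters)))
```

Each additional species adds another set to the intersection, so a short sequence is less likely to be shared by chance alone. A real analysis would still need a null model to estimate the false discovery rate.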