Department of Animal Science, University of California, Davis, California 95616, USA.
Genome Res. 2021 Apr;31(4):732-744. doi: 10.1101/gr.267336.120. Epub 2021 Mar 15.
Characterizing transcription start sites is essential for understanding the regulatory mechanisms that control gene expression. Recently, a new bovine genome assembly (ARS-UCD1.2) with high continuity, accuracy, and completeness was released; however, the functional annotation of the bovine genome lacks precise transcription start sites and contains a low number of transcripts in comparison to human and mouse. By using the RAMPAGE approach, this study identified transcription start sites at high resolution in a large collection of bovine tissues. We found several known and novel transcription start sites attributed to promoters of protein-coding and lncRNA genes that were validated through experimental and in silico evidence. With these findings, the annotation of transcription start sites in cattle reached a level comparable to the mouse and human genome annotations. In addition, we identified and characterized transcription start sites for antisense transcripts derived from bidirectional promoters, potential lncRNAs, mRNAs, and pre-miRNAs. We also analyzed the quantitative aspects of RAMPAGE to produce a promoter activity atlas, reaching highly reproducible results comparable to traditional RNA-seq. Coexpression networks revealed considerable use of tissue-specific promoters, especially between brain and testicle, which expressed several genes in common from alternate loci. Furthermore, regions surrounding coexpressed modules were enriched in binding factor motifs representative of each tissue. The comprehensive annotation of promoters in such a large collection of tissues will substantially contribute to our understanding of gene expression in cattle and other mammalian species, shortening the gap between genotypes and phenotypes.