Saini Harpreet K, Enright Anton J, Griffiths-Jones Sam
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
BMC Genomics. 2008 Nov 27;9:564. doi: 10.1186/1471-2164-9-564.
MicroRNAs (miRNAs) are important regulators of gene expression and have been implicated in development, differentiation and pathogenesis. Hundreds of miRNAs have been discovered in mammalian genomes. Approximately 50% of mammalian miRNAs are expressed from introns of protein-coding genes; the primary transcript (pri-miRNA) is therefore assumed to be the host transcript. However, very little is known about the structure of pri-miRNAs expressed from intergenic regions. Here we annotate transcript boundaries of miRNAs in the human, mouse and rat genomes using various transcription features. The 5' end of the pri-miRNA is predicted from transcription start sites, CpG islands and 5' CAGE tags mapped in the flanking region upstream of the precursor miRNA (pre-miRNA). The 3' end of the pri-miRNA is predicted from the mapping of polyA signals, and is supported by cDNA/EST and ditag data. The predicted pri-miRNAs are also analyzed for promoter and insulator-associated regulatory regions.
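The boundary prediction described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not say how the features are weighted or combined, so the rule used here (take the nearest supporting feature on each side) is an assumption, as are the `Feature` type, the `predict_boundaries` function, the plus-strand-only coordinates and the 10 kb search window.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Feature:
    kind: str      # hypothetical labels: "TSS", "CpG", "CAGE" or "polyA"
    position: int  # coordinate on the plus strand (assumption: plus strand only)


# Feature kinds treated as evidence for the 5' end, as listed in the abstract.
FIVE_PRIME_KINDS = {"TSS", "CpG", "CAGE"}


def predict_boundaries(
    pre_start: int,
    pre_end: int,
    features: Iterable[Feature],
    window: int = 10_000,  # assumed search window, not taken from the paper
) -> Tuple[Optional[int], Optional[int]]:
    """Return (five_prime, three_prime) candidates for one pre-miRNA.

    Assumed rule: the 5' end is the nearest TSS/CpG/CAGE feature upstream of
    the pre-miRNA, and the 3' end is the nearest polyA signal downstream.
    Either value is None when no feature lies inside the window.
    """
    features = list(features)
    upstream = [
        f.position
        for f in features
        if f.kind in FIVE_PRIME_KINDS
        and pre_start - window <= f.position < pre_start
    ]
    downstream = [
        f.position
        for f in features
        if f.kind == "polyA" and pre_end < f.position <= pre_end + window
    ]
    five_prime = max(upstream) if upstream else None
    three_prime = min(downstream) if downstream else None
    return five_prime, three_prime


# Made-up coordinates, for illustration only.
example_features = [
    Feature("CpG", 500),
    Feature("TSS", 900),
    Feature("CAGE", 1200),   # downstream of the pre-miRNA start, so ignored
    Feature("polyA", 2500),
    Feature("polyA", 4000),
]
boundaries = predict_boundaries(1000, 1080, example_features)
```

A minus-strand pre-miRNA would need the upstream and downstream tests mirrored, and supporting evidence such as cDNA/EST or ditag data would be checked as a separate step.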
We define sets of conserved and non-conserved human, mouse and rat pre-miRNAs using bidirectional BLAST and synteny analysis. Transcription features in their flanking regions are used to demarcate the 5' and 3' boundaries of the pri-miRNAs. The lengths and boundaries of primary transcripts are highly conserved between orthologous miRNAs. A significant fraction of pri-miRNAs are between 1 and 10 kb long and contain very few introns. We annotate a total of 59 pri-miRNA structures, which include 82 pre-miRNAs. Thirty-six pri-miRNAs are conserved in all three species. In total, 18 of the confidently annotated transcripts express more than one pre-miRNA. The upstream regions of 54% of the predicted pri-miRNAs are associated with promoter and insulator regulatory sequences.
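The bidirectional BLAST step is commonly implemented as a reciprocal-best-hit filter, sketched below. This is an assumption about how the step could be done, not the authors' code: the `best_hits` and `reciprocal_best_hits` functions are hypothetical names, the input is assumed to be already-parsed (query, subject, bit score) tuples, and the synteny check mentioned in the text is not implemented.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (query id, subject id, bit score)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (first one wins on ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return sorted (a, b) pairs in which each sequence is the other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Made-up identifiers and scores, for illustration only.
human_vs_mouse = [
    ("hsa-x1", "mmu-y1", 120.0),
    ("hsa-x1", "mmu-y2", 80.0),
    ("hsa-x2", "mmu-y2", 90.0),
    ("hsa-x3", "mmu-y2", 95.0),
]
mouse_vs_human = [
    ("mmu-y1", "hsa-x1", 120.0),
    ("mmu-y2", "hsa-x3", 95.0),
    ("mmu-y2", "hsa-x2", 90.0),
]
orthologs = reciprocal_best_hits(human_vs_mouse, mouse_vs_human)
```

Here `hsa-x2` is dropped because its best mouse hit prefers `hsa-x3`. Candidate pairs from a filter like this would then be checked for conserved genomic context before being called orthologous.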
Little is known about the primary transcripts of intergenic miRNAs. Using comparative data, we are able to identify the boundaries of a significant proportion of human, mouse and rat pri-miRNAs. We confidently predict transcripts containing a total of 77, 58 and 47 human, mouse and rat pre-miRNAs, respectively. Our computational annotations provide a basis for subsequent experimental validation of the predicted pri-miRNAs.