Chang Tsung-Cheng, Pertea Mihaela, Lee Sungyul, Salzberg Steven L, Mendell Joshua T
Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA;
Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, Maryland 21205, USA;
Genome Res. 2015 Sep;25(9):1401-9. doi: 10.1101/gr.193607.115.
Precise regulation of microRNA (miRNA) expression is critical for diverse physiologic and pathophysiologic processes. Nevertheless, elucidation of the mechanisms through which miRNA expression is regulated has been greatly hindered by the incomplete annotation of primary miRNA (pri-miRNA) transcripts. While a subset of miRNAs are hosted in protein-coding genes, the majority of pri-miRNAs are transcribed as poorly characterized noncoding RNAs that are tens to hundreds of kilobases in length and low in abundance due to efficient processing by the endoribonuclease DROSHA, which initiates miRNA biogenesis. Accordingly, these transcripts are poorly represented in existing RNA-seq data sets and exhibit limited and inaccurate annotation in current transcriptome assemblies. To overcome these challenges, we developed an experimental and computational approach that allows genome-wide detection and mapping of pri-miRNA structures. Deep RNA-seq in cells expressing dominant-negative DROSHA resulted in much greater coverage of pri-miRNA transcripts compared with standard RNA-seq. A computational pipeline was developed that produces highly accurate pri-miRNA assemblies, as confirmed by extensive validation. This approach was applied to a panel of human and mouse cell lines, providing pri-miRNA transcript structures for 1291/1871 human and 888/1181 mouse miRNAs, including 594 human and 425 mouse miRNAs that fall outside protein-coding genes. These new assemblies uncovered unanticipated features and new potential regulatory mechanisms, including links between pri-miRNAs and distant protein-coding genes, alternative pri-miRNA splicing, and transcripts carrying subsets of miRNAs encoded by polycistronic clusters. These results dramatically expand our understanding of the organization of miRNA-encoding genes and provide a valuable resource for the study of mammalian miRNA regulation.
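The reported counts can be restated as proportions. The following is a minimal Python sketch of that arithmetic only. It reads the 594 human and 425 mouse miRNAs as subsets of the miRNAs that received an assembled structure, as the word "including" suggests. The names `COUNTS` and `coverage_summary` are hypothetical and illustrative; they are not part of the authors' pipeline.

```python
# Counts taken from the abstract, per species:
# (miRNAs with an assembled pri-miRNA structure, total miRNAs considered,
#  assembled miRNAs that lie outside protein-coding genes).
# COUNTS and coverage_summary are illustrative names, not from the paper.
COUNTS = {
    "human": (1291, 1871, 594),
    "mouse": (888, 1181, 425),
}


def coverage_summary(assembled: int, total: int, outside_coding: int) -> tuple[float, float]:
    """Return (fraction of all miRNAs with a pri-miRNA structure,
    fraction of those miRNAs that lie outside protein-coding genes)."""
    return assembled / total, outside_coding / assembled


for species, (assembled, total, outside_coding) in COUNTS.items():
    covered, outside = coverage_summary(assembled, total, outside_coding)
    print(f"{species}: {covered:.1%} of miRNAs assembled; "
          f"{outside:.1%} of those outside protein-coding genes")
```

Under this reading, structures were obtained for roughly 69% of human and 75% of mouse miRNAs, and just under half of the assembled miRNAs in each species lie outside protein-coding genes.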