Nam Jin-Wu, Shin Ki-Roo, Han Jinju, Lee Yoontae, Kim V Narry, Zhang Byoung-Tak
Graduate Program in Bioinformatics, Seoul National University, Seoul 151-744, Korea.
Nucleic Acids Res. 2005 Jun 24;33(11):3570-81. doi: 10.1093/nar/gki668. Print 2005.
MicroRNAs (miRNAs) are small regulatory RNAs of approximately 22 nt. Although hundreds of miRNAs have been identified through experimental complementary DNA cloning and computational methods, previous approaches could detect only abundantly expressed miRNAs or close homologs of previously identified miRNAs. Here, we introduce ProMiR, a probabilistic co-learning model for miRNA gene finding that simultaneously considers the structure and sequence of miRNA precursors (pre-miRNAs). In 5-fold cross-validation with 136 referenced human datasets, the classifier achieves 73% sensitivity and 96% specificity. When applied to genome screening for novel miRNAs on human chromosomes 16, 17, 18 and 19, ProMiR effectively finds distantly homologous patterns across diverse pre-miRNAs, detecting at least 23 novel miRNA gene candidates. Importantly, these candidates show no clear sequence similarity to known miRNA genes. Using quantitative PCR following RNA interference against Drosha, we experimentally confirmed that 9 of the 23 representative candidate genes express transcripts that are processed by the miRNA biogenesis enzyme Drosha in HeLa cells, indicating that ProMiR may successfully predict miRNA genes with at least 40% accuracy. Our study suggests that the miRNA gene family may be more abundant than previously anticipated and may confer highly extensive regulatory networks on eukaryotic cells.
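For readers less familiar with the metrics quoted in the abstract, the following is a minimal sketch of how sensitivity, specificity, and the experimentally confirmed fraction are conventionally computed. It is not taken from ProMiR: the function names are hypothetical, and the confusion-matrix counts in the usage lines are illustrative placeholders, not data from the paper. Only the 9-of-23 figure comes from the abstract.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real pre-miRNAs that the classifier labels as pre-miRNAs."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of non-miRNA sequences that the classifier correctly rejects."""
    return true_neg / (true_neg + false_pos)


def confirmed_fraction(confirmed: int, tested: int) -> float:
    """Fraction of tested candidates that were experimentally confirmed."""
    return confirmed / tested


# Illustrative counts only (assumed, not from the paper): chosen so the
# results match the 73% sensitivity and 96% specificity quoted in the abstract.
example_sensitivity = sensitivity(true_pos=73, false_neg=27)
example_specificity = specificity(true_neg=96, false_pos=4)

# From the abstract: 9 of the 23 tested candidates were confirmed.
drosha_confirmed = confirmed_fraction(confirmed=9, tested=23)

print(f"sensitivity: {example_sensitivity:.2f}")
print(f"specificity: {example_specificity:.2f}")
print(f"confirmed fraction: {drosha_confirmed:.3f}")
```

The confirmed fraction, 9/23, is about 0.39, in line with the roughly 40% accuracy figure given in the abstract.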