Ng Kwang Loong Stanley, Mishra Santosh K
Bioinformatics Institute, Matrix, Singapore.
RNA. 2007 Feb;13(2):170-87. doi: 10.1261/rna.223807. Epub 2006 Dec 28.
MicroRNAs (miRNAs) participate in diverse cellular and physiological processes through the post-transcriptional gene regulatory pathway. The hairpin is a crucial structural feature for the computational identification of precursor miRNAs (pre-miRs), as its formation is critically associated with the early stages of mature miRNA biogenesis. Our incomplete knowledge of the number of miRNAs present in the genomes of vertebrates, worms, plants, and even viruses necessitates a thorough understanding of their sequence motifs, hairpin structural characteristics, and topological descriptors. In this in-depth study, we investigate a comprehensive and heterogeneous collection of 2241 published (nonredundant) pre-miRs across 41 species (miRBase 8.2), 8494 pseudohairpins extracted from the human RefSeq genes, 12,387 (nonredundant) ncRNAs spanning 457 types (Rfam 7.0), 31 full-length mRNAs randomly selected from GenBank, and four sets of synthetically generated genomic background corresponding to each of the native RNA sequences. Our large-scale characterization analysis reveals that pre-miRs differ significantly from other types of ncRNAs, pseudohairpins, mRNAs, and genomic background according to the nonparametric Kruskal-Wallis ANOVA (p<0.001). We examine the intrinsic and global features at the sequence, structural, and topological levels, including %G+C content, normalized base-pairing propensity P(S), normalized minimum free energy of folding MFE(s), normalized Shannon entropy Q(s), normalized base-pair distance D(s), and degree of compactness F(S), as well as the corresponding Z scores of P(S), MFE(s), Q(s), D(s), and F(S). The findings will promote more accurate guidelines and distinctive criteria for the prediction of novel pre-miRs with improved performance.
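As a rough illustration of the simpler length-normalized features named above, the following Python sketch computes %G+C content, a base-pairing propensity, a normalized MFE, and a Z score against a background sample. The abstract does not give the formulas, so the definitions used here are assumptions: base pairs divided by sequence length, MFE divided by sequence length, and a Z score based on the sample standard deviation. The function names are hypothetical, and the secondary structure (dot-bracket string) and MFE value are assumed to come from an external folding program.

```python
from statistics import mean, stdev


def gc_percent(seq: str) -> float:
    """Percentage of G and C nucleotides in a sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def base_pair_propensity(dot_bracket: str) -> float:
    """Assumed definition: number of base pairs divided by sequence length."""
    if not dot_bracket:
        raise ValueError("empty structure")
    opens = dot_bracket.count("(")
    if opens != dot_bracket.count(")"):
        raise ValueError("unbalanced dot-bracket structure")
    return opens / len(dot_bracket)


def normalized_mfe(mfe_kcal_per_mol: float, length: int) -> float:
    """Assumed definition: minimum free energy divided by sequence length."""
    if length <= 0:
        raise ValueError("length must be positive")
    return mfe_kcal_per_mol / length


def z_score(observed: float, background: list[float]) -> float:
    """Standard score of an observed feature value against background values
    (for example, the same feature computed on shuffled sequences).
    Uses the sample standard deviation, which is an assumption."""
    sd = stdev(background)
    if sd == 0:
        raise ValueError("background has zero variance")
    return (observed - mean(background)) / sd
```

In practice the background list would hold the feature recomputed on each synthetic sequence, so a strongly negative Z score for the normalized MFE would indicate a hairpin that folds more stably than its shuffled counterparts.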