Guo Li, Liang Tingming, Lu Zuhong
State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096, China.
Biosystems. 2011 May-Jun;104(2-3):87-93. doi: 10.1016/j.biosystems.2011.01.004. Epub 2011 Jan 13.
High-throughput sequencing is a powerful tool for discovering and profiling microRNAs (miRNAs) and for gaining further insight into their biogenesis and function. Because of their short length, short RNAs in deep sequencing datasets are prone to map to multiple loci with an equal number of mismatches, especially among multicopy miRNA precursors and homologous miRNA genes. Systematic analysis of a SOLiD sequencing dataset showed that 37.94% of short RNAs could map simultaneously to more than one miRNA precursor, and more short RNAs were found to map to multiple genomic loci. Improper selection among candidate loci may lose mapping information, distort the miRNA expression profile, or even lead to misidentification of novel miRNAs. A comprehensive study indicated several potential features for a correction strategy: the location and distribution of mismatches, quality values, and the expression profiles of multiple isomiRs (miRNA variants), miRNA* and moRs (miRNA-offset RNAs) at the candidate locus and in its flanking sequence. Further studies should develop an approach, based on these features, to correct the widespread phenomenon of multiple mapping and to improve the accuracy of miRNA profiling and discovery.
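The multiple-mapping problem described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes plain base-space sequences (SOLiD reads are color-space), a brute-force Hamming-distance scan, and toy data. The precursor names (`pre-A`, `pre-B`, `pre-C`), the sequences, and the helper names `hamming`, `best_hits` and `multi_mapped_fraction` are all hypothetical. For each read, the sketch collects every locus that ties for the fewest mismatches, then reports the fraction of mapped reads whose tied best hits span more than one precursor.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_hits(read, precursors, max_mismatches=1):
    """Return (fewest_mismatches, [(precursor_name, offset), ...]).

    All loci tying for the fewest mismatches are kept, because they are
    indistinguishable by mismatch count alone. Returns (None, []) when no
    locus is within max_mismatches.
    """
    best = max_mismatches + 1  # sentinel: worse than any accepted hit
    hits = []
    for name, seq in precursors.items():
        for offset in range(len(seq) - len(read) + 1):
            mm = hamming(read, seq[offset:offset + len(read)])
            if mm < best:
                best, hits = mm, [(name, offset)]  # new best: reset ties
            elif mm == best and mm <= max_mismatches:
                hits.append((name, offset))        # equally good locus
    return (best, hits) if hits else (None, [])


def multi_mapped_fraction(reads, precursors, max_mismatches=1):
    """Fraction of mapped reads whose best hits span more than one precursor."""
    mapped = multi = 0
    for read in reads:
        _, hits = best_hits(read, precursors, max_mismatches)
        if not hits:
            continue
        mapped += 1
        if len({name for name, _ in hits}) > 1:
            multi += 1
    return multi / mapped if mapped else 0.0


# Toy data (hypothetical): pre-A and pre-B share the same mature sequence,
# as multicopy precursors do; pre-C is unrelated.
precursors = {
    "pre-A": "GGCTGAGGTAGTAGGTTGTATAGTTTGG",
    "pre-B": "CCATGAGGTAGTAGGTTGTATAGTTAAC",
    "pre-C": "AACCGGTTAGCTAGGATCCATGCA",
}
reads = [
    "TGAGGTAGTAGGTTGTATAGTT",  # ties between pre-A and pre-B
    "GGTTAGCTAGGATCC",         # unique to pre-C
    "CCCCCCCCCCCCCCC",         # maps nowhere within one mismatch
]

if __name__ == "__main__":
    for r in reads:
        print(r, best_hits(r, precursors))
    print("multi-mapped fraction:", multi_mapped_fraction(reads, precursors))
```

Keeping every tied locus, rather than picking one arbitrarily, preserves the information that a correction strategy would need. Features such as mismatch position, quality values, or isomiR, miRNA* and moR expression at each candidate locus could then be used to rank the ties.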