Zhao Shanrong, Gordon William, Du Sarah, Zhang Chi, He Wen, Xi Li, Mathur Sachin, Agostino Michael, Paradis Theresa, von Schack David, Vincent Michael, Zhang Baohong
Early Clinical Development, Pfizer Worldwide Research and Development, Cambridge, MA, 02139, USA.
Business Technology, Pfizer Worldwide Research and Development, Andover, MA, 01810, USA.
BMC Bioinformatics. 2017 Mar 20;18(1):180. doi: 10.1186/s12859-017-1601-4.
Genome-wide microRNA (miRNA) expression data can be used to study miRNA dysregulation comprehensively. Although many open-source tools for miRNA-seq data analysis are available, challenges remain in accurately quantifying miRNAs from large-scale miRNA-seq datasets. We implemented a pipeline called QuickMIRSeq for accurate quantification of known miRNAs and miRNA isoforms (isomiRs) from multiple samples simultaneously.
QuickMIRSeq considers the unique nature of miRNAs and combines many important features into its implementation. First, it takes advantage of the high redundancy of miRNA reads and introduces joint mapping of multiple samples to reduce computational time. Second, it incorporates strand information in the alignment step for more accurate quantification. Third, it filters out reads potentially arising from background noise to improve the reliability of miRNA detection. Fourth, it remaps sequences that align to miRNAs with mismatches to a reference genome to further reduce false positives. Finally, QuickMIRSeq generates a rich set of QC metrics and publication-ready plots.
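The joint-mapping and noise-filtering ideas can be illustrated with a minimal Python sketch. This is not QuickMIRSeq's actual code: the function names `collapse_reads` and `quantify`, the `min_total` threshold, and the exact-match dictionary lookup that stands in for a real strand-aware aligner are all assumptions made for illustration. The sketch collapses identical reads across all samples, looks up each unique sequence once, and then spreads the per-sample counts back onto the matched miRNA.

```python
from collections import Counter, defaultdict


def collapse_reads(samples):
    """Map each unique read sequence to its per-sample read counts."""
    table = defaultdict(Counter)
    for sample, reads in samples.items():
        for seq in reads:
            table[seq][sample] += 1
    return table


def quantify(samples, reference, min_total=2):
    """Count reads per miRNA per sample, looking up each unique sequence once.

    `reference` maps a mature miRNA sequence to its name; the exact-match
    lookup is a hypothetical stand-in for a real alignment step.
    `min_total` is an assumed cutoff: sequences seen fewer times than this
    across all samples are treated as background noise and dropped.
    """
    counts = defaultdict(Counter)
    for seq, per_sample in collapse_reads(samples).items():
        if sum(per_sample.values()) < min_total:
            continue  # too rare across samples: likely background noise
        mirna = reference.get(seq)
        if mirna is None:
            continue  # no exact match in the toy reference
        counts[mirna].update(per_sample)
    return {name: dict(c) for name, c in counts.items()}


# Toy inputs: two samples and a two-entry reference.
LET7A = "TGAGGTAGTAGGTTGTATAGTT"
MIR21 = "TAGCTTATCAGACTGATGTTGA"
reference = {LET7A: "hsa-let-7a-5p", MIR21: "hsa-miR-21-5p"}
samples = {
    "s1": [LET7A, LET7A, LET7A, MIR21],
    "s2": [LET7A, MIR21, "ACGT"],
}
result = quantify(samples, reference)
```

Because miRNA reads are highly redundant, the number of unique sequences is typically far smaller than the number of reads, so the expensive lookup step runs once per unique sequence rather than once per read per sample.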
The rich visualization features implemented allow end users to interactively explore the results and gain more insights into miRNA-seq data analyses. The high degree of automation and interactivity in QuickMIRSeq leads to a substantial reduction in the time and effort required for miRNA-seq data analysis.