Institut Germans Trias i Pujol (IGTP), Badalona, Spain.
BMC Bioinformatics. 2021 Apr 26;22(1):215. doi: 10.1186/s12859-021-04128-1.
Next-generation sequencing has allowed the discovery of miRNA isoforms, termed isomiRs. Some isomiRs are derived from imprecise processing of pre-miRNA precursors, leading to length variants. Additional variability is introduced by non-templated addition of bases at the ends or by editing of internal bases, resulting in base differences relative to the template DNA sequence. We hypothesized that some of the isomiR variation reported so far could be due to systematic technical noise rather than real biological variation.
We developed the XICRA pipeline to analyze small RNA sequencing data at the isomiR level. Because XICRA can use either single or merged reads, we compared isomiR results derived from paired-end (PE) reads with those from single reads (SR) to determine whether the sequence differences between isomiRs and canonical miRNAs are true biological variation or the result of sequencing errors. We detected non-negligible systematic differences between SR and PE data, which primarily affect putative internally edited isomiRs and, at a much lower frequency, terminal length-changing isomiRs. This is relevant for the identification of true isomiRs in small RNA sequencing datasets.
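As a rough illustration of this kind of SR versus PE comparison, the sketch below computes the share of reads assigned to each isomiR variant class in two count tables and reports the difference. It is not XICRA code: the function names, the variant class labels (`canonical`, `internal_edit`, `length_variant`), and the counts are all hypothetical and chosen only to show the idea.

```python
from collections import Counter


def class_fractions(counts):
    """Return the share of total reads that falls in each variant class.

    `counts` maps (miRNA name, variant class) -> read count.
    """
    totals = Counter()
    for (_mirna, variant_class), n in counts.items():
        totals[variant_class] += n
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {cls: n / grand_total for cls, n in totals.items()}


def compare_sr_pe(sr_counts, pe_counts):
    """Per-class difference in read share (SR minus PE).

    A positive value means the class is relatively more abundant in
    single-read data than in merged paired-end data.
    """
    sr = class_fractions(sr_counts)
    pe = class_fractions(pe_counts)
    classes = sorted(set(sr) | set(pe))
    return {cls: sr.get(cls, 0.0) - pe.get(cls, 0.0) for cls in classes}


# Toy counts, invented for illustration only (not data from the study).
sr_counts = {
    ("miR-A", "canonical"): 80,
    ("miR-A", "internal_edit"): 15,
    ("miR-A", "length_variant"): 5,
}
pe_counts = {
    ("miR-A", "canonical"): 90,
    ("miR-A", "internal_edit"): 4,
    ("miR-A", "length_variant"): 6,
}

diff = compare_sr_pe(sr_counts, pe_counts)
for cls, delta in diff.items():
    print(f"{cls}: {delta:+.2f}")
```

In a comparison like this, a variant class whose share is consistently higher in SR than in PE data across samples would be a candidate for technical rather than biological origin.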
We conclude that potential artifacts derived from sequencing errors and/or data processing could result in an overestimation of abundance and diversity of miRNA isoforms. Efforts in annotating the isomiRnome should take this into account.