The Roslin Institute, University of Edinburgh, Easter Bush Campus, Edinburgh, Midlothian, EH25 9RG, UK.
Nuffield Department of Clinical Medicine, John Radcliffe Hospital, University of Oxford, Headington, Oxford, OX3 9DU, UK.
Genet Sel Evol. 2018 Apr 24;50(1):20. doi: 10.1186/s12711-018-0391-0.
mRNA-like long non-coding RNAs (lncRNAs) are a significant component of mammalian transcriptomes, although most are expressed only at low levels, with high tissue specificity, and/or only at specific developmental stages. Thus, in many cases lncRNA detection by RNA-sequencing (RNA-seq) is compromised by stochastic sampling. To account for this and create a catalogue of ruminant lncRNAs, we compared de novo assembled lncRNAs derived from large RNA-seq datasets in transcriptional atlas projects for sheep and goats with lncRNAs previously assembled in cattle and human. We then combined the novel lncRNAs with the sheep transcriptional atlas to identify co-regulated sets of protein-coding and non-coding loci.
Few lncRNAs could be reproducibly assembled from a single dataset, even with deep sequencing of the same tissues from multiple animals. Furthermore, there was little sequence overlap between lncRNAs that were assembled from pooled RNA-seq data. We combined positional conservation (synteny) with cross-species mapping of candidate lncRNAs to identify a consensus set of ruminant lncRNAs and then used the RNA-seq data to demonstrate detectable and reproducible expression in each species. In sheep, 20 to 30% of lncRNAs were located close to protein-coding genes with which they are strongly co-expressed, which is consistent with the evolutionary origin of some ncRNAs in enhancer sequences. Nevertheless, most of the lncRNAs are not co-expressed with neighbouring protein-coding genes.
Alongside substantially expanding the ruminant lncRNA repertoire, the outcomes of our analysis demonstrate that stochastic sampling can be partly overcome by combining RNA-seq datasets from related species. This has practical implications for the future discovery of lncRNAs in other species.
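As an illustration of the neighbour co-expression analysis mentioned in the results, the following is a minimal sketch rather than the authors' pipeline. It assumes each locus carries a vector of expression values across the same ordered set of tissues, pairs every lncRNA with its nearest protein-coding gene on the same chromosome, and reports pairs whose Pearson correlation passes a cut-off. The `Locus` type, the function names, the 100 kb distance window and the 0.9 correlation threshold are all hypothetical choices made for this example and are not taken from the paper.

```python
from math import sqrt
from typing import Dict, List, NamedTuple, Optional, Tuple


class Locus(NamedTuple):
    """A genomic locus with per-tissue expression (hypothetical structure)."""
    name: str
    chrom: str
    start: int
    end: int
    expr: List[float]  # one value per tissue, same tissue order for every locus


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def gap(a: Locus, b: Locus) -> int:
    """Distance in bp between two loci on one chromosome; 0 if they overlap."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def nearest_coding(lnc: Locus, genes: List[Locus], max_dist: int) -> Optional[Locus]:
    """Closest protein-coding gene on the same chromosome within max_dist, if any."""
    nearby = [g for g in genes if g.chrom == lnc.chrom and gap(lnc, g) <= max_dist]
    return min(nearby, key=lambda g: gap(lnc, g), default=None)


def coexpressed_neighbours(
    lncs: List[Locus],
    genes: List[Locus],
    max_dist: int = 100_000,  # assumed window, not from the paper
    min_r: float = 0.9,       # assumed threshold, not from the paper
) -> Dict[str, Tuple[str, float]]:
    """Map each lncRNA to (neighbouring gene, r) when the pair is co-expressed."""
    hits: Dict[str, Tuple[str, float]] = {}
    for lnc in lncs:
        gene = nearest_coding(lnc, genes, max_dist)
        if gene is None:
            continue
        r = pearson(lnc.expr, gene.expr)
        if r >= min_r:
            hits[lnc.name] = (gene.name, r)
    return hits
```

Dividing the number of entries returned by `coexpressed_neighbours` by the total number of lncRNAs gives the kind of proportion quoted above (20 to 30% in sheep), although the actual figure depends on the distance window, the correlation measure and the threshold used.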