Department of Biomedical Sciences, Oregon State University, Corvallis, OR, USA.
Center for Genome Research and Biocomputing, Oregon State University, Corvallis, OR, USA.
BMC Genomics. 2020 Feb 12;21(1):153. doi: 10.1186/s12864-019-6432-4.
Long noncoding RNAs (lncRNAs) have roles in gene regulation, epigenetics, and molecular scaffolding, and it is hypothesized that they underlie some mammalian evolutionary adaptations. However, for many mammalian species, the absence of a genome assembly precludes the comprehensive identification of lncRNAs. The genome of the American beaver (Castor canadensis) has recently been sequenced, setting the stage for the systematic identification of beaver lncRNAs and the characterization of their expression in various tissues. The objective of this study was to discover and profile polyadenylated lncRNAs in the beaver using high-throughput short-read sequencing of RNA from sixteen beaver tissues, and to annotate the resulting lncRNAs based on their potential for orthology with known lncRNAs in other species.
Using de novo transcriptome assembly, we found 9528 potential lncRNA contigs and 187 high-confidence lncRNA contigs. Of the high-confidence lncRNA contigs, 147 have no known orthologs (and thus are putative novel lncRNAs) and 40 have mammalian orthologs. The novel lncRNAs mapped to the Oregon State University (OSU) reference beaver genome with greater than 90% sequence identity. While the novel lncRNAs were on average shorter than their annotated counterparts, they were similar to the annotated lncRNAs in terms of the relationships between contig length and minimum free energy (MFE) and between coverage and contig length. We identified beaver orthologs of known lncRNAs such as XIST, MEG3, TINCR, and NIPBL-DT. We profiled the expression of the 187 high-confidence lncRNAs across 16 beaver tissues (whole blood, brain, lung, liver, heart, stomach, intestine, skeletal muscle, kidney, spleen, ovary, placenta, castor gland, tail, toe-webbing, and tongue) and identified both tissue-specific and ubiquitous lncRNAs.
To our knowledge, this is the first report of the systematic identification of lncRNAs and their expression atlas in the beaver. LncRNAs, both novel and those with known orthologs, are expressed in each of the beaver tissues that we analyzed. For some beaver lncRNAs with known orthologs, the tissue-specific expression patterns were phylogenetically conserved. The lncRNA sequence data files and raw sequence files are available via the web supplement and the NCBI Sequence Read Archive, respectively.