Chen Mei-Ju May, Chen Li-Kai, Lai Yu-Shing, Lin Yu-Yu, Wu Dung-Chi, Tung Yi-An, Liu Kwei-Yan, Shih Hsueh-Tzu, Chen Yi-Jyun, Lin Yan-Liang, Ma Li-Ting, Huang Jian-Long, Wu Po-Chun, Hong Ming-Yi, Chu Fang-Hua, Wu June-Tai, Li Wen-Hsiung, Chen Chien-Yu
Genome and Systems Biology Degree Program, National Taiwan University and Academia Sinica, Taipei, 106, Taiwan.
Institute of Molecular Medicine, College of Medicine, National Taiwan University, Taipei, 100, Taiwan.
BMC Genomics. 2016 Mar 11;17:220. doi: 10.1186/s12864-016-2457-0.
Recent advances in sequencing technology have opened a new era in RNA studies. Novel types of RNA, such as long non-coding RNAs (lncRNAs), have been discovered by transcriptomic sequencing, and some lncRNAs have been found to play essential roles in biological processes. However, only limited information is available for lncRNAs in Drosophila melanogaster, an important model organism. The characterization of known lncRNAs and the identification of new lncRNAs in D. melanogaster are therefore important areas of research. Moreover, there is increasing interest in using ChIP-seq data (H3K4me3, H3K36me3 and Pol II) to detect signatures of active transcription for reported lncRNAs.
We developed a computational approach to identify new lncRNAs from two tissue-specific RNA-seq datasets, prepared with the poly(A)-enrichment and ribo-zero methods, respectively. We identified 462 novel lncRNA transcripts, which we combined with 4137 previously published lncRNA transcripts into a curated dataset. We then used 61 RNA-seq and 32 ChIP-seq datasets to improve the annotation of the curated lncRNAs with regard to transcriptional direction, exon regions, classification, expression in the brain, possession of a poly(A) tail, and presence of conventional chromatin signatures. Furthermore, we used 30 time-course RNA-seq datasets and 32 ChIP-seq datasets to investigate whether the lncRNAs reported by RNA-seq have active transcription signatures. The results showed that more than half of the reported lncRNAs did not have chromatin signatures related to active transcription. To clarify this issue, we conducted RT-qPCR experiments and found that ~95.24% of the selected lncRNAs were truly transcribed, regardless of whether they were associated with active chromatin signatures.
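The chromatin-signature check described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a promoter-proximal mark such as H3K4me3, treats a lncRNA as carrying the signature when any ChIP-seq peak overlaps a window around its transcription start site, and uses a hypothetical 500 bp flank, hypothetical names (`Interval`, `has_active_signature`) and toy coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def tss_window(chrom: str, tss: int, flank: int = 500) -> Interval:
    """Window of +/- flank bases around a TSS (flank is an assumed value)."""
    return Interval(chrom, max(0, tss - flank), tss + flank + 1)


def has_active_signature(chrom: str, tss: int, peaks: list, flank: int = 500) -> bool:
    """True if any peak overlaps the TSS window of the transcript."""
    window = tss_window(chrom, tss, flank)
    return any(overlaps(window, peak) for peak in peaks)


# Toy data for illustration only (not taken from the study).
peaks = [Interval("chr2L", 1000, 1400), Interval("chr3R", 5000, 5200)]
lncrnas = {
    "lnc_A": ("chr2L", 1200),
    "lnc_B": ("chr2L", 9000),
    "lnc_C": ("chr3R", 5600),
}

flags = {
    name: has_active_signature(chrom, tss, peaks)
    for name, (chrom, tss) in lncrnas.items()
}
```

A linear scan over peaks is adequate for a toy example; for genome-scale peak sets, a sorted-interval or interval-tree lookup would be the more usual choice. A gene-body mark such as H3K36me3 would need the whole transcript span rather than a TSS window.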
In this study, we discovered a large number of novel lncRNAs, which suggests that many remain to be identified in D. melanogaster. For known lncRNAs, we improved their characterization by integrating a large number of sequencing datasets (93 in total) with previously published lncRNA annotations, drawing on multiple sources (lncRNAs, RNA-seq and ChIP-seq). The RT-qPCR experiments demonstrated that RNA-seq is a reliable platform for discovering lncRNAs. This set of curated lncRNAs with improved annotations can serve as an important resource for investigating the function of lncRNAs in D. melanogaster.