School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221008, PR China.
BMC Bioinformatics. 2012 Dec 13;13:331. doi: 10.1186/1471-2105-13-331.
Research on long non-coding RNAs (lncRNAs) has been accelerated by high-throughput RNA sequencing (RNA-Seq). However, identifying lncRNAs from RNA-Seq data is still not trivial, and uncovering their functions remains a challenge.
We present a computational pipeline for detecting novel lncRNAs from RNA-Seq data. First, genome-guided transcriptome reconstruction is used to generate an initial set of assembled transcripts. Likely partial transcripts and artefacts are then filtered out according to their quantified expression level. Novel lncRNAs are subsequently detected by further filtering out known transcripts and transcripts with high protein-coding potential, using a newly developed program called lncRScan. We applied the pipeline to a mouse Klf1 knockout dataset and used differential expression analysis to discuss plausible functions of the novel lncRNAs we detected. We identified 308 novel lncRNA candidates, which have shorter transcripts, fewer exons and shorter putative open reading frames than known protein-coding transcripts. Of these lncRNAs, 52 large intergenic ncRNAs (lincRNAs) show lower expression levels than the protein-coding transcripts, and 13 lncRNAs show significant differential expression between the wild-type and Klf1 knockout conditions.
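The filtering cascade described above (expression level, then known transcripts, then coding potential) can be illustrated with a minimal Python sketch. This is not the lncRScan implementation: the `Transcript` fields, the function name `filter_lncrna_candidates`, the thresholds `min_fpkm` and `max_coding_score`, and the matching of known transcripts by identifier are all assumptions made for illustration. A real pipeline would more likely compare genomic coordinates against an annotation and take the coding score from a dedicated tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """One assembled transcript (hypothetical, simplified record)."""
    transcript_id: str
    fpkm: float          # quantified expression level
    coding_score: float  # higher means more likely protein-coding


def filter_lncrna_candidates(transcripts, known_ids,
                             min_fpkm=1.0, max_coding_score=0.0):
    """Apply the three filters in order and return surviving candidates.

    The thresholds are illustrative placeholders, not values from the paper.
    """
    candidates = []
    for t in transcripts:
        # 1. Drop likely partial transcripts and artefacts (low expression).
        if t.fpkm < min_fpkm:
            continue
        # 2. Drop transcripts that are already annotated.
        if t.transcript_id in known_ids:
            continue
        # 3. Drop transcripts with high protein-coding potential.
        if t.coding_score > max_coding_score:
            continue
        candidates.append(t)
    return candidates


assembled = [
    Transcript("T1", fpkm=5.0, coding_score=-1.2),  # passes all filters
    Transcript("T2", fpkm=0.1, coding_score=-0.8),  # too weakly expressed
    Transcript("T3", fpkm=8.0, coding_score=-0.5),  # already annotated
    Transcript("T4", fpkm=3.0, coding_score=2.5),   # likely protein-coding
]
novel = filter_lncrna_candidates(assembled, known_ids={"T3"})
print([t.transcript_id for t in novel])
```

In this toy input only `T1` survives, because each of the other transcripts is removed by exactly one of the three filters.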
Our method can predict a set of novel lncRNAs from RNA-Seq data. Some of these lncRNAs are differentially expressed between the wild-type and Klf1 knockout strains, suggesting that they can be given high priority in further functional studies.