Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, 55905, USA.
Department of Health Sciences Research & Center for Individualized Medicine, Mayo Clinic, Scottsdale, AZ, 85259, USA.
Sci Rep. 2017 Oct 27;7(1):14196. doi: 10.1038/s41598-017-14595-3.
Long non-coding RNAs (lncRNAs) are a large class of gene transcripts with regulatory functions discovered in recent years. Many more are expected to be revealed as RNA-seq data accumulate from diverse types of normal and diseased tissues. However, discovering novel lncRNAs and accurately quantifying known ones from massive RNA-seq data is not trivial. Herein we describe UClncR, an Ultrafast and Comprehensive lncRNA detection pipeline, to tackle this challenge. UClncR takes a standard RNA-seq alignment file, performs transcript assembly, predicts lncRNA candidates, quantifies and annotates both known and novel lncRNA candidates, and generates a convenient report for downstream analysis. The pipeline accommodates both un-stranded and stranded RNA-seq data so that lncRNAs overlapping with other genes can be predicted and quantified. UClncR is fully parallelized in a cluster environment, yet it also allows users to run samples sequentially without a cluster. The pipeline can process a typical RNA-seq sample in a matter of minutes and complete hundreds of samples in a matter of hours. Analysis of the lncRNAs predicted from two test datasets demonstrated UClncR's accuracy and the relevance of the predicted lncRNAs to the samples' clinical phenotypes. UClncR should significantly facilitate the discovery of novel lncRNAs and is publicly available at http://bioinformaticstools.mayo.edu/research/UClncR.