Xu Guorong, Strong Michael J, Lacey Michelle R, Baribault Carl, Flemington Erik K, Taylor Christopher M
Department of Computer Science, University of New Orleans Lakefront, New Orleans, Louisiana, United States of America.
Department of Pathology, Tulane University, New Orleans, Louisiana, United States of America.
PLoS One. 2014 Feb 25;9(2):e89445. doi: 10.1371/journal.pone.0089445. eCollection 2014.
High-throughput RNA sequencing (RNA-seq) has become an instrumental assay for the analysis of multiple aspects of an organism's transcriptome. Further, the microbiome associated with a biological specimen can also be analyzed using RNA-seq data, and this application is gaining interest in the scientific community. There are many existing bioinformatics tools designed for analysis and visualization of transcriptome data. Despite the availability of an array of next-generation sequencing (NGS) analysis tools, the analysis of RNA-seq data sets poses a challenge for many biomedical researchers who are not familiar with command-line tools. Here we present RNA CoMPASS, a comprehensive RNA-seq analysis pipeline for the simultaneous analysis of transcriptomes and metatranscriptomes from diverse biological specimens. RNA CoMPASS leverages existing tools and parallel computing technology to facilitate the analysis of even very large data sets. RNA CoMPASS has a web-based graphical user interface with intrinsic queuing to control a distributed computational pipeline. RNA CoMPASS was evaluated by analyzing RNA-seq data sets from 45 B-cell samples. Twenty-two of these samples were derived from lymphoblastoid cell lines (LCLs) generated by the infection of naïve B-cells with the Epstein-Barr virus (EBV), while another 23 samples were derived from Burkitt's lymphomas (BLs), some of which arose in part through infection with EBV. As expected, RNA CoMPASS identified EBV in all LCLs and in a fraction of the BLs. Cluster analysis of the human transcriptome component of the RNA CoMPASS output clearly separated the BLs (which have a germinal center-like phenotype) from the LCLs (which have a blast-like phenotype), with evidence of activated MYC signaling and lower interferon and NF-κB signaling in the BLs. Together, this analysis illustrates the utility of RNA CoMPASS in the simultaneous analysis of transcriptome and metatranscriptome data.
RNA CoMPASS is freely available at http://rnacompass.sourceforge.net/.