Yang Andrian, Troup Michael, Lin Peijie, Ho Joshua W K
Victor Chang Cardiac Research Institute, Sydney, NSW, Australia.
St. Vincent's Clinical School, University of New South Wales, Sydney, NSW, Australia.
Bioinformatics. 2017 Mar 1;33(5):767-769. doi: 10.1093/bioinformatics/btw732.
Single-cell RNA-seq (scRNA-seq) is increasingly used in a range of biomedical studies. However, current RNA-seq analysis tools are not designed to process scRNA-seq data efficiently because of their limited scalability. Here we introduce Falco, a cloud-based framework that enables parallelization of existing RNA-seq processing pipelines using the big data technologies Apache Hadoop and Apache Spark, allowing massively parallel analysis of large-scale transcriptomic data. Using two public scRNA-seq datasets and two popular RNA-seq alignment/feature quantification pipelines, we show that the same processing pipeline runs 2.6 to 145.4 times faster using Falco than on a highly optimized standalone computer. Falco also allows users to utilize low-cost spot instances of Amazon Web Services, providing a ∼65% reduction in the cost of analysis.
Falco is available under the GNU General Public License at https://github.com/VCCRI/Falco/.
Supplementary data are available at Bioinformatics online.
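The parallelization idea described in the abstract can be illustrated with a minimal split, process, and merge sketch. This is not Falco's code: Falco runs on Apache Hadoop and Apache Spark in the cloud, whereas this toy example uses a local thread pool as a stand-in, and every name in it (`split_into_chunks`, `quantify_chunk`, `quantify_parallel`, the toy reads) is hypothetical. It assumes each read has already been assigned to a gene, so the "quantification" step is reduced to counting.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(reads, chunk_size):
    """Split a list of reads into consecutive chunks of at most chunk_size."""
    return [reads[i:i + chunk_size] for i in range(0, len(reads), chunk_size)]


def quantify_chunk(chunk):
    """Stand-in for an alignment/quantification step on one chunk.

    Assumption: each read is a (read_id, gene) pair, so counting genes
    plays the role of feature quantification.
    """
    return Counter(gene for _read_id, gene in chunk)


def quantify_parallel(reads, chunk_size=2, workers=4):
    """Process chunks concurrently, then merge the partial counts."""
    chunks = split_into_chunks(reads, chunk_size)
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(quantify_chunk, chunks):
            total.update(partial)
    return total


# Hypothetical toy input: five reads already labeled with a gene name.
toy_reads = [
    ("r1", "GATA4"),
    ("r2", "NKX2-5"),
    ("r3", "GATA4"),
    ("r4", "TBX5"),
    ("r5", "GATA4"),
]
counts = quantify_parallel(toy_reads)
```

Because each chunk is processed independently and the merge is a simple sum, the same pattern maps naturally onto a distributed map and reduce, which is the kind of workload that frameworks such as Spark are built for.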