Nellore Abhinav, Wilks Christopher, Hansen Kasper D, Leek Jeffrey T, Langmead Ben
Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA; Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.
Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.
Bioinformatics. 2016 Aug 15;32(16):2551-3. doi: 10.1093/bioinformatics/btw177. Epub 2016 Apr 21.
Public archives contain thousands of trillions of bases of valuable sequencing data. More than 40% of the Sequence Read Archive is human data protected by provisions such as dbGaP. To analyze dbGaP-protected data, researchers must typically work with IT administrators and signing officials to ensure all levels of security are implemented at their institution. This is a major obstacle, impeding reproducibility and reducing the utility of archived data.
We present a protocol and software tool for analyzing protected data in a commercial cloud. The protocol, Rail-dbGaP, is applicable to any tool running on Amazon Web Services Elastic MapReduce. The tool, Rail-RNA v0.2, is a spliced aligner for RNA-seq data, which we demonstrate by running it on 9662 samples from the dbGaP-protected GTEx consortium dataset. The Rail-dbGaP protocol makes explicit for the first time the steps an investigator must take to develop Elastic MapReduce pipelines that analyze dbGaP-protected data in a manner compliant with NIH guidelines. Rail-RNA automates implementation of the protocol, making it easy for typical biomedical investigators to study protected RNA-seq data, regardless of their local IT resources or expertise.
Rail-RNA is available from http://rail.bio. Technical details on the Rail-dbGaP protocol, as well as an implementation walkthrough, are available at https://github.com/nellore/rail-dbgap. Detailed instructions on running Rail-RNA on dbGaP-protected data using Amazon Web Services are available at http://docs.rail.bio/dbgap/.
Contact: anellore@gmail.com or langmea@cs.jhu.edu
Supplementary data are available at Bioinformatics online.