Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA.
Bioinformatics. 2020 Feb 15;36(4):1305-1306. doi: 10.1093/bioinformatics/btz680.
Based on the Genomic Data Sharing Policy issued in August 2007, the National Institutes of Health (NIH) has supported several repositories, such as the database of Genotypes and Phenotypes (dbGaP). dbGaP is an online repository that provides access to large-scale genetic and phenotypic datasets from more than 1000 studies. However, navigating the website and understanding the relationships between studies are not easy tasks, and decrypting the files is a complex procedure. Here we present dbgap2x, an R package that provides functions for searching dbGaP studies, exploring the characteristics of a study and easily decrypting files from dbGaP.
dbgap2x is an R package whose code is available at https://github.com/gversmee/dbgap2x. A containerized version that includes the package, a Jupyter server and an example Notebook is available at https://hub.docker.com/r/gversmee/dbgap2x.
Supplementary data are available at Bioinformatics online.