Wagner Erin K, Raje Satyajeet, Amos Liz, Kurata Jessica, Badve Abhijit S, Li Yingquan, Busby Ben
BioStat Solutions, Frederick, USA.
National Library of Medicine, National Institutes of Health, Bethesda, USA.
F1000Res. 2017 Mar 24;6:319. doi: 10.12688/f1000research.9837.1. eCollection 2017.
Data sharing is critical to advancing genomic research: it reduces the need to collect new data by allowing existing data to be reused and combined, and it promotes reproducible research. The Cancer Genome Atlas (TCGA) is a popular resource for individual-level genotype-phenotype cancer-related data. The Database of Genotypes and Phenotypes (dbGaP) contains many datasets similar to those in TCGA. We have created a software pipeline that allows researchers to discover relevant genomic data in dbGaP by matching on TCGA metadata. The resulting work provides an easy-to-use tool to connect these two data sources.