Wilson Shane, Fitzsimons Michael, Ferguson Martin, Heath Allison, Jensen Mark, Miller Josh, Murphy Mark W, Porter James, Sahni Himanso, Staudt Louis, Tang Yajing, Wang Zhining, Yu Christine, Zhang Junjun, Ferretti Vincent, Grossman Robert L
Ontario Institute for Cancer Research, Toronto, Ontario, Canada.
Center for Data Intensive Science (CDIS), University of Chicago, Chicago, Illinois.
Cancer Res. 2017 Nov 1;77(21):e15-e18. doi: 10.1158/0008-5472.CAN-17-0598.
The NCI Genomic Data Commons (GDC) was launched in 2016 and makes available over 4 petabytes (PB) of cancer genomic and associated clinical data to the research community. This dataset continues to grow and currently includes over 14,500 patients. The GDC is an example of a biomedical data commons, which co-locates biomedical data with storage and computing infrastructure and commonly used web services, software applications, and tools to create a secure, interoperable, and extensible resource for researchers. The GDC is (i) a data repository for downloading data that have been submitted to it, and also a system that (ii) applies a common set of bioinformatics pipelines to submitted data; (iii) reanalyzes existing data when new pipelines are developed; and (iv) allows users to build their own applications and systems that interoperate with the GDC using the GDC Application Programming Interface (API). We describe the GDC API and how it has been used both by the GDC itself and by third parties.
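As an illustration of point (iv), the sketch below shows how a third-party script might compose a search request against the GDC API. Nothing in it comes from the abstract itself: the API root `https://api.gdc.cancer.gov`, the `files` endpoint, the JSON `filters` syntax, and the field names are assumptions based on the publicly documented GDC API and should be checked against the current documentation. The helper names `build_query` and `fetch_hits` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed public API root; not stated in the abstract.
GDC_API = "https://api.gdc.cancer.gov"


def build_query(endpoint, field, values, fields, size=5):
    """Build a GDC search URL that keeps records whose `field` is in `values`."""
    # Assumed filter syntax: an operator plus a field/value payload, sent as JSON.
    filters = {"op": "in", "content": {"field": field, "value": list(values)}}
    params = {
        "filters": json.dumps(filters),
        "fields": ",".join(fields),  # comma-separated list of fields to return
        "format": "JSON",
        "size": str(size),           # maximum number of records to return
    }
    return f"{GDC_API}/{endpoint}?{urllib.parse.urlencode(params)}"


def fetch_hits(url, timeout=30):
    """Send the request and return the list of matching records.

    Assumes the response wraps results as {"data": {"hits": [...]}}.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["data"]["hits"]
```

A caller with network access could, for example, build a URL with `build_query("files", "cases.project.project_id", ["TCGA-BRCA"], ["file_id", "file_name", "data_type"])` and pass it to `fetch_hits` to list a few file records for one project. Keeping URL construction separate from the network call makes the query easy to inspect before it is sent.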