Department of Computational and Quantitative Medicine, City of Hope, Duarte, California.
Sherlock, San Diego Supercomputer Center, University of California, San Diego, San Diego, California.
Cancer Epidemiol Biomarkers Prev. 2020 Apr;29(4):777-786. doi: 10.1158/1055-9965.EPI-19-0842. Epub 2020 Feb 12.
Large-scale cancer epidemiology cohorts (CEC) have successfully collected, analyzed, and shared patient-reported data for years. CECs increasingly need to make their data more findable, accessible, interoperable, and reusable, or FAIR. How CECs should approach this transformation is unclear.
The California Teachers Study (CTS) is an observational CEC of 133,477 participants followed since 1995-1996. In 2014, we began updating our data storage, management, analysis, and sharing strategy. With the San Diego Supercomputer Center, we deployed a new infrastructure based on a data warehouse to integrate and manage data and a secure and shared workspace with documentation, software, and analytic tools that facilitate collaboration and accelerate analyses.
Our new CTS infrastructure includes a data warehouse and data marts, which are focused subsets of the data warehouse designed for efficiency. The secure CTS workspace uses a remote desktop service that operates within a Health Insurance Portability and Accountability Act (HIPAA)- and Federal Information Security Management Act (FISMA)-compliant platform. Our infrastructure offers broad access to CTS data, includes statistical analysis and data visualization software and tools, flexibly manages other key data activities (e.g., cleaning, updates, and data sharing), and will continue to evolve to advance FAIR principles.
Our scalable infrastructure provides the security, authorization, data model, metadata, and analytic tools needed to manage, share, and analyze CTS data in ways that are consistent with the NCI's Cancer Research Data Commons Framework.
The CTS's implementation of new infrastructure in an ongoing CEC demonstrates how population sciences can explore and embrace new cloud-based and analytics infrastructure to accelerate cancer research and translation.