Glicksberg Benjamin Scott, Burns Shohei, Currie Rob, Griffin Ann, Wang Zhen Jane, Haussler David, Goldstein Theodore, Collisson Eric
Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, United States.
Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, United States.
J Med Internet Res. 2020 Mar 20;22(3):e16810. doi: 10.2196/16810.
Efficiently sharing health data produced during standard care could dramatically accelerate progress in cancer treatments, but various barriers make this difficult. Withholding these data to protect patient privacy comes at the cost of learning little to nothing from real-world data produced during cancer care. Furthermore, recent research has demonstrated that patients with cancer are willing to share their treatment experiences to fuel research, despite potential risks to privacy.
The objective of this study was to design, pilot, and release a decentralized, scalable, efficient, economical, and secure strategy for the dissemination of deidentified clinical and genomic data with a focus on late-stage cancer.
We created and piloted the Cancer Gene Trust (CGT), a blockchain-authenticated system that enables secure sharing of deidentified patient data derived from standard-of-care imaging, genomic testing, and electronic health records (EHRs). We prospectively consented a pilot cohort (N=18) and collected their data, which we uploaded to the CGT. EHR data were extracted from both a hospital cancer registry and a common data model (CDM) format to identify optimal data extraction and dissemination practices. Specifically, we scored the completeness of each of the two EHR data extraction formats against the gold standard source documentation and compared the scores for patients with available data (n=17).
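The completeness comparison described above can be illustrated with a minimal sketch. The abstract does not state the scoring rubric, so the rule used here (the fraction of fields present in the gold standard that the extraction also populated) is an assumption, and the field names and patient values are hypothetical, not taken from the study.

```python
from typing import Mapping, Optional, Sequence

Record = Mapping[str, Optional[str]]


def is_populated(record: Record, field: str) -> bool:
    """A field counts as populated if it is present and not empty."""
    return record.get(field) not in (None, "")


def completeness_score(extracted: Record, gold: Record, fields: Sequence[str]) -> float:
    """Fraction of gold-standard fields that the extraction also populated.

    Assumed rubric: only fields documented in the gold standard are scored,
    and presence (not correctness) of a value is what counts.
    """
    available = [f for f in fields if is_populated(gold, f)]
    if not available:
        return 0.0
    captured = sum(1 for f in available if is_populated(extracted, f))
    return captured / len(available)


# Hypothetical fields and values for a single patient.
FIELDS = ["primary_site", "histology", "stage"]
gold = {"primary_site": "pancreas", "histology": "adenocarcinoma", "stage": "IV"}
registry = {"primary_site": "pancreas", "histology": "adenocarcinoma", "stage": "IV"}
cdm = {"primary_site": "pancreas", "histology": None, "stage": "IV"}

registry_score = completeness_score(registry, gold, FIELDS)
cdm_score = completeness_score(cdm, gold, FIELDS)
```

Per-patient scores computed this way could then be compared across the two formats with a paired test; which test the authors used is not stated in the abstract.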
Although the total completeness scores were greater for the registry reports than those for the CDM, this difference was not statistically significant. We did find that some specific data fields, such as histology site, were better captured using the registry reports, which can be used to improve the continually adapting CDM. In terms of the overall pilot study, we found that CGT enables rapid integration of real-world data of patients with cancer in a more clinically useful time frame. We also developed an open-source Web application to allow users to seamlessly search, browse, explore, and download CGT data.
Our pilot demonstrates the willingness of patients with cancer to participate in data sharing and shows how blockchain-enabled structures can maintain relationships between individual data elements while preserving patient privacy, enabling third-party researchers and clinicians to make findings from the data. We demonstrate the feasibility of CGT as a framework to share health data trapped in silos to further cancer research. Further studies to optimize data representation, stream, and integrity are required.