Distributed learning on 20 000+ lung cancer patients - The Personal Health Train.

Affiliations

Department of Radiation Oncology (MAASTRO), GROW - School for Oncology and Developmental Biology, Maastricht University Medical Centre+, The Netherlands; The D-Lab: Dpt of Precision Medicine, GROW - School for Oncology and Developmental Biology, Maastricht University Medical Centre+, The Netherlands.

Department of Radiation Oncology (MAASTRO), GROW - School for Oncology and Developmental Biology, Maastricht University Medical Centre+, The Netherlands; Department of Radiation Oncology, Radboud University Medical Center, Nijmegen, The Netherlands.

Publication information

Radiother Oncol. 2020 Mar;144:189-200. doi: 10.1016/j.radonc.2019.11.019. Epub 2020 Jan 3.

Abstract

BACKGROUND AND PURPOSE

Access to healthcare data is indispensable for scientific progress and innovation. Sharing healthcare data is time-consuming and notoriously difficult due to privacy and regulatory concerns. The Personal Health Train (PHT) provides a privacy-by-design infrastructure connecting FAIR (Findable, Accessible, Interoperable, Reusable) data sources and allows distributed data analysis and machine learning. Patient data never leaves a healthcare institute.

MATERIALS AND METHODS

Lung cancer patient-specific databases (tumor staging and post-treatment survival information) of oncology departments were translated according to a FAIR data model and stored locally in a graph database. Software was installed locally to enable deployment of distributed machine learning algorithms via a central server. Algorithms (MATLAB, code and documentation publicly available) are patient privacy-preserving as only summary statistics and regression coefficients are exchanged with the central server. A logistic regression model to predict post-treatment two-year survival was trained and evaluated by receiver operating characteristic curves (ROC), root mean square prediction error (RMSE) and calibration plots.
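The exchange pattern described above, in which sites send only aggregates and the server sends only coefficients, can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation: the optimizer (plain gradient descent), the function names, the learning rate and the synthetic data are all assumptions chosen for illustration.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def local_gradient(rows, labels, beta):
    """Runs at one site: return the summed log-loss gradient and the row count.
    Only these aggregates leave the site, never the patient rows."""
    grad = [0.0] * len(beta)
    for x, y in zip(rows, labels):
        p = sigmoid(sum(b * xi for b, xi in zip(beta, x)))
        for j, xi in enumerate(x):
            grad[j] += (p - y) * xi
    return grad, len(rows)

def train_distributed(sites, n_features, lr=0.5, n_rounds=300):
    """Runs at the central server: broadcast coefficients, sum the site
    gradients, take one gradient-descent step, repeat."""
    beta = [0.0] * n_features
    for _ in range(n_rounds):
        total, n = [0.0] * n_features, 0
        for rows, labels in sites:
            g, m = local_gradient(rows, labels, beta)
            total = [t + gi for t, gi in zip(total, g)]
            n += m
        beta = [b - lr * t / n for b, t in zip(beta, total)]
    return beta

def make_site(n, rng, true_beta):
    # Synthetic (hypothetical) data: an intercept column plus one predictor.
    rows, labels = [], []
    for _ in range(n):
        x = [1.0, rng.gauss(0.0, 1.0)]
        p = sigmoid(sum(b * xi for b, xi in zip(true_beta, x)))
        rows.append(x)
        labels.append(1 if rng.random() < p else 0)
    return rows, labels

rng = random.Random(0)
sites = [make_site(n, rng, [-0.5, 1.5]) for n in (300, 200, 400)]
beta_distributed = train_distributed(sites, n_features=2)

# Reference: the same model fitted on the pooled rows in one place.
pooled = ([x for rows, _ in sites for x in rows],
          [y for _, labels in sites for y in labels])
beta_pooled = train_distributed([pooled], n_features=2)
```

Because the pooled gradient is the sum of the per-site gradients, the distributed fit matches the pooled fit up to floating-point rounding, which is why exchanging aggregates alone is sufficient in this sketch.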

RESULTS

In 4 months, we connected databases with 23 203 patient cases across 8 healthcare institutes in 5 countries (institutes located in Amsterdam, Cardiff, Maastricht, Manchester, Nijmegen, Rome, Rotterdam and Shanghai) using the PHT. Summary statistics were computed across databases. A distributed logistic regression model predicting post-treatment two-year survival was trained on 14 810 patients treated between 1978 and 2011 and validated on 8 393 patients treated between 2012 and 2015.
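As a hedged illustration of how summary statistics can be computed across databases without moving patient rows, the sketch below pools a mean and sample standard deviation from per-site counts, sums and sums of squares. The function names and the age values are hypothetical and are not taken from the study.

```python
import math

def local_summary(values):
    """Runs at one site: return only the count, sum and sum of squares."""
    return len(values), sum(values), sum(v * v for v in values)

def pooled_mean_sd(summaries):
    """Runs at the central server: combine per-site aggregates into a
    pooled mean and sample standard deviation."""
    n = sum(s[0] for s in summaries)
    total = sum(s[1] for s in summaries)
    total_sq = sum(s[2] for s in summaries)
    mean = total / n
    var = (total_sq - n * mean * mean) / (n - 1)
    return n, mean, math.sqrt(max(var, 0.0))  # guard against tiny negative rounding

# Made-up ages at three hypothetical sites.
site_ages = [[61.0, 70.0, 58.0], [66.0, 73.0], [59.0, 64.0, 68.0, 71.0]]
n, mean, sd = pooled_mean_sd([local_summary(v) for v in site_ages])
```

The sum-of-squares formula is simple but can lose precision when values are large relative to their spread; a production system would more likely merge per-site means and variances with a numerically stable update.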

CONCLUSION

The PHT infrastructure demonstrably overcomes patient privacy barriers to healthcare data sharing and enables fast data analyses across multiple institutes from different countries with different regulatory regimes. This infrastructure promotes global evidence-based medicine while prioritizing patient privacy.
