A deep learning algorithm to predict risk of pancreatic cancer from disease trajectories.
Affiliations
Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
Harvard Medical School, Boston, MA, USA.
Publication information
Nat Med. 2023 May;29(5):1113-1122. doi: 10.1038/s41591-023-02332-5. Epub 2023 May 8.
Pancreatic cancer is an aggressive disease that typically presents late with poor outcomes, indicating a pronounced need for early detection. In this study, we applied artificial intelligence methods to clinical data from 6 million patients (24,000 pancreatic cancer cases) in Denmark (Danish National Patient Registry (DNPR)) and from 3 million patients (3,900 cases) in the United States (US Veterans Affairs (US-VA)). We trained machine learning models on the sequence of disease codes in clinical histories and tested prediction of cancer occurrence within incremental time windows (CancerRiskNet). For cancer occurrence within 36 months, the best DNPR model achieved an area under the receiver operating characteristic curve (AUROC) of 0.88, which decreased to AUROC (3m) = 0.83 when disease events within 3 months before cancer diagnosis were excluded from training, with an estimated relative risk of 59 for the 1,000 highest-risk patients older than age 50 years. Cross-application of the Danish model to US-VA data had lower performance (AUROC = 0.71), and retraining was needed to improve performance (AUROC = 0.78, AUROC (3m) = 0.76). These results improve the ability to design realistic surveillance programs for patients at elevated risk, potentially benefiting lifespan and quality of life by early detection of this aggressive cancer.
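The two headline metrics in the abstract, AUROC and the relative risk among the N highest-risk patients, can be illustrated with a minimal Python sketch. This is not the authors' CancerRiskNet code: the function names `auroc` and `relative_risk_top_n` are hypothetical, the data are toy values, and the relative risk is assumed here to be the case rate among the top-N scored patients divided by the case rate in the whole evaluated group, a definition the abstract itself does not spell out.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the chance that a random case outscores a random non-case.

    Computed from average ranks (Mann-Whitney U), so tied scores count half.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of patients sharing the same score.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one case and one non-case")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def relative_risk_top_n(scores: Sequence[float], labels: Sequence[int], n: int) -> float:
    """Case rate among the n highest-scored patients over the overall case rate.

    Assumption: the comparison group is the whole evaluated population.
    """
    if not 0 < n <= len(scores):
        raise ValueError("n must be between 1 and the number of patients")
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("relative risk is undefined without any cases")
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]
    top_rate = sum(labels[i] for i in top) / n
    return top_rate / base_rate


# Toy data only: predicted risk scores and observed outcomes (1 = cancer).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0]
print(auroc(toy_scores, toy_labels))
print(relative_risk_top_n(toy_scores, toy_labels, n=2))
```

Under this assumed definition, a relative risk of 59 would mean that cancer occurs 59 times more often among the 1,000 highest-scored patients than in the comparison group as a whole.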