Assessment and recalibration of seventeen lung cancer risk prediction models in approximately one million Chinese population utilising healthcare big data: a retrospective cohort analysis.

Authors

Ye Ziqing, Sun Yexiang, Yin Yueqi, Liu Liya, Cui Miao, Zhang Longyao, Hao Yuantao, Christiani David C, Lin Hongbo, Shen Peng, Wei Yongyue

Affiliations

Center for Public Health and Epidemic Preparedness & Response, Peking University, Beijing, China.

Department of Epidemiology & Biostatistics, School of Public Health, Peking University, Beijing, China.

Publication information

Lancet Reg Health West Pac. 2025 May 16;58:101575. doi: 10.1016/j.lanwpc.2025.101575. eCollection 2025 May.

Abstract

BACKGROUND

Many lung cancer risk prediction models have been developed worldwide, but few have been validated specifically in Chinese populations. This study evaluates the feasibility and performance of 17 lung cancer risk prediction models from around the world when applied to Chinese healthcare big data.

METHODS

The study included individuals with records in the Yinzhou Regional Health Care Database (YRHCD) between January 1, 2010 and December 31, 2021. Seventeen lung cancer risk prediction models were evaluated in the overall population and in subgroups stratified by age and sex: the Bach; the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial 2012 model (PLCOm2012); the Korean Men; the PLCOall2014; the Pittsburgh Predictor; the Liverpool Lung Project Risk Prediction Model for Lung Cancer Incidence (LLPi); the Lung Cancer Risk Assessment Tool (LCRAT); Constrained LCRAT; the Nord-Trøndelag Health Study (HUNT); the Japan Public Health Center-based study (JPHC); Reduced HUNT; the PLCOm2012 without race information (PLCOm2012noRace); the Liverpool Lung Project version 3 (LLPv3); the Lung Cancer Risk Score (LCRS); the Optimized Early Warning Model for Lung Cancer Risk (OWL); the University College London-Incidence (UCL-I); and the Shanghai Lung Cancer incidence Model (Shanghai-LCM). Discrimination was assessed using Harrell's C-index and the time-dependent area under the curve (AUC). Calibration was evaluated using the expected-to-observed ratio (EOR) and calibration curves. The models were then recalibrated in the Yinzhou population, and the calibration of the recalibrated models was evaluated. For each model, before and after recalibration, we redefined the risk threshold so that it selected the same number of individuals as would be selected for screening under the China National Lung Cancer Screening Guideline with Low-dose Computed Tomography 2023 Version (CNLCS 2023). The Kaplan-Meier method was used to estimate the incidence and number of lung cancer cases over a five-year follow-up period among individuals selected by the different criteria or models, and Kaplan-Meier survival curves were plotted.
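As an illustration of the two headline metrics named above, the following minimal Python sketch computes Harrell's C-index and a simple EOR on toy data. It is not the study's code: the function names and the toy values are hypothetical, tied event times are simply skipped, and the EOR here divides summed predicted risks by the raw count of observed events, ignoring censoring (the study may have handled censoring differently).

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risks: Sequence[float]) -> float:
    """Share of comparable pairs in which the subject with the earlier
    observed event also has the higher predicted risk (risk ties count 0.5)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # the earlier subject of a pair must have an observed event
        for j in range(n):
            if times[i] < times[j]:  # j was still event-free when i had the event
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def expected_to_observed_ratio(predicted: Sequence[float],
                               observed_events: Sequence[int]) -> float:
    """EOR > 1 suggests overestimated risk, EOR < 1 underestimated risk.
    Simplifying assumption: observed events are counted directly."""
    observed = sum(observed_events)
    if observed == 0:
        raise ValueError("no observed events")
    return sum(predicted) / observed


# Toy data (hypothetical): follow-up years, event indicator, predicted risk.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.4, 0.5, 0.1]
print(harrell_c_index(toy_times, toy_events, toy_risks))
print(expected_to_observed_ratio([0.2, 0.1, 0.3, 0.4], [1, 0, 0, 1]))
```

In the toy data, four of the five comparable pairs are ordered correctly, and the summed predicted risk of 1.0 is set against two observed events, so the model in this example discriminates reasonably well but underestimates risk.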

FINDINGS

A total of 904,667 study participants were included in the analysis, comprising 66,730 ever smokers and 837,937 never smokers. Of the 17 models initially considered, only six (Bach, Pittsburgh Predictor, JPHC, Reduced HUNT, Constrained LCRAT, UCL-I) had complete information on all predictor variables in the YRHCD. Most models showed similar levels of discrimination, with C-indices ranging from 0.78 (95% CI 0.74-0.82) to 0.88 (0.87-0.89) and time-dependent AUCs ranging from 0.74 (95% CI 0.73-0.75) to 0.88 (0.87-0.89). Most models overestimated incidence risk among ever smokers, with EORs ranging from 1.10 (95% CI 1.02-1.19) to 4.37 (4.16-4.58), and, with a few exceptions, underestimated it among never smokers, with EORs ranging from 0.12 (95% CI 0.11-0.14) to 1.30 (1.26-1.35). After recalibration, all models showed improved accuracy of predicted probability. The five-year incidence rates observed in the model-selected populations, ranging from 0.81% (95% CI 0.64%-0.96%) to 1.29% (1.08%-1.48%), were consistently higher than the rate observed in the criteria-selected population (0.75%, 95% CI 0.59%-0.90%). Following recalibration, the five-year incidence rates in the model-selected populations improved, ranging from 0.81% (95% CI 0.64%-0.96%) to 1.60% (1.36%-1.82%).
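The comparison between model-selected and criteria-selected populations rests on two steps described in the Methods: choosing a risk threshold that selects as many individuals as the guideline criteria, and estimating five-year incidence by the Kaplan-Meier method. The sketch below shows one way these steps could be computed in Python. The function names and toy data are hypothetical, and tied risks at the cutoff are not specially handled, so selecting everyone at or above the threshold could return slightly more individuals than requested.

```python
from typing import Sequence


def matched_threshold(risks: Sequence[float], n_selected: int) -> float:
    """Risk cutoff such that the n_selected highest-risk individuals are at
    or above it (assumes no ties at the cutoff)."""
    if not 0 < n_selected <= len(risks):
        raise ValueError("n_selected must be between 1 and len(risks)")
    return sorted(risks, reverse=True)[n_selected - 1]


def km_cumulative_incidence(times: Sequence[float], events: Sequence[int],
                            horizon: float) -> float:
    """1 minus the Kaplan-Meier survival estimate at `horizon`."""
    survival = 1.0
    at_risk = len(times)
    for t in sorted(set(times)):
        if t > horizon:
            break
        # Events and total departures (events plus censorings) at time t.
        n_events = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        n_leaving = sum(1 for ti in times if ti == t)
        if n_events:
            survival *= 1.0 - n_events / at_risk
        at_risk -= n_leaving
    return 1.0 - survival


# Toy example (hypothetical values): the guideline criteria select 2 people,
# so the model threshold is set to select the 2 highest-risk individuals.
toy_risks = [0.01, 0.20, 0.05, 0.30, 0.02]
cutoff = matched_threshold(toy_risks, n_selected=2)
selected = [i for i, r in enumerate(toy_risks) if r >= cutoff]
print(cutoff, selected)

# Five-year cumulative incidence in a toy selected group.
print(km_cumulative_incidence([1, 2, 3, 4, 5], [1, 0, 1, 0, 0], horizon=5))
```

Fixing the number selected rather than the risk cutoff keeps the screening workload equal across models, so differences in observed incidence reflect how well each model concentrates cases among those selected.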

INTERPRETATION

Most recalibrated models demonstrated comparable and favorable discrimination and calibration, and identified individuals at elevated risk of lung cancer with greater precision than the CNLCS 2023 criteria. Models designed for the general population (such as LLPv3, LLPi, Korean Men, JPHC, and LCRS) are more appropriate for identifying high-risk groups than those designed exclusively for smokers.

FUNDING

National Natural Science Foundation of China, General Project of Zhejiang Provincial Medical and Health Technology Plan for the Year 2024, Natural Science Foundation of Zhejiang Province.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/225e/12143837/1464b831192f/gr1.jpg