Cancer Research UK Oxford Centre, University of Oxford, Oxford, UK
Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK.
BMJ Open. 2022 Mar 28;12(3):e050828. doi: 10.1136/bmjopen-2021-050828.
Breast cancer is the most common cancer and the leading cause of cancer-related death in women worldwide. Risk prediction models may be useful to guide risk-reducing interventions (such as pharmacological agents) in women at increased risk, or to inform strategies for early detection such as screening.
The study will use data for women aged 20-90 years between 2000 and 2020 from QResearch, linked at the individual level to hospital episodes, cancer registry and death registry data. It will evaluate a set of modelling approaches to predict the risk of developing breast cancer within the next 10 years, the 'combined' risk of developing breast cancer and then dying from it within 10 years, and the risk of breast cancer mortality within 10 years of diagnosis. Cox proportional hazards, competing risks, random survival forest, deep learning and XGBoost models will be explored. Models will be developed on the entire dataset, with 'apparent' performance reported, and internal-external cross-validation will be used to assess performance and geographical and temporal transportability (two 10-year time periods). Random effects meta-analysis will pool the discrimination and calibration estimates obtained for individual geographical units during internal-external cross-validation. The models will then be externally validated in an independent dataset. Performance heterogeneity will be evaluated throughout, for example by exploring performance across ethnic groups.
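The internal-external cross-validation and pooling steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the record layout, the `fit` and `evaluate` callables, and the choice of the DerSimonian-Laird estimator for the between-region variance are all hypothetical, since the protocol does not specify them. In practice, discrimination metrics such as the C-statistic are often pooled on a transformed (for example logit) scale, which is also not shown here.

```python
import math


def leave_one_region_out(records, fit, evaluate):
    """Internal-external cross-validation over geographical units.

    Each region is held out in turn; the model is fitted on all other
    regions and evaluated on the held-out one.
    `records` is a list of dicts with a "region" key (assumed layout).
    `fit(train)` returns a model; `evaluate(model, test)` returns an
    (estimate, variance) pair for one performance metric.
    Both callables are hypothetical placeholders.
    """
    regions = sorted({r["region"] for r in records})
    results = {}
    for held_out in regions:
        train = [r for r in records if r["region"] != held_out]
        test = [r for r in records if r["region"] == held_out]
        model = fit(train)
        results[held_out] = evaluate(model, test)
    return results


def dersimonian_laird(estimates, variances):
    """Random effects pooling of per-region estimates.

    Uses the DerSimonian-Laird moment estimator of the between-region
    variance tau^2 (an assumption; other estimators such as REML exist).
    Returns (pooled_estimate, standard_error, tau2).
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    # Cochran's Q: weighted squared deviations from the fixed effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random effects weights add tau^2 to each within-region variance
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2
```

The per-region `(estimate, variance)` pairs returned by `leave_one_region_out` can be unzipped and passed to `dersimonian_laird`; a non-zero `tau2` then indicates that performance varies across regions by more than sampling error alone would explain.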
Ethics approval was granted by the QResearch scientific committee (reference number REC 18/EM/0400: OX129). The results will be written up for submission to peer-reviewed journals.