Department of Breast Surgery, The Second Hospital, Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250033, China.
School of Mathematics, Shandong University, Jinan, Shandong 250100, China.
Chin Med J (Engl). 2024 Sep 5;137(17):2084-2091. doi: 10.1097/CM9.0000000000002891. Epub 2024 Feb 26.
Highly accurate and interpretable breast cancer (BC) risk-stratification tools for Asian women are lacking. We aimed to develop risk-stratification models that predict long- and short-term BC risk among Chinese women and that simultaneously rank potential non-experimental risk factors.
The Breast Cancer Cohort Study in Chinese Women, a large ongoing prospective dynamic cohort study, includes 122,058 women aged 25-70 years from eastern China. Using parametric methods (penalized logistic regression, bootstrap, and ensemble learning), we developed two machine-learning risk prediction models to estimate BC risk: the short-term ensemble penalized logistic regression (EPLR) model and the ensemble penalized long-term (EPLT) model. The models were assessed for calibration and discrimination and were then externally validated in new study participants from 2017 to 2020.
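The combination of penalized logistic regression, bootstrap, and ensemble learning named above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the penalty type, the number of bootstrap replicates, or how the members are combined, so the L2 penalty, the gradient-descent fit, the 25 replicates, the simple averaging of predicted risks, and all function names and synthetic data below are assumptions made only for illustration.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_penalized_logistic(X, y, l2=1.0, lr=0.1, n_iter=500):
    """Fit an L2-penalized logistic regression by batch gradient descent.

    The L2 penalty is an assumption; the abstract does not name the penalty.
    The intercept is left unpenalized.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)
        grad_w = X.T @ (p - y) / n + l2 * w / n  # log-loss gradient plus penalty
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def fit_bootstrap_ensemble(X, y, n_models=25, l2=1.0, seed=0):
    """Fit one penalized model per bootstrap resample of the rows."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # sample n rows with replacement
        models.append(fit_penalized_logistic(X[idx], y[idx], l2=l2))
    return models


def predict_risk(models, X):
    """Average the predicted probabilities of all ensemble members."""
    probs = [sigmoid(X @ w + b) for w, b in models]
    return np.mean(probs, axis=0)


# Synthetic data for illustration only (not cohort data).
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 3))
true_logit = 1.5 * X[:, 0] - 1.0 * X[:, 1] - 0.5
y = (rng.random(400) < sigmoid(true_logit)).astype(float)

models = fit_bootstrap_ensemble(X, y)
risk = predict_risk(models, X)
```

Averaging over bootstrap replicates is one common way to stabilize the coefficients of a penalized model; counting how often each coefficient is retained across replicates is one possible route to a risk-factor ranking, although the abstract does not say how the ranking was obtained.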
The area under the receiver operating characteristic curve (AUC) of the short-term EPLR risk prediction model was 0.800 in internal validation and 0.751 in external validation. For the long-term EPLT risk prediction model, the AUC was 0.692 and 0.760 in internal and external validation, respectively. In external validation, the net reclassification improvement index of the EPLT model relative to the Gail model and the Han Chinese Breast Cancer Prediction Model (HCBCP) was 0.193 and 0.233, respectively, indicating that the EPLT model classifies risk more accurately than either comparator.
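The two metrics reported here, the AUC and the net reclassification improvement, can be computed as in the following sketch. It assumes a two-category NRI with a single high-risk cutoff; the abstract does not give the risk categories used, so the threshold of 0.2, the function names, and the toy risks are hypothetical and serve only to show the arithmetic.

```python
def auc(scores, labels):
    """AUC as the share of (case, control) pairs ranked correctly.

    Ties between a case and a control count as half a correct pair.
    """
    cases = [s for s, l in zip(scores, labels) if l == 1]
    controls = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def nri(old_risk, new_risk, labels, threshold):
    """Two-category net reclassification improvement of new vs. old model.

    NRI = [P(up | event) - P(down | event)]
        + [P(down | non-event) - P(up | non-event)],
    where "up" means moving from below to at-or-above the threshold.
    """
    up_e = down_e = n_e = 0
    up_n = down_n = n_n = 0
    for old, new, label in zip(old_risk, new_risk, labels):
        moved_up = old < threshold <= new
        moved_down = new < threshold <= old
        if label == 1:
            n_e += 1
            up_e += int(moved_up)
            down_e += int(moved_down)
        else:
            n_n += 1
            up_n += int(moved_up)
            down_n += int(moved_down)
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n


# Toy predicted risks for illustration only (not study data).
labels = [1, 1, 1, 0, 0, 0, 0]
old_risk = [0.30, 0.10, 0.05, 0.25, 0.10, 0.05, 0.02]
new_risk = [0.35, 0.25, 0.08, 0.10, 0.12, 0.04, 0.01]

auc_old = auc(old_risk, labels)
auc_new = auc(new_risk, labels)
nri_new_vs_old = nri(old_risk, new_risk, labels, threshold=0.2)
```

A positive NRI means that, on balance, the new model moves events into the high-risk category and non-events out of it more often than the reverse, which is the sense in which the reported values of 0.193 and 0.233 favor the EPLT model.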
We developed the EPLR and EPLT models to identify populations at high risk of developing BC. These models can serve as useful tools to aid risk-stratified screening and BC prevention.