Dartois Laureen, Gauthier Émilien, Heitzmann Julia, Baglietto Laura, Michiels Stefan, Mesrine Sylvie, Boutron-Ruault Marie-Christine, Delaloge Suzette, Ragusa Stéphane, Clavel-Chapelon Françoise, Fagherazzi Guy
Inserm (Institut National de la Santé et de la Recherche Médicale), Centre for Research in Epidemiology and Population Health (CESP), U1018, Team 9, 114 rue Édouard Vaillant, 94805, Villejuif Cedex, France.
Breast Cancer Res Treat. 2015 Apr;150(2):415-26. doi: 10.1007/s10549-015-3321-7. Epub 2015 Mar 6.
Breast cancer remains a global health concern, and highly discriminating prediction models are lacking. The k-nearest-neighbor algorithm (kNN) offers an intuitive way to estimate individual risk. This study compares the performance of this approach with that of the Cox and Gail models for 5-year breast cancer risk prediction. The study included 64,995 women from the French E3N prospective cohort. The sample was divided into a learning series (N = 51,821), used to train the models with fivefold cross-validation, and a validation series (N = 13,174), used to evaluate them. The area under the receiver operating characteristic curve (AUC) and the expected over observed number of cases (E/O) ratio were estimated. In the learning and validation series, respectively, 393 and 78 premenopausal and 537 and 98 postmenopausal breast cancers were diagnosed. The discrimination values of the best combinations of predictors obtained from cross-validation ranged from 0.59 to 0.60. In the validation series, the AUC values in premenopausal and postmenopausal women were 0.583 [0.520; 0.646] and 0.621 [0.563; 0.679] using the kNN and 0.565 [0.500; 0.631] and 0.617 [0.561; 0.673] using the Cox model. The corresponding E/O ratios (kNN and Cox, respectively) were 1.26 and 1.28 in premenopausal women and 1.44 and 1.40 in postmenopausal women. The applied Gail model provided AUC values of 0.614 [0.554; 0.675] and 0.549 [0.495; 0.604] and E/O ratios of 0.78 and 1.12 in premenopausal and postmenopausal women, respectively. This study shows that prediction performance differed according to menopausal status when using parametric statistical tools. The k-nearest-neighbor approach performed well, and discrimination was improved in postmenopausal women compared with the Gail model.
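To make the two reported metrics concrete, the following is a minimal Python sketch of a kNN risk estimate, the AUC, and the E/O ratio. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the distance metric, the number of neighbors, or how follow-up time was handled, so the Euclidean distance, the function names, and the toy data below are all hypothetical.

```python
from math import dist  # Euclidean distance (Python 3.8+); the metric is an assumption


def knn_risk(query, features, outcomes, k):
    """Estimate risk as the fraction of cases among the k nearest neighbors."""
    order = sorted(range(len(features)), key=lambda i: dist(query, features[i]))
    return sum(outcomes[i] for i in order[:k]) / k


def auc(risks, outcomes):
    """Probability that a random case is ranked above a random non-case.

    Ties in predicted risk count as one half.
    """
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    wins = sum((c > n) + 0.5 * (c == n) for c in cases for n in controls)
    return wins / (len(cases) * len(controls))


def expected_over_observed(risks, outcomes):
    """E/O ratio: sum of predicted risks divided by the observed number of cases."""
    return sum(risks) / sum(outcomes)


# Hypothetical toy data: one predictor per woman, outcome 1 = diagnosed case.
toy_features = [(0.0,), (1.0,), (2.0,), (10.0,)]
toy_outcomes = [0, 1, 1, 0]
toy_knn = knn_risk((1.2,), toy_features, toy_outcomes, k=3)

# Hypothetical predicted risks for four women, two of whom are cases.
toy_risks = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
toy_auc = auc(toy_risks, toy_labels)
toy_eo = expected_over_observed(toy_risks, toy_labels)
```

An AUC of 0.5 corresponds to ranking cases and non-cases at random, and an E/O ratio above 1 means the model predicts more cases than were observed, which is how the ratios of 1.26 to 1.44 reported for the kNN and Cox models can be read.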