Cancer and Environmental Epidemiology Unit, National Center for Epidemiology, Instituto de Salud Carlos III, Sinesio Delgado 6, 28029 Madrid, Spain.
Breast Cancer Res Treat. 2012 Feb;132(1):287-95. doi: 10.1007/s10549-011-1833-3. Epub 2011 Nov 1.
Mammographic density (MD) is one of the leading risk factors for breast cancer, yet its measurement still relies on subjective assessment, and the consistency of MD measurement in full-field digital mammograms has yet to be evaluated. We studied inter- and intra-rater agreement in the estimation of breast density in full-field digital mammograms, and tested whether any of the women's characteristics influenced that agreement. After an initial training period, three experienced radiologists estimated MD using the Boyd scale in the left-breast cranio-caudal mammogram of 1,431 women recruited at three Spanish screening centres. A subgroup of 50 randomly selected images was read twice to estimate short-term intra-rater agreement. In addition, a reading of 1,428 of the images, performed 2 years earlier by one rater, was used to estimate long-term intra-rater agreement. Pair-wise weighted kappas with 95% bootstrap confidence intervals were calculated. Dichotomous variables were defined to identify mammograms in which any rater disagreed with the other raters or with his/her own earlier assessment, respectively. The association between disagreement and women's characteristics was tested using multivariate mixed logistic models, including centre as a random-effects term and taking into account repeated measures when required. All quadratic-weighted kappa values for inter- and intra-rater agreement were excellent (higher than 0.80). None of the women's characteristics studied, i.e. body mass index, brassiere size, menopause, nulliparity, lactation or current hormonal therapy, was associated with a higher risk of inter- or intra-rater disagreement. However, raters disagreed significantly more on images classified in the higher-density MD categories, and intra-rater disagreement was likewise lower in low-density mammograms. The reliability of MD assessment in full-field digital mammograms is comparable to that for original or digitised images.
The reassuring lack of association between subjects' MD-related characteristics and agreement suggests that bias from this source is unlikely.
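For readers unfamiliar with the agreement statistic named above, the following is a minimal sketch of a quadratic-weighted kappa for one pair of raters with a percentile bootstrap confidence interval. It is not the authors' code: the function names, the coding of the ordered density categories as integers 0 to 5, the number of bootstrap replicates and the toy ratings are all assumptions made for illustration, and the abstract does not state which bootstrap variant was used.

```python
import random


def quadratic_weighted_kappa(a, b, n_categories):
    """Quadratic-weighted kappa for two raters' ordinal ratings (coded 0..k-1)."""
    k, n = n_categories, len(a)
    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x][y] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2  # quadratic disagreement weight
            num += w * observed[i][j]        # weighted observed disagreement
            den += w * row[i] * col[j]       # weighted chance disagreement
    if den == 0.0:
        return float("nan")  # undefined when chance disagreement is zero
    return 1.0 - num / den


def bootstrap_ci(a, b, n_categories, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling images (rating pairs)."""
    rng = random.Random(seed)
    n = len(a)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = quadratic_weighted_kappa(
            [a[i] for i in idx], [b[i] for i in idx], n_categories
        )
        if value == value:  # skip NaN from degenerate resamples
            stats.append(value)
    if not stats:
        raise ValueError("all bootstrap resamples were degenerate")
    stats.sort()
    m = len(stats)
    lower = stats[int(alpha / 2 * m)]
    upper = stats[min(m - 1, int((1 - alpha / 2) * m))]
    return lower, upper


# Made-up ratings for illustration only; these are not study data.
rater_1 = [0, 1, 1, 2, 3, 3, 4, 5, 2, 1, 0, 4]
rater_2 = [0, 1, 2, 2, 3, 4, 4, 5, 2, 1, 1, 4]
kappa = quadratic_weighted_kappa(rater_1, rater_2, 6)
ci_low, ci_high = bootstrap_ci(rater_1, rater_2, 6)
print(f"kappa = {kappa:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

Because the quadratic weights penalise disagreements by the squared distance between categories, a one-category difference costs far less than a difference spanning several categories. With three raters, the same calculation would be repeated for each pair.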