Khara Galvin, Trivedi Hari, Newell Mary S, Patel Ravi, Rijken Tobias, Kecskemethy Peter, Glocker Ben
Kheiron Medical Technologies, London, UK.
Winship Cancer Institute, Emory University, Atlanta, GA, USA.
Commun Med (Lond). 2024 Feb 19;4(1):21. doi: 10.1038/s43856-024-00446-6.
Breast density is an important risk factor for breast cancer, compounded by a higher risk of cancers being missed during screening of dense breasts owing to the reduced sensitivity of mammography. Automated, deep learning-based prediction of breast density could provide subject-specific risk assessment and flag difficult cases during screening. However, there is a lack of evidence for generalisability across imaging techniques and, importantly, across race.
This study used a large, racially diverse dataset with 69,697 mammographic studies comprising 451,642 individual images from 23,057 female participants. A deep learning model was developed for four-class BI-RADS density prediction. A comprehensive performance evaluation assessed the generalisability across two imaging techniques, full-field digital mammography (FFDM) and two-dimensional synthetic (2DS) mammography. A detailed subgroup performance and bias analysis assessed the generalisability across participants' race.
Here we show that a model trained on FFDM only achieves a 4-class BI-RADS classification accuracy of 80.5% (79.7-81.4) on FFDM and 79.4% (78.5-80.2) on unseen 2DS data. When trained on both FFDM and 2DS images, performance increases to 82.3% (81.4-83.0) and 82.3% (81.3-83.1), respectively. Racial subgroup analysis shows unbiased performance across Black, White, and Asian participants, despite a separate analysis confirming that race can be predicted from the images with a high accuracy of 86.7% (86.0-87.4).
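The abstract reports each accuracy with a 95% interval but does not state how the intervals were computed. A percentile bootstrap over studies is one common choice for such intervals; the sketch below (function and parameter names are illustrative, not taken from the paper) shows how an accuracy point estimate and interval of this form could be obtained:

```python
import numpy as np

def bootstrap_accuracy_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for classification accuracy.

    This is a generic sketch; the paper's exact interval method is not
    specified in the abstract.
    """
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = len(y_true)

    accs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)          # resample cases with replacement
        accs[b] = np.mean(y_true[idx] == y_pred[idx])

    point = float(np.mean(y_true == y_pred))  # accuracy on the full set
    lo, hi = np.quantile(accs, [alpha / 2, 1 - alpha / 2])
    return point, float(lo), float(hi)
```

For the four-class BI-RADS task, `y_true` and `y_pred` would hold density categories (e.g., 0-3 for A-D) per study; reported percentages such as 80.5% (79.7-81.4) correspond to `point` and the `(lo, hi)` pair scaled by 100.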
Deep learning-based breast density prediction generalises across imaging techniques and race. No substantial disparities are found for any subgroup, including races that were never seen during model development, suggesting that density predictions are unbiased.
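The subgroup bias analysis amounts to comparing accuracy across racial subgroups and checking that no group falls substantially behind. A minimal sketch of such a per-subgroup comparison (all names hypothetical; the paper's actual analysis is more detailed) could be:

```python
import numpy as np

def subgroup_accuracies(y_true, y_pred, groups):
    """Per-subgroup accuracy and the max-min disparity gap.

    A small gap across subgroups is consistent with unbiased performance;
    the threshold for "substantial" disparity is a study-specific choice.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    groups = np.asarray(groups)

    acc_by_group = {}
    for g in np.unique(groups):
        mask = groups == g
        acc_by_group[str(g)] = float(np.mean(y_true[mask] == y_pred[mask]))

    gap = max(acc_by_group.values()) - min(acc_by_group.values())
    return acc_by_group, gap
```

Here `groups` would hold self-reported race labels per study; the returned `gap` summarises the worst-case accuracy disparity between any two subgroups.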