Makeev Andrey, Li Kaiyan, Anastasio Mark A, Emig Arthur, Jahnke Paul, Glick Stephen J
U.S. Food & Drug Administration, Silver Spring, Maryland, United States.
University of Illinois Urbana-Champaign, Urbana, Illinois, United States.
J Med Imaging (Bellingham). 2025 Jan;12(Suppl 1):S13005. doi: 10.1117/1.JMI.12.S1.S13005. Epub 2024 Oct 15.
Conventional metrics used for assessing digital mammography (DM) and digital breast tomosynthesis (DBT) image quality, including noise, spatial resolution, and detective quantum efficiency, do not necessarily predict how well a system will perform in a clinical task. Existing phantom-based methods have limitations of their own, such as unrealistic uniform backgrounds, subjective scoring by human readers, and regular signal patterns that are unrepresentative of common clinical findings. We attempted to address this problem with a realistic breast phantom containing random hydroxyapatite microcalcifications and semi-automated deep learning-based image scoring. Our goal was to develop a methodology for objective task-based assessment of image quality for tomosynthesis and DM systems, comprising an anthropomorphic phantom, a detection task (microcalcification clusters), and automated performance evaluation using a convolutional neural network.
Experimental 2D and pseudo-3D mammograms of an anthropomorphic inkjet-printed breast phantom with inserted microcalcification clusters were acquired on clinical mammography systems to train a signal-present/signal-absent image classifier based on the ResNet-18 architecture. In a separate validation study using simulations, this ResNet-18 classifier was shown to approach the performance of an ideal observer. Microcalcification detection performance was evaluated at four dose levels using receiver operating characteristic (ROC) analysis [i.e., area under the ROC curve (AUC)]. To demonstrate the use of this evaluation approach for assessing different technologies, the method was applied to two different mammography systems, as well as to mammograms with re-binned pixels emulating a lower-resolution X-ray detector.
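As an illustration of the ROC figure of merit named above, the sketch below computes an empirical AUC from classifier scores using the Mann-Whitney formulation (the fraction of signal-present/signal-absent pairs ranked correctly, with ties counted as one half). This is a minimal sketch and not the authors' analysis code: the function name `auc_mann_whitney` and the example scores are hypothetical, and the abstract does not say which AUC estimator was used.

```python
def auc_mann_whitney(present_scores, absent_scores):
    """Empirical AUC: probability that a signal-present image receives a
    higher classifier score than a signal-absent image (ties count 0.5).

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not present_scores or not absent_scores:
        raise ValueError("both score lists must be non-empty")

    correct = 0.0
    for p in present_scores:
        for a in absent_scores:
            if p > a:
                correct += 1.0   # pair ranked correctly
            elif p == a:
                correct += 0.5   # tie contributes half a correct ranking
    return correct / (len(present_scores) * len(absent_scores))


# Made-up scores, purely for illustration (not data from the study).
example_present = [0.9, 0.4]
example_absent = [0.5, 0.1]
example_auc = auc_mann_whitney(example_present, example_absent)
```

With these made-up scores, three of the four present/absent pairs are ranked correctly, so `example_auc` is 0.75. In practice one such AUC would be computed per dose level and per imaging mode.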
Microcalcification detectability, as assessed by the deep learning classifier, was observed to vary with the exposure incident on the breast phantom for both DM and tomosynthesis. At full dose, the experimental AUC was 0.96 for DM and 0.95 for DBT, whereas at half dose it dropped to 0.85 and 0.71, respectively. For DM, AUC decreased significantly when re-binning was used to obtain a larger effective pixel size. The task-based assessment approach also showed the superiority of a newer mammography system compared with an older system.
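The pixel re-binning used to emulate a lower-resolution detector can be pictured as block averaging. The sketch below is only one plausible way to do it: the abstract does not state the binning factor or whether pixel values were summed or averaged, so the function name `rebin`, the default factor of 2, and the use of the block mean are all assumptions.

```python
def rebin(image, factor=2):
    """Average non-overlapping factor x factor pixel blocks.

    `image` is a list of equal-length rows of numbers. The factor and the
    choice of averaging (rather than summing) are assumptions made for
    illustration; they are not specified in the abstract.
    """
    rows, cols = len(image), len(image[0])
    if rows % factor or cols % factor:
        raise ValueError("image dimensions must be divisible by factor")

    binned = []
    for r in range(0, rows, factor):
        binned_row = []
        for c in range(0, cols, factor):
            # Sum the pixels in the current block, then take the mean.
            block_sum = sum(
                image[r + i][c + j]
                for i in range(factor)
                for j in range(factor)
            )
            binned_row.append(block_sum / (factor * factor))
        binned.append(binned_row)
    return binned


# Toy 4x4 "image", purely illustrative.
toy_image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
toy_binned = rebin(toy_image, factor=2)
```

Here `toy_binned` is the 2x2 image `[[3.5, 5.5], [11.5, 13.5]]`, in which each output pixel covers four times the area of an input pixel.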
An objective task-based methodology for assessing the image quality of mammography and tomosynthesis systems is proposed. Possible uses for this tool include quality control, acceptance, and constancy testing; assessing the safety and effectiveness of new technology for regulatory submissions; and system optimization. The results of this study showed that the proposed evaluation method, which uses a deep learning model observer, can track differences in microcalcification signal detectability under varied exposure conditions.