Improving annotation efficiency for fully labeling a breast mass segmentation dataset.

Author information

Sharma Vaibhav, Barnett Alina Jade, Yang Julia, Cheon Sangwook, Kim Giyoung, Regina Schwartz Fides, Wang Avivah, Hall Neal, Grimm Lars, Chen Chaofan, Lo Joseph Y, Rudin Cynthia

Affiliations

Duke University, Department of Computer Science, Durham, North Carolina, United States.

Duke University School of Medicine, Department of Radiology, Durham, North Carolina, United States.

Publication information

J Med Imaging (Bellingham). 2025 May;12(3):035501. doi: 10.1117/1.JMI.12.3.035501. Epub 2025 May 21.

Abstract

PURPOSE

Breast cancer remains a leading cause of death for women. Screening programs are deployed to detect cancer at early stages. One current barrier identified by breast imaging researchers is a shortage of labeled image datasets. Addressing this problem is crucial to improve early detection models. We present an active learning (AL) framework for segmenting breast masses from 2D digital mammography, and we publish labeled data. Our method aims to reduce the input needed from expert annotators to reach a fully labeled dataset.

APPROACH

We create a dataset of 1136 mammographic masses with pixel-wise binary segmentation labels, with the test subset labeled independently by two different teams. With this dataset, we simulate a human annotator within an AL framework to develop and compare AI-assisted labeling methods, using a discriminator model and a simulated oracle to collect acceptable segmentation labels. A U-Net model is retrained on these labels to generate new segmentations. We evaluate various oracle heuristics by the percentage of segmentations that the oracle relabels, and we measure the quality of the proposed labels by their intersection over union (IoU) on a validation dataset.
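The two evaluation quantities in this approach, the relabel percentage and the intersection over union (IoU), can be illustrated with a small simulation. The Python sketch below is not the authors' implementation. It assumes binary masks stored as nested lists of 0/1 and a single hypothetical oracle heuristic: accept a proposed segmentation when its IoU with the existing reference label reaches a threshold (0.8 by default, an arbitrary choice), otherwise count it as relabeled and substitute the reference label. The names `iou` and `simulated_oracle` are illustrative only. The abstract does not specify the actual heuristics, and the discriminator model and U-Net retraining step are not modeled here.

```python
from typing import List, Sequence, Tuple

# A 2D binary mask: 1 marks a mass pixel, 0 marks background.
Mask = List[List[int]]


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two same-shaped binary masks."""
    inter = union = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            inter += int(bool(p) and bool(t))
            union += int(bool(p) or bool(t))
    # Convention for two empty masks: treat them as a perfect match.
    return inter / union if union else 1.0


def simulated_oracle(
    proposals: Sequence[Mask],
    references: Sequence[Mask],
    threshold: float = 0.8,  # hypothetical acceptance threshold
) -> Tuple[List[Mask], float]:
    """Hypothetical heuristic: accept a proposal whose IoU with the reference
    label is at least `threshold`; otherwise the simulated annotator
    "relabels" it by supplying the reference label instead."""
    labels: List[Mask] = []
    relabeled = 0
    for prop, ref in zip(proposals, references):
        if iou(prop, ref) >= threshold:
            labels.append(prop)  # accepted as-is
        else:
            labels.append(ref)  # relabeled by the oracle
            relabeled += 1
    pct_relabeled = 100.0 * relabeled / len(proposals) if proposals else 0.0
    return labels, pct_relabeled


# Toy example: one reference mask and two model proposals.
ref = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
good = [[0, 1, 1], [0, 1, 0], [0, 0, 0]]  # misses one reference pixel
bad = [[1, 0, 0], [1, 0, 0], [0, 0, 0]]  # no overlap with ref

print(iou(good, ref))  # 0.75
print(iou(bad, ref))  # 0.0
labels, pct = simulated_oracle([good, bad], [ref, ref], threshold=0.7)
print(pct)  # 50.0
```

Because every image in such a dataset already has a reference label, the oracle's decision can be simulated without a human in the loop. A lower relabel percentage then corresponds to less expert input, while the IoU of the resulting labels on a validation set indicates how much label quality that saving costs.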

RESULTS

Our method reduces expert annotator input by 44%. We present a dataset of 1136 binary segmentation labels approved by board-certified radiologists and make the 143-image validation set public for comparison with other researchers' methods.

CONCLUSIONS

We demonstrate that AL can significantly improve the efficiency and time-effectiveness of creating labeled mammogram datasets. Our framework facilitates the development of high-quality datasets while minimizing manual effort in the domain of digital mammography.
