Division of Cardiology, Department of Medicine, Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, California, United States of America.
Division of Cardiology, Department of Medicine, Department of Radiology, Bakar Computational Health Sciences Institute, Computational Precision Health Graduate Program, Center for Intelligent Imaging, Biological and Medical Informatics Graduate Program, Chan Zuckerberg Biohub Intercampus Research Award Investigator, University of California San Francisco, San Francisco, California, United States of America.
PLoS One. 2023 Mar 23;18(3):e0282532. doi: 10.1371/journal.pone.0282532. eCollection 2023.
While domain-specific data augmentation can be useful in training neural networks for medical imaging tasks, such techniques have not been widely used to date. Our objective was to test whether domain-specific data augmentation is useful for medical imaging using a well-benchmarked task: view classification on the fetal ultrasound FETAL-125 and OB-125 datasets. We found that, using a context-preserving cut-paste strategy, we could create valid training data, as measured by the performance of the resulting trained model on the benchmark test dataset. When used in an online fashion, models trained on this hybrid data performed similarly to those trained using traditional data augmentation (FETAL-125 F-score 85.33 ± 0.24 vs 86.89 ± 0.60, p-value 0.014; OB-125 F-score 74.60 ± 0.11 vs 72.43 ± 0.62, p-value 0.004). Furthermore, the ability to perform augmentations at training time and the ability to apply chosen augmentations equally across data classes are important considerations in designing a bespoke data augmentation. Finally, we provide open-source code to facilitate running bespoke data augmentations in an online fashion. Taken together, this work expands the ability to design and apply domain-guided data augmentations for medical imaging tasks.
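The two design considerations named in the abstract, augmenting at training time and applying the augmentation equally across classes, can be illustrated with a minimal sketch. This is not the authors' released code. The function names (`cut_paste`, `online_batches`), the rectangular `boxes` marking a region of interest, and the choice to paste onto a background image of the same class are all assumptions made for illustration; the abstract does not specify how regions or backgrounds are chosen.

```python
import numpy as np


def cut_paste(background, donor, box):
    """Copy the donor's region `box` into a copy of `background`.

    `box` is (y0, x0, y1, x1). Pasting at the same coordinates is an
    assumed, simplified stand-in for "context-preserving" placement.
    """
    y0, x0, y1, x1 = box
    out = background.copy()
    out[y0:y1, x0:x1] = donor[y0:y1, x0:x1]
    return out


def online_batches(images, labels, boxes, batch_size, p=0.5, seed=0):
    """Yield (batch, batch_labels), building hybrid images on the fly.

    Every sample is augmented with the same probability `p` regardless
    of its class, so no class receives more augmentation than another.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        batch = []
        for i in idx:
            img = images[i]
            if rng.random() < p:
                # Assumption: use a same-class background so the label
                # of the hybrid image remains valid.
                same_class = np.flatnonzero(labels == labels[i])
                j = rng.choice(same_class)
                img = cut_paste(images[j], images[i], boxes[i])
            batch.append(img)
        yield np.stack(batch), labels[idx]
```

Because the hybrid images are generated inside the batch loop, nothing is written to disk and each epoch can see different combinations, which is what distinguishes online from offline augmentation.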