Dipartimento di Automatica e Informatica, Politecnico di Torino, C.so Duca degli Abruzzi, 24, Torino, 10129, Italy.
Comput Biol Med. 2023 Jan;152:106391. doi: 10.1016/j.compbiomed.2022.106391. Epub 2022 Dec 9.
Recent advances in Deep Learning have largely benefited from larger and more diverse training sets. However, collecting large datasets for medical imaging is still a challenge due to privacy concerns and labeling costs. Data augmentation makes it possible to greatly expand the amount and variety of data available for training without actually collecting new samples. Data augmentation techniques range from simple yet surprisingly effective transformations such as cropping, padding, and flipping, to complex generative models. Depending on the nature of the input and the visual task, different data augmentation strategies are likely to perform differently. For this reason, it is conceivable that medical imaging requires specific augmentation strategies that generate plausible data samples and enable effective regularization of deep neural networks. Data augmentation can also be used to augment specific classes that are underrepresented in the training set, e.g., to generate artificial lesions. The goal of this systematic literature review is to investigate which data augmentation strategies are used in the medical domain and how they affect the performance of clinical tasks such as classification, segmentation, and lesion detection. To this end, a comprehensive analysis of more than 300 articles published in recent years (2018-2022) was conducted. The results highlight the effectiveness of data augmentation across organs, modalities, tasks, and dataset sizes, and suggest potential avenues for future research.
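As an illustration of the simple transformations named in the abstract (cropping, padding, and flipping), the following is a minimal sketch in Python with NumPy. It is not taken from the reviewed articles: the function name `augment`, the `pad` parameter, the 50% flip probability, and the choice of reflect padding are all assumptions made for this example.

```python
import numpy as np

def augment(image, rng, pad=4):
    """Hypothetical helper: random horizontal flip, then pad and randomly crop.

    `image` is an array of shape (H, W) or (H, W, C); the output has the same shape.
    """
    h, w = image.shape[:2]

    # Flip left-right with probability 0.5 (assumed value).
    if rng.random() < 0.5:
        image = image[:, ::-1]

    # Pad only the two spatial axes; reflect padding is an assumption,
    # chosen to avoid artificial constant-valued borders.
    pad_width = [(pad, pad), (pad, pad)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad_width, mode="reflect")

    # Crop a random window of the original size from the padded image,
    # which amounts to a random shift of up to `pad` pixels.
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w]

rng = np.random.default_rng(0)
scan = rng.random((64, 64))        # stand-in for a single 2D image slice
augmented = augment(scan, rng)
```

Whether a given transformation is appropriate depends on the modality and task. A left-right flip, for instance, may not yield an anatomically plausible sample for every organ, which is one reason the abstract raises the question of augmentation strategies specific to medical imaging.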