Pezeshk Aria, Petrick Nicholas, Sahiner Berkman
IEEE Trans Med Imaging. 2017 Apr;36(4):1005-1015. doi: 10.1109/TMI.2016.2640180. Epub 2016 Dec 14.
The performance of a classifier depends largely on the size and representativeness of the data used to train it. Where accumulating and/or labeling training samples is difficult or expensive, as in medical applications, data augmentation can potentially alleviate the limitations of small datasets. We previously developed an image blending tool that allows users to modify or supplement an existing CT or mammography dataset by seamlessly inserting a lesion extracted from a source image into a target image. The tool can also apply various types of transformations to different properties of the lesion before it is inserted into a new location. In this study, we used this tool to create realistic-looking synthetic samples in chest CT. We then augmented training sets of different sizes with these artificial samples and investigated the effect of the augmentation on training various classifiers for the detection of lung nodules. Our results indicate that the proposed lesion insertion method can improve classifier performance for small training datasets, and thereby help reduce the need to acquire and label actual patient data.
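The abstract describes transforming a lesion and then inserting it into a target image, but it does not specify the blending algorithm. As a rough illustration only, the sketch below swaps in plain weighted (alpha) blending on 2D lists, which is much simpler than a seamless blending tool and should not be read as the authors' method. The function names `flip_horizontal` and `insert_patch`, the soft-border weight map, and the toy intensity values are all assumptions made for this example.

```python
def flip_horizontal(grid):
    """Mirror a 2D grid left-to-right (one simple pre-insertion transform)."""
    return [list(reversed(row)) for row in grid]


def insert_patch(target, patch, weights, top, left):
    """Return a copy of `target` with `patch` alpha-blended in at (top, left).

    `weights` has the same shape as `patch`: 1.0 keeps the patch value,
    0.0 keeps the target value, and values in between mix the two.
    This is plain alpha blending, an assumed stand-in for seamless blending.
    """
    rows, cols = len(patch), len(patch[0])
    if top < 0 or left < 0 or top + rows > len(target) or left + cols > len(target[0]):
        raise ValueError("patch does not fit inside the target at that position")
    blended = [row[:] for row in target]  # copy so the input image is untouched
    for r in range(rows):
        for c in range(cols):
            w = weights[r][c]
            background = target[top + r][left + c]
            blended[top + r][left + c] = w * patch[r][c] + (1.0 - w) * background
    return blended


# Toy data (illustrative values only): a uniform 5x5 background
# and a 3x3 "lesion" whose border is blended at half weight.
target = [[-800.0] * 5 for _ in range(5)]
lesion = [
    [0.0, 20.0, 40.0],
    [0.0, 20.0, 40.0],
    [0.0, 20.0, 40.0],
]
weights = [
    [0.5, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 0.5],
]

# Transform the lesion first, then insert it at row 1, column 1.
augmented = insert_patch(target, flip_horizontal(lesion), weights, top=1, left=1)
```

In an augmentation setting, each such synthetic image would be added to the training set alongside the real samples. The half-weight border here only softens the transition; it does not match gradients the way a seamless blending method would.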