Sánchez C I, Niemeijer M, Abràmoff M D, van Ginneken B
Department of Radiology, Radboud University Nijmegen Medical Centre, The Netherlands.
Med Image Comput Comput Assist Interv. 2010;13(Pt 3):603-10. doi: 10.1007/978-3-642-15711-0_75.
The performance of computer-aided diagnosis (CAD) systems can be strongly influenced by the training strategy. CAD systems are traditionally trained on available labeled data, drawn from a specific data distribution or from public databases. Because medical data vary widely, these databases might not be representative enough when the CAD system is applied to data from a different clinical setting, which diminishes performance or requires more labeled samples to generalize well. In this work, we propose incorporating an active learning approach into the training phase of CAD systems to reduce the number of required training samples while maximizing system performance. The benefit of this approach was evaluated using a specific CAD system for diabetic retinopathy screening. The results show that (1) using a training set obtained from a different data source considerably reduces CAD performance; and (2) with active learning, the selected training set can be reduced from 1000 to 200 samples while maintaining an area under the receiver operating characteristic (ROC) curve of 0.856.
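The abstract does not say which query strategy the active learner uses, so the following is only a minimal sketch of one common choice, pool-based uncertainty sampling: train on a small labeled seed, repeatedly ask an expert to label the pool samples the current model is least sure about, and retrain until a labeling budget is spent. Everything named here is hypothetical and for illustration only: the functions `most_uncertain`, `fit_threshold`, `predict_proba` and `active_learning`, the one-dimensional toy classifier, and the `oracle` callback that stands in for a human grader. The pool size of 1000 and budget of 200 in the demo merely echo the figures in the abstract and do not reproduce the reported experiment.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def most_uncertain(probs: Sequence[float], k: int) -> List[int]:
    """Return positions of the k probabilities closest to 0.5."""
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return ranked[:k]


def fit_threshold(xs: Sequence[float], ys: Sequence[int]) -> float:
    """Toy 1-D classifier: threshold halfway between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    if not pos or not neg:
        # Only one class seen so far: fall back to the overall mean.
        return sum(xs) / len(xs)
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def predict_proba(threshold: float, x: float) -> float:
    """Logistic score; assumes the positive class lies above the threshold."""
    return 1.0 / (1.0 + math.exp(-(x - threshold)))


def active_learning(
    pool: Sequence[float],
    oracle: Callable[[float], int],
    seed_size: int,
    batch_size: int,
    budget: int,
    rng: random.Random,
) -> Tuple[float, List[int]]:
    """Label at most `budget` pool samples (seed_size >= 1), most uncertain first."""
    unlabeled = list(range(len(pool)))
    rng.shuffle(unlabeled)
    labeled = unlabeled[:seed_size]          # random seed set
    unlabeled = unlabeled[seed_size:]
    labels = {i: oracle(pool[i]) for i in labeled}

    def refit() -> float:
        return fit_threshold([pool[i] for i in labeled],
                             [labels[i] for i in labeled])

    model = refit()
    while len(labeled) < budget and unlabeled:
        probs = [predict_proba(model, pool[i]) for i in unlabeled]
        k = min(batch_size, budget - len(labeled))
        chosen = [unlabeled[j] for j in most_uncertain(probs, k)]
        for i in chosen:
            labels[i] = oracle(pool[i])      # query the expert
        labeled.extend(chosen)
        chosen_set = set(chosen)
        unlabeled = [i for i in unlabeled if i not in chosen_set]
        model = refit()                      # retrain on the enlarged set
    return model, labeled


if __name__ == "__main__":
    rng = random.Random(0)
    pool = [rng.uniform(-5.0, 5.0) for _ in range(1000)]
    model, labeled = active_learning(
        pool, lambda x: 1 if x > 0.3 else 0,
        seed_size=20, batch_size=20, budget=200, rng=rng)
    print("labeled samples:", len(labeled), "learned threshold:", model)
```

In a real CAD setting the toy classifier would be replaced by the system's own classifier and the oracle by expert annotation; the selection loop itself would stay the same shape.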