Son Jeong Woo, Hong Ji Young, Kim Yoon, Kim Woo Jin, Shin Dae-Yong, Choi Hyun-Soo, Bak So Hyeon, Moon Kyoung Min
ZIOVISION, Chuncheon 24341, Korea.
Division of Pulmonary and Critical Care Medicine, Department of Medicine, Chuncheon Sacred Heart Hospital, Hallym University Medical Center, Chuncheon 24253, Korea.
Cancers (Basel). 2022 Jun 28;14(13):3174. doi: 10.3390/cancers14133174.
Early detection of lung nodules is essential for preventing lung cancer. However, the number of radiologists who can diagnose lung nodules is limited, and diagnosis requires considerable time and effort. To address this problem, researchers are investigating the automation of lung nodule detection using deep learning. However, deep learning requires large amounts of data, which can be difficult to collect. Therefore, data collection should be optimized to facilitate experiments at the beginning of lung nodule detection studies. We collected chest computed tomography scans of 515 patients with lung nodules from three hospitals, together with high-quality lung nodule annotations reviewed by radiologists. We conducted several experiments using the collected datasets and publicly available data from LUNA16. The object detection model YOLOX was used in the lung nodule detection experiments. Training the model on the collected data yielded performance similar to or better than training on LUNA16, which contains a large amount of data. We also show that transferring weights pre-trained on open data is very useful when it is difficult to collect large amounts of data. Otherwise, good performance can be expected once data from more than 100 patients are available. This study offers valuable insights for guiding data collection in future lung nodule studies.