School of Information Science and Technology, Fudan University, Shanghai 200438, China.
Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China.
Comput Biol Med. 2024 May;173:108361. doi: 10.1016/j.compbiomed.2024.108361. Epub 2024 Mar 26.
Deep learning plays a significant role in the detection of pulmonary nodules in low-dose computed tomography (LDCT) scans, contributing to the diagnosis and treatment of lung cancer. Nevertheless, its effectiveness often relies on the availability of extensive, meticulously annotated datasets. In this paper, we explore the use of an incompletely annotated dataset for pulmonary nodule detection and introduce the FULFIL (Forecasting Uncompleted Labels For Inexpensive Lung nodule detection) algorithm. By instructing annotators to label only the nodules they are most confident about, without requiring complete coverage, we can substantially reduce annotation costs. However, this approach yields an incompletely annotated dataset, which makes training deep learning models challenging. Within the FULFIL algorithm, we employ a Graph Convolutional Network (GCN) to discover the relationships between annotated and unannotated nodules and thereby complete the annotation self-adaptively. Meanwhile, a teacher-student framework performs self-adaptive learning on the completed annotation dataset. Furthermore, we designed a Dual-Views loss that leverages different perspectives of the data, helping the model acquire robust features and generalize better. We carried out experiments on the LUng Nodule Analysis (LUNA) dataset and achieved a sensitivity of 0.574 at 0.125 false positives per scan (FPs/scan) with instance-level annotations for only 10% of the nodules. This outperformed the compared methods by 7.00%. We also compared the performance of our model with that of human experts on the test dataset, and the results demonstrate that our model achieves performance comparable to that of human experts.
The comprehensive experimental results demonstrate that FULFIL can effectively leverage an incomplete pulmonary nodule dataset to develop a robust deep learning model, making it a promising tool for assisting in lung nodule detection.
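The headline result (sensitivity 0.574 at 0.125 FPs/scan) is one operating point on a free-response ROC (FROC) curve. The LUNA16 benchmark reports sensitivity at seven such points, from 0.125 to 8 FPs/scan. The sketch below is a rough illustration of how one such point is read off. It is not the paper's or the challenge's evaluation code, and it makes several assumptions. Detections are assumed to be already matched to ground-truth nodules, and the matching criterion is left out. Score ties are broken by sort order, and no interpolation is done. The function and variable names (`sensitivity_at_fp_rate`, `Detection`) are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical format: (confidence score, id of the matched ground-truth nodule),
# with None meaning the detection matched no nodule (a false positive).
Detection = Tuple[float, Optional[int]]


def sensitivity_at_fp_rate(
    detections: Sequence[Detection],
    num_nodules: int,
    num_scans: int,
    fp_per_scan: float,
) -> float:
    """Simplified FROC operating point: the highest sensitivity reachable while
    keeping the average number of false positives per scan at or below
    `fp_per_scan`. Detections are pooled over all scans; nodule ids must be
    unique across scans. Ties are broken by sort order; no interpolation."""
    if num_nodules <= 0 or num_scans <= 0:
        raise ValueError("num_nodules and num_scans must be positive")
    fp_budget = fp_per_scan * num_scans  # total false positives allowed
    found = set()  # nodules hit by at least one detection so far
    false_positives = 0
    # Lowering the score threshold = walking detections from most to least confident.
    for _score, nodule_id in sorted(detections, key=lambda d: d[0], reverse=True):
        if nodule_id is None:
            false_positives += 1
            if false_positives > fp_budget:
                break  # including this detection would exceed the FP budget
        else:
            found.add(nodule_id)  # repeated hits on one nodule count once
    return len(found) / num_nodules


# Toy example: 2 scans containing 4 nodules in total.
dets = [(0.95, 0), (0.90, 1), (0.80, None), (0.70, 1), (0.60, None), (0.50, 2)]
print(sensitivity_at_fp_rate(dets, num_nodules=4, num_scans=2, fp_per_scan=0.5))  # 0.5
print(sensitivity_at_fp_rate(dets, num_nodules=4, num_scans=2, fp_per_scan=1.0))  # 0.75
```

With this kind of scoring, a correct detection of a nodule that nobody annotated is counted as a false positive. For that reason, the evaluation split needs complete annotations even when the training labels are only partial.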