van Riel Sarah J, Ciompi Francesco, Winkler Wille Mathilde M, Dirksen Asger, Lam Stephen, Scholten Ernst Th, Rossi Santiago E, Sverzellati Nicola, Naqibullah Matiullah, Wittenberg Rianne, Hovinga-de Boer Marieke C, Snoeren Miranda, Peters-Bax Liesbeth, Mets Onno, Brink Monique, Prokop Mathias, Schaefer-Prokop Cornelia, van Ginneken Bram
Department of Radiology and Nuclear Medicine, Radboud University Medical Center, Nijmegen, The Netherlands.
Department of Diagnostic Imaging, Section of Radiology, Nordsjællands Hospital, Hillerød, Denmark.
PLoS One. 2017 Nov 9;12(11):e0185032. doi: 10.1371/journal.pone.0185032. eCollection 2017.
To compare human observers with a mathematically derived computer model in differentiating malignant from benign pulmonary nodules detected on baseline screening computed tomography (CT) scans.
A case-cohort study design was chosen. The study group consisted of 300 chest CT scans from the Danish Lung Cancer Screening Trial (DLCST). It included all scans with proven malignancies (n = 62) and two subsets of randomly selected baseline scans with benign nodules: one with nodules of all sizes (n = 120) and one with nodules matched in size to the cancers (n = 118). Eleven observers and the computer model (PanCan) assigned a malignancy probability score to each nodule. Performance was expressed as the area under the ROC curve (AUC). Performance differences were tested using the Dorfman, Berbaum and Metz method. Seven observers assessed morphological nodule characteristics using a predefined list. Differences in morphological features between malignant and size-matched benign nodules were analyzed using chi-square analysis with Bonferroni correction. Significance was defined as p < 0.004.
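As a rough illustration of two quantities named above, the sketch below computes an AUC from malignancy probability scores using the rank-based (Mann-Whitney) formulation, and a Bonferroni-adjusted significance threshold. It is not the study's analysis pipeline: the scores are synthetic, the function names are hypothetical, and the number of tests (12) is an assumption chosen only because 0.05 / 12 is close to the reported threshold of 0.004. The abstract does not state how many features were tested. The Dorfman, Berbaum and Metz comparison is not reproduced here.

```python
def auc_mann_whitney(scores_malignant, scores_benign):
    """AUC as the fraction of (malignant, benign) pairs in which the
    malignant nodule receives the higher score; ties count as half."""
    wins = 0.0
    for m in scores_malignant:
        for b in scores_benign:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(scores_malignant) * len(scores_benign))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


# Synthetic malignancy probability scores (NOT data from the study).
malignant = [0.9, 0.8, 0.6, 0.4]
benign = [0.7, 0.3, 0.2, 0.1, 0.4]

auc = auc_mann_whitney(malignant, benign)

# n_tests=12 is an assumed value for illustration only.
threshold = bonferroni_threshold(0.05, 12)

print(f"AUC = {auc:.3f}")
print(f"Bonferroni threshold = {threshold:.4f}")
```

With these synthetic scores, 17.5 of the 20 malignant-benign pairs are ordered correctly (one tie counted as half), giving an AUC of 0.875.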
The performance of the model and the observers was equivalent (AUC 0.932 versus 0.910, p = 0.184) for risk assessment of malignant and benign nodules of all sizes. However, human readers outperformed the computer model in differentiating malignant nodules from size-matched benign nodules (AUC 0.819 versus 0.706, p < 0.001). Large variations between observers were seen in ROC areas and in the ranges of risk scores. Morphological findings indicative of malignancy related to border characteristics (spiculation, p < 0.001) and perinodular architectural deformation (distortion of surrounding lung parenchyma architecture, p < 0.001; pleural retraction, p = 0.002).
The computer model and human observers perform equivalently in differentiating malignant from randomly selected benign nodules, confirming the high potential of computer models for nodule risk estimation in population-based screening studies. However, computer models rely heavily on size as a discriminator. Incorporating the other morphological criteria that human observers use to better discriminate size-matched malignant from benign nodules will further improve computer performance.