Kim Donguk, Lee Jong Hyuk, Jang Myoung-Jin, Park Jongsoo, Hong Wonju, Lee Chan Su, Yang Si Yeong, Park Chang Min
Institute of Medical and Biological Engineering, Medical Research Center, Seoul National University, 101, Daehak-ro, Jongno-gu, Seoul 03080, Republic of Korea.
Department of Radiology, Seoul National University College of Medicine, Seoul National University Hospital, 101, Daehak-ro, Jongno-gu, Seoul 03080, Republic of Korea.
Bioengineering (Basel). 2023 Sep 12;10(9):1077. doi: 10.3390/bioengineering10091077.
Prior studies of deep learning (DL)-based models that measure the cardiothoracic ratio (CTR) on chest radiographs have lacked rigorous agreement analyses with radiologists or reader tests. We validated the performance of a commercially available DL-based CTR measurement model on chest radiographs with various thoracic pathologies, and performed agreement analyses with thoracic radiologists and reader tests using a probabilistic-based reference standard.
This study included 160 posteroanterior view chest radiographs (no lung or pleural abnormalities, pneumothorax, pleural effusion, and consolidation; n = 40 in each category) to externally test a DL-based CTR measurement model. To assess the agreement between the model and experts, intraclass or interclass correlation coefficients (ICCs) were compared between the model and two thoracic radiologists. In the reader tests with a probabilistic-based reference standard (Dawid-Skene consensus), we compared diagnostic measures for cardiomegaly, including sensitivity and negative predictive value (NPV), between the model and five other radiologists using the non-inferiority test.
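As an illustration of the agreement analysis described above, the following is a minimal sketch of one common ICC form, ICC(2,1) (two-way random effects, absolute agreement, single measurement). The abstract does not state which ICC form the authors used, so this choice, the function name `icc_2_1`, and the toy CTR values are assumptions made only for illustration.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per radiograph, one CTR value per rater.
    The ICC form is an assumption; the abstract does not specify it.
    """
    n = len(ratings)       # number of radiographs (subjects)
    k = len(ratings[0])    # number of raters (e.g. model and radiologists)

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical CTR measurements: columns are model, radiologist 1, radiologist 2.
toy_ctr = [
    [0.52, 0.51, 0.53],
    [0.44, 0.45, 0.44],
    [0.61, 0.60, 0.62],
    [0.48, 0.49, 0.47],
]
print(round(icc_2_1(toy_ctr), 3))
```

In practice a statistics package would also report confidence intervals, which are needed to compare two ICCs as the study does.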
For the 160 chest radiographs, the model measured a median CTR of 0.521 (interquartile range, 0.446-0.59) and a mean CTR of 0.522 ± 0.095. The ICC between the two thoracic radiologists did not differ significantly from the ICC between the model and the two thoracic radiologists (0.972 versus 0.959, P = 0.192), even across various pathologies (all P-values > 0.05). The model showed non-inferior diagnostic performance, including sensitivity (96.3% versus 97.8%) and NPV (95.6% versus 97.4%) (P < .001 in both), compared with the radiologists for all 160 chest radiographs. However, it showed inferior sensitivity in chest radiographs with consolidation (95.5% versus 99.9%; P = 0.082) and inferior NPV in chest radiographs with pleural effusion (92.9% versus 94.6%; P = 0.079) and consolidation (94.1% versus 98.7%; P = 0.173).
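To make the reported diagnostic measures concrete, here is a minimal sketch of how sensitivity and NPV for cardiomegaly could be computed from CTR measurements. The cutoff of CTR > 0.5 is a conventional threshold and an assumption here, since the abstract does not state the cutoff used. The function name and the toy data are hypothetical. The sketch does not reproduce the Dawid-Skene consensus or the non-inferiority test.

```python
import math


def sensitivity_and_npv(ctr_values, reference, threshold=0.5):
    """Return (sensitivity, NPV) for cardiomegaly calls made from CTR values.

    ctr_values: measured CTR per radiograph.
    reference:  True where the reference standard says cardiomegaly is present.
    threshold:  assumed cutoff (CTR > 0.5); not stated in the abstract.
    """
    predicted = [ctr > threshold for ctr in ctr_values]

    tp = sum(1 for p, r in zip(predicted, reference) if p and r)
    fn = sum(1 for p, r in zip(predicted, reference) if not p and r)
    tn = sum(1 for p, r in zip(predicted, reference) if not p and not r)

    # Sensitivity: share of true cardiomegaly cases that were flagged.
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    # NPV: share of negative calls that were truly negative.
    npv = tn / (tn + fn) if (tn + fn) else math.nan
    return sensitivity, npv


# Hypothetical example: five radiographs with reference labels.
ctrs = [0.62, 0.48, 0.55, 0.45, 0.49]
labels = [True, False, True, False, True]
sens, npv = sensitivity_and_npv(ctrs, labels)
print(f"sensitivity={sens:.3f}, NPV={npv:.3f}")
```

Both measures are penalized by false negatives, which is why they are natural endpoints when a model is evaluated as a tool for ruling out cardiomegaly.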
While the sensitivity and NPV of this model for diagnosing cardiomegaly in chest radiographs with consolidation or pleural effusion were not as high as those of the radiologists, it demonstrated good agreement with the thoracic radiologists in measuring the CTR across various pathologies.