Fervers Philipp, Fervers Florian, Jaiswal Astha, Rinneburger Miriam, Weisthoff Mathilda, Pollmann-Schweckhorst Philip, Kottlors Jonathan, Carolus Heike, Lennartz Simon, Maintz David, Shahzad Rahil, Persigehl Thorsten
Department of Diagnostic and Interventional Radiology, Faculty of Medicine and University Hospital Cologne, University Cologne, Cologne, Germany.
Fraunhofer Institute of Optronics, System Technologies and Image Exploitation IOSB, Karlsruhe, Germany.
Quant Imaging Med Surg. 2022 Nov;12(11):5156-5170. doi: 10.21037/qims-22-175.
The extent of lung involvement in coronavirus disease 2019 (COVID-19) pneumonia, quantified on computed tomography (CT), is an established biomarker for prognosis and guides clinical decision-making. The clinical standard is semi-quantitative scoring of lung involvement by an experienced reader. We aim to compare the performance of automated deep-learning- and threshold-based methods to the manual semi-quantitative lung scoring. Further, we aim to investigate an optimal threshold for quantification of involved lung in COVID pneumonia chest CT, using a multi-center dataset.
In total, 250 patients were included: 50 consecutive patients with RT-PCR-confirmed COVID-19 from our local institutional database and another 200 patients from four international datasets (n=50 each). Lung involvement was scored semi-quantitatively by three experienced radiologists according to the established chest CT score (CCS), which ranges from 0 to 25. Inter-rater reliability was reported as the intraclass correlation coefficient (ICC). Deep-learning-based segmentation of ground-glass opacity and consolidation was obtained with the CT Pulmo Auto Results prototype plugin on IntelliSpace Discovery (Philips Healthcare, The Netherlands). Threshold-based segmentation of involved lung was implemented using an open-source tool for whole-lung segmentation in the presence of severe pathologies (R231CovidWeb, Hofmanninger et al., 2020), followed by quantitative assessment of lung attenuation. The best threshold was determined by training and testing a linear regression between deep-learning- and threshold-based results in a five-fold cross-validation strategy.
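The threshold-based quantification and the threshold search described above can be illustrated with a minimal sketch. It is not the study's implementation. It assumes that each scan is available as a NumPy array of Hounsfield units with a matching boolean lung mask, that "involved" means lung voxels with attenuation above the threshold, and that candidate thresholds are ranked by mean held-out squared error. The function names `involved_fraction`, `cv_error_for_threshold` and `best_threshold` are hypothetical.

```python
import numpy as np


def involved_fraction(hu, lung_mask, threshold_hu):
    """Fraction of lung voxels whose attenuation exceeds threshold_hu.

    Assumption: involved (denser) lung is counted as voxels above the threshold.
    """
    lung = hu[lung_mask]
    if lung.size == 0:
        raise ValueError("lung mask is empty")
    return float(np.mean(lung > threshold_hu))


def cv_error_for_threshold(thr_frac, dl_frac, n_splits=5, seed=0):
    """Mean held-out squared error of a linear fit thr_frac -> dl_frac."""
    thr_frac = np.asarray(thr_frac, dtype=float)
    dl_frac = np.asarray(dl_frac, dtype=float)
    # Shuffle patient indices once, then split them into n_splits folds.
    idx = np.random.default_rng(seed).permutation(len(thr_frac))
    folds = np.array_split(idx, n_splits)
    errors = []
    for k, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        # Fit a straight line on the training folds only.
        slope, intercept = np.polyfit(thr_frac[train], dl_frac[train], 1)
        pred = slope * thr_frac[test] + intercept
        errors.append(np.mean((pred - dl_frac[test]) ** 2))
    return float(np.mean(errors))


def best_threshold(volumes, masks, dl_frac, candidates):
    """Return the candidate threshold with the lowest cross-validated error."""
    scores = {}
    for t in candidates:
        thr_frac = [involved_fraction(v, m, t) for v, m in zip(volumes, masks)]
        scores[t] = cv_error_for_threshold(thr_frac, dl_frac)
    return min(scores, key=scores.get), scores
```

In this sketch, `dl_frac` stands for the per-patient involved-lung fraction reported by the deep-learning segmentation, and `candidates` is whatever grid of HU values one wishes to scan. Both are placeholders, not values taken from the study.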
Median CCS among the 250 evaluated patients was 10 [6-15]. Inter-rater reliability of the CCS was excellent [ICC 0.97 (0.97-0.98)]. The best attenuation threshold for identification of involved lung was -522 HU. While the relationship between deep-learning- and threshold-based quantification was linear and strong (r = 0.84), both automated quantification methods translated to the semi-quantitative CCS in a non-linear fashion, with an increasing slope towards higher CCS (r = 0.80, r = 0.63).
The manual semi-quantitative CCS underestimates the extent of COVID pneumonia in higher score ranges, which limits its clinical usefulness in cases of severe disease. Clinical implementation of fully automated methods, such as deep-learning or threshold-based approaches (best threshold in our multi-center dataset: -522 HU), might save trained personnel time, eliminate inter-reader variability, and allow for truly quantitative, linear assessment of COVID lung involvement.