From the ARTORG Center for Biomedical Engineering Research, University of Bern.
Department of Diagnostic, Interventional and Pediatric Radiology, Inselspital.
Invest Radiol. 2021 Jun 1;56(6):348-356. doi: 10.1097/RLI.0000000000000748.
Five publicly available databases comprising normal chest x-rays (CXRs), confirmed COVID-19 pneumonia cases, and other pneumonias were used. After harmonization of the data, the training set included 7966 normal cases, 5451 cases with other pneumonia, and 258 CXRs with COVID-19 pneumonia, whereas in the testing data set, each category was represented by 100 cases. Eleven blinded radiologists with various levels of expertise independently read the testing data set. The data were analyzed separately with the newly proposed artificial intelligence-based system and by consultant radiologists and residents, with respect to positive predictive value (PPV), sensitivity, and F-score (the harmonic mean of PPV and sensitivity). The χ2 test was used to compare the sensitivity, specificity, accuracy, PPV, and F-scores of the readers and the system.
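The metrics named above can be illustrated with a minimal sketch, assuming a three-class confusion matrix whose rows are the true categories and whose columns are the predicted ones. The function names and the example counts are hypothetical and are not taken from the study; only the definitions (sensitivity, PPV, and the F-score as their harmonic mean) follow the text.

```python
from typing import Dict, List


def per_class_metrics(
    confusion: List[List[int]], labels: List[str]
) -> Dict[str, Dict[str, float]]:
    """Return sensitivity, PPV, and F-score for each class.

    confusion[i][j] is the number of cases whose true class is i
    and whose predicted class is j.
    """
    n = len(labels)
    metrics: Dict[str, Dict[str, float]] = {}
    for k, name in enumerate(labels):
        true_pos = confusion[k][k]
        actual = sum(confusion[k])                          # all cases truly in class k
        predicted = sum(confusion[i][k] for i in range(n))  # all cases labeled as class k
        sensitivity = true_pos / actual if actual else 0.0
        ppv = true_pos / predicted if predicted else 0.0
        denom = ppv + sensitivity
        # F-score: harmonic mean of PPV and sensitivity
        f_score = 2 * ppv * sensitivity / denom if denom else 0.0
        metrics[name] = {"sensitivity": sensitivity, "ppv": ppv, "f_score": f_score}
    return metrics


def overall_accuracy(confusion: List[List[int]]) -> float:
    """Fraction of all cases that fall on the diagonal of the matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical counts for illustration only (100 cases per true class).
    labels = ["normal", "other pneumonia", "COVID-19 pneumonia"]
    example = [
        [90, 8, 2],
        [10, 70, 20],
        [5, 25, 70],
    ]
    for name, values in per_class_metrics(example, labels).items():
        print(name, {key: round(val, 3) for key, val in values.items()})
    print("accuracy", round(overall_accuracy(example), 3))
```

Computing the metrics per class, rather than only overall, mirrors how the abstract reports separate sensitivities and PPVs for the normal, other pneumonia, and COVID-19 pneumonia categories.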
The proposed system achieved higher overall diagnostic accuracy (94.3%) than the radiologists (61.4% ± 5.3%). The radiologists reached average sensitivities for normal CXR, other types of pneumonia, and COVID-19 pneumonia of 85.0% ± 12.8%, 60.1% ± 12.2%, and 53.2% ± 11.2%, respectively, which were significantly lower than the results achieved by the algorithm (98.0%, 88.0%, and 97.0%; P < 0.00032). The mean PPVs for all 11 radiologists were 82.4%, 59.0%, and 59.0% for the healthy, other pneumonia, and COVID-19 pneumonia categories, respectively, resulting in an F-score of 65.5% ± 12.4%, which was significantly lower than the F-score of the algorithm (94.3% ± 2.0%, P < 0.00001). When other pneumonia and COVID-19 pneumonia cases were pooled, the proposed system reached an accuracy of 95.7% for any pathology, compared with 88.8% for the radiologists. The overall accuracy of consultants did not differ significantly from that of residents (65.0% ± 5.8% vs 67.4% ± 4.2%); however, consultants detected significantly more COVID-19 pneumonia cases (P = 0.008) and fewer healthy cases (P < 0.00001).
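The pooled "any pathology" figure can be understood as collapsing the three-class problem into normal versus abnormal. The following is a minimal sketch under that assumption; the function name and the example counts are hypothetical, and the abstract does not describe how the pooling was actually implemented.

```python
from typing import List


def pooled_pathology_accuracy(
    confusion: List[List[int]], normal_index: int = 0
) -> float:
    """Accuracy after merging all non-normal classes into one 'any pathology' class.

    confusion[i][j] is the number of cases with true class i and predicted class j.
    Confusing one pneumonia type with another still counts as correct here,
    because both fall on the 'pathology' side.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        return 0.0
    n = len(confusion)
    correct = 0
    for i in range(n):
        for j in range(n):
            # Correct when truth and prediction agree on normal vs. pathology
            if (i == normal_index) == (j == normal_index):
                correct += confusion[i][j]
    return correct / total


if __name__ == "__main__":
    # Hypothetical counts for illustration only (rows: normal, other pneumonia, COVID-19).
    example = [
        [90, 8, 2],
        [10, 70, 20],
        [5, 25, 70],
    ]
    print(round(pooled_pathology_accuracy(example), 3))
```

Because the testing data set holds 100 cases per category, the three-class overall accuracy equals the unweighted mean of the per-class sensitivities; the algorithm's reported sensitivities (98.0%, 88.0%, and 97.0%) average to 94.3%, which matches its reported overall accuracy.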
The system showed robust accuracy for COVID-19 pneumonia detection on CXR and surpassed radiologists at various training levels.