Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts.
Division of Thoracic Imaging and Intervention, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts.
Acad Radiol. 2021 Apr;28(4):572-576. doi: 10.1016/j.acra.2021.01.016. Epub 2021 Jan 18.
Radiographic findings of COVID-19 pneumonia can be used for patient risk stratification; however, radiologist reporting of disease severity on chest radiographs (CXRs) is inconsistent. We aimed to determine whether an artificial intelligence (AI) system could improve radiologist interrater agreement.
We performed a retrospective multi-radiologist user study to evaluate the impact of an AI system, the PXS score model, on the grading of COVID-19 lung disease severity on CXRs into four ordinal categories (normal/minimal, mild, moderate, and severe). Four radiologists (two thoracic and two emergency radiologists) independently interpreted 154 CXRs from 154 unique patients with COVID-19 hospitalized at a large academic center, before and after using the AI system (median washout interval, 16 days). Three different thoracic radiologists assessed the same 154 CXRs using an updated version of the AI system trained on more imaging data. Radiologist interrater agreement was evaluated using Cohen and Fleiss kappa where appropriate. Associations between the lung disease severity categories and clinical outcomes were assessed in a previously published outcomes dataset using Fisher's exact test and the Chi-square test for trend.
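For readers unfamiliar with the agreement statistics named above, the following is a minimal sketch of Cohen kappa (two raters) and Fleiss kappa (a fixed number of raters per image), written from the standard textbook definitions. It is not the study's analysis code; the function names and the toy grades are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen kappa for two raters grading the same items (standard definition)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must grade the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same grade by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal grade frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1.0 - expected)


def fleiss_kappa(ratings):
    """Fleiss kappa; ratings[i] lists the grades every rater gave to item i."""
    if not ratings:
        raise ValueError("at least one rated item is required")
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    category_totals = Counter()
    agreement_sum = 0.0
    for row in ratings:
        counts = Counter(row)
        category_totals.update(counts)
        # Per-item agreement: proportion of rater pairs that chose the same grade.
        same_pairs = sum(c * c for c in counts.values()) - n_raters
        agreement_sum += same_pairs / (n_raters * (n_raters - 1))
    mean_agreement = agreement_sum / n_items
    total = n_items * n_raters
    # Chance agreement: sum of squared overall category proportions.
    expected = sum((t / total) ** 2 for t in category_totals.values())
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (mean_agreement - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical toy grades (NOT study data): 5 images, 4 raters each.
    toy = [
        ["mild", "mild", "moderate", "mild"],
        ["severe", "severe", "severe", "moderate"],
        ["normal/minimal", "normal/minimal", "mild", "normal/minimal"],
        ["moderate", "moderate", "moderate", "moderate"],
        ["mild", "moderate", "moderate", "severe"],
    ]
    print("Fleiss kappa:", round(fleiss_kappa(toy), 3))
    first, second = [r[0] for r in toy], [r[1] for r in toy]
    print("Cohen kappa (raters 1 vs 2):", round(cohen_kappa(first, second), 3))
```

Both functions treat the four grades as unordered labels. A weighted kappa would give partial credit for adjacent ordinal grades, but the abstract does not say that weighting was used, so none is assumed here.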
Use of the AI system improved radiologist interrater agreement (Fleiss κ increased from 0.40 before to 0.66 after use of the system). The Fleiss κ for the three radiologists using the updated AI system was 0.74. Severity categories were significantly associated with subsequent intubation or death within 3 days.
An AI system used at the time of CXR study interpretation can improve the interrater agreement of radiologists.