School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA.
McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, Texas, USA.
J Am Med Inform Assoc. 2022 Apr 13;29(5):831-840. doi: 10.1093/jamia/ocac007.
Scanned documents (SDs), while common in electronic health records and potentially rich in clinically relevant information, rarely fit well with clinician workflow. Here, we aim to identify, with high recall and practically useful precision, scanned imaging reports that require follow-up.
We focused on identifying imaging findings for 3 common causes of malpractice claims: (1) potentially malignant breast lesions (mammography), (2) potentially malignant lung lesions (chest computed tomography [CT]), and (3) long-bone fractures (X-ray). We trained our ClinicalBERT-based pipeline on existing typed/dictated reports classified manually or using ICD-10 codes, evaluated it on a test set of manually classified SDs, and compared it against a string-matching baseline.
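To make the string-matching baseline concrete, here is a minimal sketch of what such a classifier might look like. The abstract does not list the strings that were actually matched, so the patterns in `TRIGGER_PATTERNS`, the report-type keys, and the function name `flag_report` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical trigger phrases: the abstract does not state which strings
# the baseline actually matched, so these are illustrative assumptions only.
TRIGGER_PATTERNS = {
    "mammography": [r"\bbi-?rads\s*(?:category\s*)?[45]\b", r"\bsuspicious\b"],
    "chest_ct": [r"\bnodule\b", r"\bmass\b"],
    "bone_xray": [r"\bfracture\b"],
}


def flag_report(text: str, report_type: str) -> bool:
    """Return True if any trigger pattern for this report type occurs in the text."""
    lowered = text.lower()
    return any(re.search(p, lowered) for p in TRIGGER_PATTERNS[report_type])


print(flag_report("Acute fracture of the distal radius.", "bone_xray"))  # True
print(flag_report("No acute fracture.", "bone_xray"))  # True: negation is not handled
print(flag_report("Normal alignment.", "bone_xray"))  # False
```

The second call shows a well-known weakness of naive matching: a negated finding is still flagged. Whether the baseline in this study handled negation is not stated in the abstract.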
A total of 393 mammography, 305 chest CT, and 683 bone X-ray reports were manually reviewed. The string-matching approach had an F1 of 0.667. For mammograms, chest CTs, and bone X-rays, respectively: models trained on manually classified training data and optimized for F1 reached F1 scores of 0.900, 0.905, and 0.817, while separate models optimized for recall achieved a recall of 1.000 with precisions of 0.727, 0.518, and 0.275. Models trained on ICD-10-labelled data and optimized for F1 achieved F1 scores of 0.647, 0.830, and 0.643, while those optimized for recall achieved a recall of 1.000 with precisions of 0.407, 0.683, and 0.358.
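The trade-off between the recall-optimized and F1-optimized models can be illustrated with a short sketch. The F1 formula is the standard harmonic mean of precision and recall. The threshold rule, by contrast, is only an assumption about how a model might be tuned for full recall (the abstract does not describe the procedure), and the scores and labels are toy data, not study data.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def recall_first_threshold(scores, labels):
    """Highest score threshold that still flags every positive example.

    Assumed tuning rule for illustration; not taken from the paper.
    """
    positive_scores = [s for s, y in zip(scores, labels) if y == 1]
    if not positive_scores:
        raise ValueError("need at least one positive example")
    return min(positive_scores)


def precision_recall(scores, labels, threshold):
    """Precision and recall when every report scoring >= threshold is flagged."""
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    true_positives = sum(flagged)
    total_positives = sum(labels)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / total_positives if total_positives else 0.0
    return precision, recall


# F1 implied by the reported recall-optimized operating points
# (manually labelled training data; recall = 1.000 in each case).
for name, precision in [("mammography", 0.727), ("chest CT", 0.518), ("bone X-ray", 0.275)]:
    print(name, round(f1_score(precision, 1.0), 3))  # 0.842, 0.682, 0.431

# Toy validation data (hypothetical): model scores and true labels.
scores = [0.95, 0.80, 0.40, 0.60, 0.30, 0.10]
labels = [1, 1, 1, 0, 0, 0]
threshold = recall_first_threshold(scores, labels)
print(threshold, precision_recall(scores, labels, threshold))  # 0.4 (0.75, 1.0)
```

Under these numbers, the recall-optimized models give up F1 relative to the F1-optimized ones (for example, 0.842 versus 0.900 for mammography) in exchange for flagging every abnormal report.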
Our pipeline can identify abnormal reports with potentially useful performance and so decrease the manual effort required to screen for abnormal findings that require follow-up.
It is possible to automatically identify clinically significant abnormalities in SDs with high recall and practically useful precision in a generalizable and minimally laborious way.