Jaiswal Ajay, Tang Liyan, Ghosh Meheli, Rousseau Justin F, Peng Yifan, Ding Ying
The University of Texas at Austin, United States.
Central University of Gujarat, India.
Proc Mach Learn Res. 2021 Dec;158:196-208.
Radiology reports are unstructured and contain the imaging findings and corresponding diagnoses transcribed by radiologists, including clinical facts and negated and/or uncertain statements. Extracting pathologic findings and diagnoses from radiology reports is important for quality control, population health, and monitoring of disease progress. Existing works rely primarily either on rule-based systems or on fine-tuning transformer-based pre-trained models, but fail to take factual and uncertain information into consideration and therefore generate false-positive outputs. In this work, we introduce three sedulous augmentation techniques which retain factual and critical information while generating augmentations for contrastive learning. We introduce RadBERT-CL, which fuses this information into BlueBert via a self-supervised contrastive loss. Our experiments on MIMIC-CXR show superior performance of RadBERT-CL on fine-tuning for multi-class, multi-label report classification. We illustrate that when few labeled data are available, RadBERT-CL outperforms conventional SOTA transformers (BERT/BlueBert) by significantly larger margins (6-11%). We also show that the representations learned by RadBERT-CL can capture critical medical information in the latent space.
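The self-supervised contrastive objective described above can be illustrated with a minimal sketch. This is a generic SimCLR-style NT-Xent loss in NumPy, not the paper's exact implementation: the function name, batch layout, and temperature value are illustrative assumptions. Two augmented views of the same report form a positive pair; all other reports in the batch act as negatives.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """Sketch of an NT-Xent contrastive loss (hypothetical helper).

    z1, z2: (N, d) embeddings of two augmented views of the same N reports.
    Positive pairs are (z1[i], z2[i]); all other in-batch pairs are negatives.
    """
    z = np.concatenate([z1, z2], axis=0)               # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalize rows
    sim = z @ z.T / temperature                        # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    n = z1.shape[0]
    # row i's positive sits at i+n (and vice versa)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    loss = -(sim[np.arange(2 * n), pos] - logsumexp)   # cross-entropy per row
    return loss.mean()
```

In RadBERT-CL's setting, `z1` and `z2` would come from encoding two fact-preserving augmentations of each report with the BlueBert backbone; minimizing this loss pulls views of the same report together in the latent space while pushing apart views of different reports.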