Department of Diagnostic and Interventional Radiology, University Hospital Bonn, Venusberg-Campus 1, 53127, Bonn, Germany.
Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS, Sankt Augustin, Germany.
Eur Radiol. 2023 Jun;33(6):4228-4236. doi: 10.1007/s00330-023-09526-y. Epub 2023 Mar 11.
To provide insights for on-site development of transformer-based structuring of free-text report databases by investigating different labeling and pre-training strategies.
A total of 93,368 German chest X-ray reports from 20,912 intensive care unit (ICU) patients were included. Two labeling strategies were investigated to tag six findings of the attending radiologist. First, a system based on human-defined rules was applied for annotation of all reports (termed "silver labels"). Second, 18,000 reports were manually annotated in 197 h (termed "gold labels"), of which 10% were used for testing. An on-site pre-trained model (T_mlm) using masked-language modeling (MLM) was compared to a public, medically pre-trained model (T_med). Both models were fine-tuned for text classification on silver labels only, gold labels only, and first with silver and then gold labels (hybrid training), using varying numbers (N: 500, 1000, 2000, 3500, 7000, 14,580) of gold labels. Macro-averaged F1-scores (MAF1) in percent were calculated with 95% confidence intervals (CI).
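As an illustration of what annotation by human-defined rules can look like, the following is a minimal sketch and not the system used in the study. The two finding names, the German trigger patterns, and the negation cues are hypothetical, since the abstract names neither the six findings nor the rules.

```python
import re

# Hypothetical findings and German trigger patterns (assumptions for
# illustration; the study's six findings and rules are not given here).
RULES = {
    "pleural_effusion": re.compile(r"\b(pleura)?ergu(ss|ß)", re.IGNORECASE),
    "pneumothorax": re.compile(r"\bpneumothorax", re.IGNORECASE),
}

# Hypothetical negation cues: a match is discarded when one of these
# words occurs earlier in the same sentence.
NEGATION = re.compile(r"\b(kein|keine|keinen|ohne)\b", re.IGNORECASE)


def silver_label(report: str) -> dict:
    """Return a 0/1 label per finding for one free-text report."""
    labels = {name: 0 for name in RULES}
    # Naive sentence split on terminal punctuation.
    for sentence in re.split(r"[.!?]\s*", report):
        for name, pattern in RULES.items():
            match = pattern.search(sentence)
            if match and not NEGATION.search(sentence[: match.start()]):
                labels[name] = 1
    return labels


example = silver_label("Kein Pneumothorax. Pleuraerguss links.")
```

Rules of this kind are cheap to apply to every report, but they miss phrasings that the patterns do not anticipate, which is one reason such labels are treated as "silver" rather than "gold".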
T_mlm,gold (95.5 [94.5-96.3]) showed significantly higher MAF1 than T_med,silver (75.0 [73.4-76.5]) and T_mlm,silver (75.2 [73.6-76.7]), but not significantly higher MAF1 than T_med,gold (94.7 [93.6-95.6]), T_med,hybrid (94.9 [93.9-95.8]), and T_mlm,hybrid (95.2 [94.3-96.0]). When using 7000 or fewer gold-labeled reports, T_mlm,gold (N: 7000, 94.7 [93.5-95.7]) showed significantly higher MAF1 than T_med,gold (N: 7000, 91.5 [90.0-92.8]). With at least 2000 gold-labeled reports, utilizing silver labels did not lead to significant improvement of T_mlm,hybrid (N: 2000, 91.8 [90.4-93.2]) over T_mlm,gold (N: 2000, 91.4 [89.9-92.8]).
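The values above are macro-averaged F1-scores in percent with 95% CIs. A minimal sketch of how such a figure could be computed for multi-label predictions follows. The percentile bootstrap over reports is an assumption, as the abstract does not state how the intervals were derived, and all function names are illustrative.

```python
import random


def f1(y_true, y_pred):
    """Binary F1 for one finding; returns 0.0 when there are no
    positives and no positive predictions (a convention, not a fact
    taken from the study)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(gold, pred):
    """Unweighted mean of per-finding F1, in percent.

    gold, pred: lists of per-report label vectors (one 0/1 per finding).
    """
    n_labels = len(gold[0])
    scores = [
        f1([g[j] for g in gold], [p[j] for p in pred])
        for j in range(n_labels)
    ]
    return 100.0 * sum(scores) / n_labels


def bootstrap_ci(gold, pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample reports with replacement and
    recompute the macro-averaged F1 on each resample."""
    rng = random.Random(seed)
    n = len(gold)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(macro_f1([gold[i] for i in idx], [pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data: four reports, two findings.
gold = [[1, 0], [0, 1], [1, 1], [0, 0]]
pred = [[1, 0], [0, 1], [0, 1], [0, 0]]
score = macro_f1(gold, pred)
ci = bootstrap_ci(gold, pred)
```

Macro-averaging gives each finding equal weight regardless of how often it occurs, so rare findings influence the score as much as common ones.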
Custom pre-training of transformers and fine-tuning on manual annotations promises to be an efficient strategy to unlock report databases for data-driven medicine.
• On-site development of natural language processing methods that retrospectively unlock free-text databases of radiology clinics for data-driven medicine is of great interest.
• For clinics seeking to develop methods on-site for retrospective structuring of a department's report database, it remains unclear which of the previously proposed strategies for labeling reports and pre-training models is most appropriate in the context of, e.g., available annotator time.
• Using a custom pre-trained transformer model, along with a little annotation effort, promises to be an efficient way to retrospectively structure radiological databases, even when millions of reports are not available for pre-training.