Abdullahi Abubakar Ahmad, Ganiz Murat Can, Koç Ural, Gökhan Muhammet Batuhan, Aydın Ceren, Özdemir Ali Bahadır
Marmara University Faculty of Engineering, Department of Computer Engineering, İstanbul, Türkiye.
Ankara Bilkent City Hospital, Clinic of Radiology, Ankara, Türkiye.
Diagn Interv Radiol. 2025 Feb 28. doi: 10.4274/dir.2025.243100.
The primary objective of this research is to enhance the accuracy and efficiency of information extraction from radiology reports. In addressing this objective, the study aims to develop and evaluate a deep learning framework for named entity recognition (NER).
We used a synthetic dataset of 1,056 Turkish radiology reports created and labeled by the radiologists on our research team. Owing to privacy concerns, actual patient data could not be used; however, the synthetic reports closely mimic genuine reports in structure and content. We employed the four-stage DYGIE++ model for the experiments. First, we performed token encoding with four bidirectional encoder representations from transformers (BERT) models: BERTurk, BioBERTurk, PubMedBERT, and XLM-RoBERTa. Second, we introduced adaptive span enumeration, which adjusts the candidate span width to the word count of a Turkish sentence. Third, we adopted span graph propagation to build a multidirectional graph crucial for coreference resolution. Finally, a two-layer feed-forward neural network classified each candidate span into a named-entity label.
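The span enumeration step can be illustrated with a minimal sketch. This is not the authors' code: span-based NER systems in the DYGIE++ family enumerate all contiguous token spans up to a width limit and score each as a candidate entity; the fixed `max_width` below stands in for the adaptive, sentence-length-aware variant described in the paper.

```python
# Minimal sketch of span enumeration for span-based NER
# (DYGIE++-style). Illustrative only; the paper's adaptive
# variant ties the width limit to Turkish sentence length.

def enumerate_spans(tokens, max_width=3):
    """Return all contiguous (start, end) token spans, inclusive,
    with width at most max_width."""
    spans = []
    for start in range(len(tokens)):
        for end in range(start, min(start + max_width, len(tokens))):
            spans.append((start, end))
    return spans

# Hypothetical tokenized Turkish sentence ("Nodule seen in right lung")
tokens = ["Sağ", "akciğerde", "nodül", "izlendi"]
spans = enumerate_spans(tokens, max_width=3)
print(len(spans))  # 9 candidate spans for 4 tokens at width <= 3
```

Each candidate span would then be embedded from the BERT token encodings and passed to the downstream graph-propagation and classification stages.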
The experiments conducted on the labeled dataset demonstrate the effectiveness of the approach. The study achieved an F1 score of 80.1 for the NER task; the BioBERTurk model, which is pre-trained on Turkish Wikipedia, radiology reports, and biomedical texts, proved the most effective of the four BERT models used in the experiment.
We show how different dataset labels affect the model's performance. The results demonstrate the model's ability to handle the intricacies of Turkish radiology reports, providing a detailed analysis of precision, recall, and F1 scores for each label. Additionally, this study compares its findings with related research in other languages.
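The per-label analysis mentioned above can be sketched as follows. This is a generic evaluation sketch, not the paper's protocol: it computes precision, recall, and F1 per entity label from sets of (span, label) predictions, with hypothetical label names; the paper's exact matching rules may differ.

```python
# Sketch of per-label precision/recall/F1 over (span, label) sets.
# Label names below are illustrative, not the dataset's actual schema.

def per_label_prf(gold, pred):
    """gold, pred: sets of ((start, end), label) tuples.
    Returns {label: (precision, recall, f1)}."""
    labels = {l for _, l in gold} | {l for _, l in pred}
    out = {}
    for label in labels:
        g = {s for s, l in gold if l == label}
        p = {s for s, l in pred if l == label}
        tp = len(g & p)  # exact-span true positives for this label
        prec = tp / len(p) if p else 0.0
        rec = tp / len(g) if g else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        out[label] = (prec, rec, f1)
    return out

gold = {((0, 1), "ANATOMY"), ((2, 2), "OBSERVATION")}
pred = {((0, 1), "ANATOMY"), ((2, 3), "OBSERVATION")}
print(per_label_prf(gold, pred))
# ANATOMY matches exactly; the OBSERVATION span boundary is wrong,
# so that label scores zero under exact-span matching.
```

Reporting these metrics per label, rather than only in aggregate, is what reveals which entity types the model handles well and which remain difficult.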
Our approach provides clinicians with more precise and comprehensive insights to improve patient care by extracting relevant information from radiology reports. This innovation in information extraction streamlines the diagnostic process and helps expedite patient treatment decisions.