Automatic TNM staging of colorectal cancer radiology reports using pre-trained language models.

Authors

Chizhikova Mariia, López-Úbeda Pilar, Martín-Noguerol Teodoro, Díaz-Galiano Manuel C, Ureña-López L Alfonso, Luna Antonio, Martín-Valdivia M Teresa

Affiliations

Department of Computer Science, Advanced Studies Center in ICT (CEATIC), University of Jaén, Campus las Lagunillas, s/n, Jaén, 23071, Spain.

Natural Language Processing Unit, HT médica, Carmelo Torres, n°2, Jaén, 23007, Spain.

Publication information

Comput Methods Programs Biomed. 2025 Feb;259:108515. doi: 10.1016/j.cmpb.2024.108515. Epub 2024 Nov 16.

Abstract

BACKGROUND AND OBJECTIVE

Colorectal cancer is one of the leading causes of cancer death worldwide. TNM staging is essential for prognosis and treatment planning, and it provides critical insight into how far colorectal cancer has advanced. However, manual TNM staging from colon magnetic resonance imaging (MRI) is a laborious and error-prone process. This study introduces an automated text classification system for TNM staging of Spanish-language colon MRI reports.

METHODS

A dataset of 1319 Spanish colon MRI reports was collected and manually labeled with TNM staging. To automate the staging task, a multimodal system was proposed. The system is based on a RoBERTa language model pre-trained on a combination of biomedical and clinical Spanish-language corpora, and it uses Natural Language Processing (NLP) techniques to extract relevant categorical and numerical features from the MRI reports.
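
The abstract does not say which categorical and numerical features were extracted or how. As a minimal sketch of what rule-based feature extraction from a Spanish report could look like, the Python example below pulls measurements and keyword flags out of free text. The regular expressions, the keyword list, the feature names, and the example sentence are all illustrative assumptions, not the authors' feature set.

```python
import re

# Assumed pattern: a number (comma or dot decimal) followed by "mm" or "cm".
MEASUREMENT = re.compile(r"(\d+(?:[.,]\d+)?)\s*(mm|cm)\b", re.IGNORECASE)

# Hypothetical keyword flags; the paper's real categorical features are not
# described in the abstract.
KEYWORDS = {
    "adenopatia": re.compile(r"adenopat[ií]as?", re.IGNORECASE),
    "metastasis": re.compile(r"met[aá]stasis", re.IGNORECASE),
}


def extract_features(report: str) -> dict:
    """Return simple numerical and categorical features from a report."""
    sizes_mm = []
    for value, unit in MEASUREMENT.findall(report):
        number = float(value.replace(",", "."))
        # Normalise every measurement to millimetres.
        sizes_mm.append(number * 10 if unit.lower() == "cm" else number)

    features = {
        "n_measurements": len(sizes_mm),
        "max_size_mm": max(sizes_mm) if sizes_mm else 0.0,
    }
    for name, pattern in KEYWORDS.items():
        features[f"mentions_{name}"] = bool(pattern.search(report))
    return features


# Invented example sentence, not taken from the study's dataset.
example = (
    "Tumoración de 45 mm en recto medio, a 8 cm del margen anal. "
    "Se identifican adenopatías mesorrectales."
)
print(extract_features(example))
```

In a system like the one described, features of this kind would presumably be combined with the language model's text representation before classification; how the authors do this is not stated in the abstract.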

RESULTS

The system was evaluated using several metrics, and the results are promising: the best-performing configuration reached macro F1-scores of 0.7464, 0.8792 and 0.6776 for T, N and M, respectively.
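
For readers unfamiliar with the metric, macro F1 is the unweighted mean of the per-class F1-scores, so rare stages count as much as common ones. The sketch below computes it in plain Python; the label values and predictions are invented for illustration and are not data from the study.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Invented toy labels for the T category.
y_true = ["T2", "T3", "T3", "T4", "T3"]
y_pred = ["T2", "T3", "T4", "T4", "T3"]
print(round(macro_f1(y_true, y_pred), 4))
```

Here the per-class scores are 1.0 for T2, 0.8 for T3 and about 0.667 for T4, so the single T3 report misclassified as T4 lowers two class scores at once.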

CONCLUSIONS

This study demonstrates the feasibility of using a language model for automatic TNM staging based on Spanish clinical reports of colorectal cancer patients. The proposed system can be a useful tool to improve the efficiency and accuracy of colorectal cancer diagnosis.
