NLP department, HT Médica, Carmelo Torres n 2, 23007, Jaén, Spain.
MRI unit, Radiology department, HT Médica, Carmelo Torres n 2, 23007, Jaén, Spain.
Comput Methods Programs Biomed. 2024 Oct;255:108334. doi: 10.1016/j.cmpb.2024.108334. Epub 2024 Jul 20.
In the last decade, there has been growing interest in applying artificial intelligence (AI) systems to breast cancer assessment, including breast density evaluation. However, few models have been developed that integrate textual mammographic reports with mammographic images. Our aims are (1) to build a natural language processing (NLP)-based AI system, (2) to evaluate an external image-based software tool, and (3) to develop a multimodal system that combines image and text inferences through a late fusion approach, for the automatic classification of breast density in mammograms and radiological reports according to the American College of Radiology (ACR) guidelines.
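As a minimal sketch of the late fusion idea described above, the snippet below averages the class probabilities produced by a text model and an image model and returns the ACR category with the highest fused score; the probability vectors, the fusion weight, and the function name are hypothetical illustrations, not the exact mechanism reported in the study.

```python
# Illustrative late-fusion sketch: the per-modality probability vectors
# and the fusion weight are assumed values, not the study's parameters.
import numpy as np

ACR_CLASSES = ["A", "B", "C", "D"]

def late_fusion(p_text: np.ndarray, p_image: np.ndarray, w_text: float = 0.6) -> str:
    """Combine class probabilities from the text and image models.

    p_text, p_image: arrays of shape (4,) with probabilities for ACR A-D.
    w_text: weight given to the NLP model (hypothetical value).
    """
    p_fused = w_text * p_text + (1.0 - w_text) * p_image
    return ACR_CLASSES[int(np.argmax(p_fused))]

# Example: the text model favours ACR C, the image model favours ACR B.
print(late_fusion(np.array([0.05, 0.25, 0.60, 0.10]),
                  np.array([0.10, 0.55, 0.30, 0.05])))
```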
We first compared five NLP models, three based on n-gram term frequency-inverse document frequency (TF-IDF) features and two based on transformer architectures, using 1533 unstructured mammogram reports as a training set and 303 reports as a test set. Subsequently, we evaluated the external image-based software tool on 303 mammogram images. Finally, we assessed our multimodal system, which takes both the text reports and the mammogram images into account.
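The following sketch shows what one of the n-gram TF-IDF text classifiers could look like, assuming a scikit-learn pipeline with a logistic regression head; the preprocessing, n-gram range, classifier choice, and the placeholder Spanish report snippets are assumptions for illustration and may differ from the models actually compared.

```python
# Hypothetical TF-IDF n-gram text classifier for ACR breast density
# (report texts below are invented placeholders, not study data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_reports = ["mamas con densidad heterogenea ...",
                 "mamas casi totalmente grasas ..."]
train_labels = ["C", "A"]  # ACR categories

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 3), lowercase=True),  # word n-grams weighted by TF-IDF
    LogisticRegression(max_iter=1000),                    # linear classifier over TF-IDF features
)
model.fit(train_reports, train_labels)
print(model.predict(["mamas extremadamente densas ..."]))
```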
Our best NLP model achieved 88% accuracy in classifying ACR breast densities, while the external software and the multimodal system achieved 75% and 80% accuracy, respectively.
Although our multimodal system outperforms the image-based tool, it currently does not improve on the results offered by the NLP model for ACR breast density classification. Nevertheless, the promising results observed here open the possibility of more comprehensive studies on the use of multimodal tools in the assessment of breast density.