Auxiliary signal-guided knowledge encoder-decoder for medical report generation.

Authors

Li Mingjie, Liu Rui, Wang Fuyu, Chang Xiaojun, Liang Xiaodan

Affiliations

University of Technology Sydney, Sydney, Australia.

Monash University, Melbourne, Australia.

Publication information

World Wide Web. 2023;26(1):253-270. doi: 10.1007/s11280-022-01013-6. Epub 2022 Aug 27.

DOI:10.1007/s11280-022-01013-6
PMID:36060430
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9417931/
Abstract

Medical reports have significant clinical value to radiologists and specialists, especially during a pandemic such as COVID-19. However, beyond the common difficulties of natural image captioning, medical report generation requires the model to describe a medical image with a fine-grained, semantically coherent paragraph that satisfies both medical common sense and logic. Previous works generally extract global image features and attempt to generate a paragraph similar to the reference reports; this approach has two limitations. First, the regions of primary interest to radiologists usually occupy a small area of the global image, so the remaining parts of the image can act as irrelevant noise during training. Second, each medical report contains many similar sentences describing the normal regions of the image, which causes serious data bias. This bias is likely to teach models to generate these inessential sentences routinely. To address these problems, we propose an Auxiliary Signal-Guided Knowledge Encoder-Decoder (ASGK) that mimics radiologists' working patterns. Specifically, auxiliary patches are used to expand the widely used visual patch features before they are fed to the Transformer encoder, while external linguistic signals help the decoder better master prior knowledge during pre-training. Our approach performs well on common benchmarks, including CX-CHR, IU X-Ray, and the COVID-19 CT Report dataset (COV-CTR), demonstrating that combining auxiliary signals with a Transformer architecture can bring a significant improvement in medical report generation. The experimental results confirm that Transformer-based models driven by auxiliary signals can outperform previous approaches on both medical terminology classification and paragraph generation metrics.
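The idea of expanding the visual patch sequence with auxiliary patches can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the raw-pixel "features", and the hand-picked region box `aux_box` are all assumptions made for illustration, and the abstract does not say how the auxiliary region is chosen. The sketch only shows one plausible reading, in which tokens from a region of interest are appended to the tokens from the whole image before the combined sequence is passed to an encoder.

```python
from typing import List, Tuple

Image = List[List[float]]   # H x W grid of pixel intensities
Token = List[float]         # one flattened patch, standing in for a patch feature


def split_into_patches(image: Image, patch: int) -> List[Token]:
    """Cut an image into non-overlapping patch x patch tiles (row-major), each flattened."""
    height, width = len(image), len(image[0])
    tokens: List[Token] = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens


def crop(image: Image, box: Tuple[int, int, int, int]) -> Image:
    """Return the sub-image described by box = (top, left, height, width)."""
    top, left, height, width = box
    return [row[left:left + width] for row in image[top:top + height]]


def build_encoder_input(image: Image, patch: int,
                        aux_box: Tuple[int, int, int, int]) -> List[Token]:
    """Append auxiliary-region tokens to the global patch tokens.

    aux_box is a hypothetical region of interest supplied by the caller;
    how such a region would be found is outside the scope of this sketch.
    """
    global_tokens = split_into_patches(image, patch)
    aux_tokens = split_into_patches(crop(image, aux_box), patch)
    return global_tokens + aux_tokens


# Toy 4 x 4 "image" with pixel values 0..15.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
tokens = build_encoder_input(image, patch=2, aux_box=(1, 1, 2, 2))
print(len(tokens))   # 4 global tokens + 1 auxiliary token
print(tokens[-1])    # the auxiliary token cut from the centre of the image
```

In a real model the tokens would be learned feature vectors and not raw pixels, and the encoder would typically need some way to tell global tokens from auxiliary ones (for example, a separate positional or type embedding); both points are left out here to keep the sketch short.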

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/85a2/9417931/d53478b45a9f/11280_2022_1013_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/85a2/9417931/9f521a654792/11280_2022_1013_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/85a2/9417931/be7cbfa5d083/11280_2022_1013_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/85a2/9417931/b30ffc4fcd4f/11280_2022_1013_Fig4_HTML.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/85a2/9417931/696e28da6f85/11280_2022_1013_Fig5_HTML.jpg
