Development of a Large-Scale Dataset of Chest Computed Tomography Reports in Japanese and a High-Performance Finding Classification Model: Dataset Development and Validation Study.

Author information

Yamagishi Yosuke, Nakamura Yuta, Kikuchi Tomohiro, Sonoda Yuki, Hirakawa Hiroshi, Kano Shintaro, Nakamura Satoshi, Hanaoka Shouhei, Yoshikawa Takeharu, Abe Osamu

Affiliations

Division of Radiology and Biomedical Engineering, Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan, 81 3-3815-5411.

Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, Tokyo, Japan.

Publication information

JMIR Med Inform. 2025 Aug 28;13:e71137. doi: 10.2196/71137.

DOI:10.2196/71137
PMID:40874833
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12392688/
Abstract

BACKGROUND

Recent advances in large language models have highlighted the need for high-quality multilingual medical datasets. Although Japan is a global leader in computed tomography (CT) scanner deployment and use, the absence of large-scale Japanese radiology datasets has hindered the development of specialized language models for medical imaging analysis. Despite the emergence of multilingual models and language-specific adaptations, the development of Japanese-specific medical language models has been constrained by a lack of comprehensive datasets, particularly in radiology.

OBJECTIVE

This study aims to address this critical gap in Japanese medical natural language processing resources. To this end, a comprehensive Japanese CT report dataset was developed through machine translation and used to build a specialized language model for the structured classification of findings. In addition, a rigorously validated evaluation dataset was created through refinement by expert radiologists to ensure a reliable assessment of model performance.

METHODS

We translated the CT-RATE dataset (24,283 CT reports from 21,304 patients) into Japanese using GPT-4o mini. The training dataset consisted of 22,778 machine-translated reports, and the validation dataset included 150 reports carefully revised by radiologists. We developed CT-BERT-JPN, a specialized Bidirectional Encoder Representations from Transformers (BERT) model for Japanese radiology text based on the "tohoku-nlp/bert-base-japanese-v3" architecture, to extract 18 structured findings from reports. Translation quality was assessed with Bilingual Evaluation Understudy (BLEU) and Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scores and further evaluated by radiologists in a dedicated human-in-the-loop experiment. In that experiment, each report in a randomly selected subset was independently reviewed by 2 radiologists, 1 senior (postgraduate year [PGY] 6-11) and 1 junior (PGY 4-5), who used a 5-point Likert scale to rate (1) grammatical correctness, (2) medical terminology accuracy, and (3) overall readability. Inter-rater reliability was measured with the quadratic weighted kappa (QWK). Model performance was benchmarked against GPT-4o using accuracy, precision, recall, F1-score, area under the receiver operating characteristic curve (ROC-AUC), and average precision.
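The inter-rater reliability measure named above can be illustrated in a few lines. This is a minimal sketch of the standard definition of QWK (Cohen's kappa with quadratic disagreement weights) on a 1-5 Likert scale. The abstract does not say which implementation the authors used, and the function name `quadratic_weighted_kappa` and the example ratings are hypothetical.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_rating=1, max_rating=5):
    """Cohen's kappa with quadratic weights for two raters on an ordinal scale.

    rater_a, rater_b: equal-length sequences of integer ratings, one pair
    per reviewed report (e.g. senior vs junior radiologist).
    """
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("ratings must be non-empty and paired one-to-one")
    if max_rating <= min_rating:
        raise ValueError("the scale needs at least two categories")
    for r in (*rater_a, *rater_b):
        if not min_rating <= r <= max_rating:
            raise ValueError(f"rating {r} outside [{min_rating}, {max_rating}]")

    k = max_rating - min_rating + 1
    n = len(rater_a)

    # observed[i][j]: number of reports rater A scored i and rater B scored j.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1

    # Marginal rating histograms of each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic weight: 0 on the diagonal, 1 for the most distant pair (e.g. 1 vs 5).
            w = (i - j) ** 2 / (k - 1) ** 2
            # Count expected if the two raters scored independently of each other.
            expected = hist_a[i] * hist_b[j] / n
            weighted_observed += w * observed[i][j]
            weighted_expected += w * expected

    if weighted_expected == 0:
        # Both raters used one identical category throughout: kappa is undefined.
        return float("nan")
    return 1.0 - weighted_observed / weighted_expected


if __name__ == "__main__":
    # Hypothetical readability ratings (not study data).
    senior = [5, 4, 4, 3, 5, 2]
    junior = [5, 4, 3, 3, 4, 2]
    print(f"QWK = {quadratic_weighted_kappa(senior, junior):.3f}")
```

Quadratic weights penalize a 2-vs-5 disagreement far more than a 4-vs-5 one, which suits ordinal Likert ratings better than unweighted kappa.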

RESULTS

General text structure was preserved (BLEU: 0.731 for the findings section, 0.690 for the impression section; ROUGE: 0.770-0.876 for findings, 0.748-0.857 for impression), although expert review identified 3 categories of necessary refinements: contextual adjustment of technical terms, completion of incomplete translations, and localization of Japanese medical terminology. The radiologist-revised translations scored higher than the raw machine translations on all 3 rating dimensions, and all improvements were statistically significant (P<.001). CT-BERT-JPN outperformed GPT-4o on 11 of 18 findings (61%), achieving perfect F1-scores for 4 conditions and F1-scores >0.95 for 14 conditions, despite sample sizes varying from 7 to 82 cases.
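The benchmark metrics behind these per-finding results can be computed by treating each of the 18 findings as its own binary label. That framing is an assumption here: the abstract reports per-finding F1 but does not describe the evaluation code. The sketch below uses only the standard library. `finding_metrics`, the 0.5 threshold, and the example scores are hypothetical. ROC-AUC is computed as the fraction of positive-negative pairs ranked correctly, and average precision as the mean precision at the rank of each positive report.

```python
from dataclasses import dataclass


@dataclass
class FindingMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float
    roc_auc: float
    average_precision: float


def _safe_div(num, den):
    # Report 0.0 instead of failing when a denominator is empty (e.g. no predicted positives).
    return num / den if den else 0.0


def roc_auc(labels, scores):
    """Probability that a random positive report outscores a random negative one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return float("nan")  # undefined when only one class is present
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive report.

    Ties are broken by sort order, so results can differ slightly from
    implementations that group tied scores into one threshold.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        return float("nan")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


def finding_metrics(labels, scores, threshold=0.5):
    """Binary metrics for one finding; labels are 0/1, scores are model probabilities."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = len(labels) - tp - fp - fn
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    return FindingMetrics(
        accuracy=(tp + tn) / len(labels),
        precision=precision,
        recall=recall,
        f1=_safe_div(2 * precision * recall, precision + recall),
        roc_auc=roc_auc(labels, scores),
        average_precision=average_precision(labels, scores),
    )


if __name__ == "__main__":
    # Hypothetical ground truth and scores for a single finding across 6 reports.
    labels = [1, 0, 1, 1, 0, 0]
    scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1]
    print(finding_metrics(labels, scores))
```

For larger evaluations, scikit-learn's `roc_auc_score` and `average_precision_score` compute the same quantities and handle tied scores by grouping thresholds.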

CONCLUSIONS

Our study established a robust Japanese CT report dataset and demonstrated the effectiveness of a specialized language model for the structured classification of findings. This hybrid approach, combining machine translation with expert validation, enabled the creation of a large-scale dataset while maintaining high quality standards. This study provides essential resources for advancing medical artificial intelligence research in Japanese health care settings, and the datasets and models have been made publicly available for research to facilitate further progress in the field.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3189/12392688/b251b56d29a2/medinform-v13-e71137-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3189/12392688/cce5ea112599/medinform-v13-e71137-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3189/12392688/a2791d0452b1/medinform-v13-e71137-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3189/12392688/1b2a1a841651/medinform-v13-e71137-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3189/12392688/f9a0254f8886/medinform-v13-e71137-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3189/12392688/2216ed2d14df/medinform-v13-e71137-g005.jpg

Similar articles

1
Development of a Large-Scale Dataset of Chest Computed Tomography Reports in Japanese and a High-Performance Finding Classification Model: Dataset Development and Validation Study.
JMIR Med Inform. 2025 Aug 28;13:e71137. doi: 10.2196/71137.
2
Comparison of a Specialized Large Language Model with GPT-4o for CT and MRI Radiology Report Summarization.
Radiology. 2025 Aug;316(2):e243774. doi: 10.1148/radiol.243774.
3
Menstrual Health Education Using a Specialized Large Language Model in India: Development and Evaluation Study of MenstLLaMA.
J Med Internet Res. 2025 Jul 16;27:e71977. doi: 10.2196/71977.
4
Enhancing Pulmonary Disease Prediction Using Large Language Models With Feature Summarization and Hybrid Retrieval-Augmented Generation: Multicenter Methodological Study Based on Radiology Report.
J Med Internet Res. 2025 Jun 11;27:e72638. doi: 10.2196/72638.
5
Domain-Specific Pretraining of NorDeClin-Bidirectional Encoder Representations From Transformers for Code Prediction in Norwegian Clinical Texts: Model Development and Evaluation Study.
JMIR AI. 2025 Aug 25;4:e66153. doi: 10.2196/66153.
6
Prescription of Controlled Substances: Benefits and Risks
7
Data extraction from free-text stroke CT reports using GPT-4o and Llama-3.3-70B: the impact of annotation guidelines.
Eur Radiol Exp. 2025 Jun 19;9(1):61. doi: 10.1186/s41747-025-00600-2.
8
Development of a Natural Language Processing Model for Extracting Kidney Biopsy Pathology Diagnoses.
Kidney Med. 2025 Jun 14;7(8):101047. doi: 10.1016/j.xkme.2025.101047. eCollection 2025 Aug.
9
Evaluation of GPT-4o for multilingual translation of radiology reports across imaging modalities.
Eur J Radiol. 2025 Oct;191:112341. doi: 10.1016/j.ejrad.2025.112341. Epub 2025 Jul 29.
10
Slit Lamp Report Generation and Question Answering: Development and Validation of a Multimodal Transformer Model with Large Language Model Integration.
J Med Internet Res. 2024 Dec 30;26:e54047. doi: 10.2196/54047.