DeepSeek-assisted LI-RADS classification: AI-driven precision in hepatocellular carcinoma diagnosis.

Authors

Zhang Jun, Liu Jinpeng, Guo Mingyang, Zhang Xin, Xiao Wenbo, Chen Feng

Affiliations

Department of Radiology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang Province, P.R. China.

Department of Radiology, Affiliated Xiaoshan Hospital, Hangzhou Normal University, Hangzhou, Zhejiang Province, P.R. China.

Publication information

Int J Surg. 2025 Sep 1;111(9):5970-5979. doi: 10.1097/JS9.0000000000002763. Epub 2025 Jun 20.

DOI:10.1097/JS9.0000000000002763

PMID:40552875

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12430892/

Abstract

BACKGROUND

The clinical utility of the DeepSeek-V3 (DSV3) model in enhancing the accuracy of Liver Imaging Reporting and Data System (LI-RADS, LR) classification remains underexplored. This study aimed to evaluate the diagnostic performance of DSV3 in LR classifications compared to radiologists with varying levels of experience and to assess its potential as a decision-support tool in clinical practice.

MATERIALS AND METHODS

A dual-phase retrospective-prospective study analyzed 426 liver lesions (300 retrospective, 126 prospective) in high-risk hepatocellular carcinoma (HCC) patients who underwent magnetic resonance imaging or computed tomography. Three radiologists (one junior, two senior) independently classified lesions using LR v2018 criteria, while DSV3 analyzed unstructured radiology reports to generate corresponding classifications. In the prospective cohort, DSV3 processed inputs in both Chinese and English to evaluate the impact of language. Performance was compared using the chi-square test or Fisher's exact test, with pathology as the gold standard.
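
As an illustration of the kind of comparison described above, the following is a minimal sketch of a two-sided Fisher's exact test on a 2x2 table, written with the Python standard library only. The function name and the counts in the usage lines are hypothetical and are not taken from the study; the abstract does not state which statistical software the authors used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows could be two readers (e.g. model vs. radiologist) and columns
    two outcomes (e.g. agrees vs. disagrees with pathology).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    total = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table that is as likely as,
    # or less likely than, the observed table.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical counts for illustration only (not data from the study):
# reader A: 14 correct, 1 incorrect; reader B: 10 correct, 5 incorrect.
p = fisher_exact_two_sided(14, 1, 10, 5)
print(f"two-sided p = {p:.4f}")
```

Fisher's exact test is typically preferred over the chi-square test when expected cell counts are small, which is a common situation for sparsely populated categories.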

RESULTS

In the retrospective cohort, DSV3 significantly outperformed the junior radiologist in diagnostically challenging categories: LR-3 (17.8% vs. 39.7%, P < 0.05), LR-4 (80.4% vs. 46.2%, P < 0.05), and LR-5 (86.2% vs. 66.7%, P < 0.05), while showing comparable accuracy in LR-1 (90.8% vs. 88.7%), LR-2 (11.9% vs. 25.6%), and LR-M (79.5% vs. 62.1%) classifications (all P > 0.05). Prospective validation confirmed these findings, with DSV3 demonstrating superior performance for LR-3 (13.3% vs. 60.0%), LR-4 (93.3% vs. 66.7%), and LR-5 (93.5% vs. 67.7%) compared to the junior radiologist (all P < 0.05). Notably, DSV3 achieved diagnostic parity with the senior radiologists across all categories (P > 0.05) and maintained consistent performance between Chinese and English inputs.

CONCLUSION

The DSV3 model effectively improves diagnostic accuracy of LR-3 to LR-5 classifications among junior radiologists. Its language-independent performance and ability to match senior-level expertise suggest strong potential for clinical implementation to standardize HCC diagnosis and optimize treatment decisions.
