Zhang Jun, Liu Jinpeng, Guo Mingyang, Zhang Xin, Xiao Wenbo, Chen Feng
Department of Radiology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang Province, P.R. China.
Department of Radiology, Affiliated Xiaoshan Hospital, Hangzhou Normal University, Hangzhou, Zhejiang Province, P.R. China.
Int J Surg. 2025 Sep 1;111(9):5970-5979. doi: 10.1097/JS9.0000000000002763. Epub 2025 Jun 20.
The clinical utility of the DeepSeek-V3 (DSV3) model in enhancing the accuracy of Liver Imaging Reporting and Data System (LI-RADS, LR) classification remains underexplored. This study aimed to evaluate the diagnostic performance of DSV3 in LR classifications compared to radiologists with varying levels of experience and to assess its potential as a decision-support tool in clinical practice.
A dual-phase retrospective-prospective study analyzed 426 liver lesions (300 retrospective, 126 prospective) in patients at high risk of hepatocellular carcinoma (HCC) who underwent magnetic resonance imaging or computed tomography. Three radiologists (one junior and two senior) independently classified the lesions using LR v2018 criteria, while DSV3 analyzed unstructured radiology reports to generate the corresponding classifications. In the prospective cohort, DSV3 processed inputs in both Chinese and English to evaluate the impact of language. Performance was compared using the chi-square test or Fisher's exact test, with pathology as the gold standard.
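For readers less familiar with how the LR-3, LR-4 and LR-5 categories are separated, the following is a simplified, illustrative sketch of the major-feature lookup in the LI-RADS v2018 CT/MRI diagnostic table. It is not the authors' workflow or DSV3's method. The names `APHE` and `lirads_v2018_major` are hypothetical. The sketch assumes the observation has already been judged not to be LR-1, LR-2, LR-M, LR-TIV or LR-NC, and it ignores ancillary features and tie-breaking rules.

```python
from enum import Enum


class APHE(Enum):
    """Arterial phase hyperenhancement status (hypothetical helper type)."""
    NONE = "no APHE"
    NONRIM = "nonrim APHE"


def lirads_v2018_major(aphe: APHE, size_mm: float, washout: bool,
                       capsule: bool, threshold_growth: bool) -> str:
    """Simplified LI-RADS v2018 CT/MRI diagnostic-table lookup (LR-3 to LR-5 only).

    Hypothetical illustration: ancillary features, tie-breaking and the
    LR-1/LR-2/LR-M/LR-TIV/LR-NC pathways are deliberately not modelled.
    """
    if size_mm <= 0:
        raise ValueError("size_mm must be positive")
    # Additional major features: nonperipheral washout, enhancing capsule, threshold growth
    n_features = sum([washout, capsule, threshold_growth])

    if aphe is APHE.NONE:
        if size_mm < 20:
            return "LR-4" if n_features >= 2 else "LR-3"
        return "LR-3" if n_features == 0 else "LR-4"

    # Nonrim APHE
    if size_mm < 10:
        return "LR-3" if n_features == 0 else "LR-4"
    if size_mm < 20:
        if n_features == 0:
            return "LR-3"
        if n_features == 1:
            # Capsule alone gives LR-4; washout or threshold growth alone gives LR-5
            return "LR-4" if capsule else "LR-5"
        return "LR-5"
    return "LR-4" if n_features == 0 else "LR-5"
```

For example, a 25 mm observation with nonrim APHE and nonperipheral washout maps to LR-5. A 15 mm observation with nonrim APHE and only an enhancing capsule maps to LR-4.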
In the retrospective cohort, DSV3 significantly outperformed junior radiologists in the diagnostically challenging categories: LR-3 (17.8% vs. 39.7%, P < 0.05), LR-4 (80.4% vs. 46.2%, P < 0.05), and LR-5 (86.2% vs. 66.7%, P < 0.05). Its accuracy was comparable for LR-1 (90.8% vs. 88.7%), LR-2 (11.9% vs. 25.6%), and LR-M (79.5% vs. 62.1%) (all P > 0.05). Prospective validation confirmed these findings: DSV3 outperformed junior radiologists for LR-3 (13.3% vs. 60.0%), LR-4 (93.3% vs. 66.7%), and LR-5 (93.5% vs. 67.7%) (all P < 0.05). Notably, DSV3 achieved diagnostic parity with senior radiologists across all categories (P > 0.05) and performed consistently with both Chinese and English inputs.
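The study compared proportions with the chi-square test or Fisher's exact test. The following is a minimal, standard-library sketch of how two such proportions (for example, correct classifications by DSV3 versus a reader within one LR category) might be compared. It rests on two assumptions not stated in the abstract. First, it follows the common rule of thumb of switching to Fisher's exact test when any expected cell count is below 5. Second, it uses Pearson's chi-square without continuity correction. The helper names are hypothetical, and any counts passed to them are illustrations, not study data.

```python
import math


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables no more likely than the observed one
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (1 df, no continuity correction); returns (stat, p)."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # Chi-square survival function with 1 df: P(X > s) = erfc(sqrt(s / 2))
    return stat, math.erfc(math.sqrt(stat / 2))


def compare_accuracy(correct1: int, total1: int,
                     correct2: int, total2: int) -> tuple[str, float]:
    """Compare two proportions, choosing the test by minimum expected count."""
    if total1 <= 0 or total2 <= 0:
        raise ValueError("totals must be positive")
    if not (0 <= correct1 <= total1 and 0 <= correct2 <= total2):
        raise ValueError("correct counts must lie within [0, total]")
    a, b = correct1, total1 - correct1
    c, d = correct2, total2 - correct2
    n = total1 + total2
    rows, cols = (a + b, c + d), (a + c, b + d)
    min_expected = min(r * k / n for r in rows for k in cols)
    if min_expected < 5:
        return "Fisher", fisher_exact_two_sided(a, b, c, d)
    return "chi-square", chi_square_2x2(a, b, c, d)[1]
```

With small illustrative counts, such as `compare_accuracy(3, 4, 1, 4)`, every expected cell count is 2. The function therefore falls back to Fisher's exact test and returns a p-value of 34/70 (about 0.486). Larger counts, such as `compare_accuracy(10, 30, 20, 30)`, are routed to the chi-square test.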
The DSV3 model effectively improves the diagnostic accuracy of LR-3 to LR-5 classification for junior radiologists. Its language-independent performance and its ability to match senior-level expertise suggest strong potential for clinical implementation to standardize HCC diagnosis and optimize treatment decisions.