Li Qinpeng, Zhan Lili, Cai Xinjian
Department of Clinical Laboratory Medicine, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, Guangdong, People's Republic of China.
J Multidiscip Healthc. 2025 Aug 12;18:4979-4988. doi: 10.2147/JMDH.S538253. eCollection 2025.
Recent advancements in artificial intelligence (AI), particularly with large language models (LLMs), are transforming healthcare by enhancing diagnostic decision-making and clinical workflows. The application of LLMs like DeepSeek-R1 in clinical laboratory medicine demonstrates potential for improving diagnostic accuracy, supporting decision-making, and optimizing patient care.
This study evaluates the performance of DeepSeek-R1 in analyzing clinical laboratory cases and assisting with medical decision-making. The focus is on assessing its accuracy and completeness in generating diagnostic hypotheses, differential diagnoses, and diagnostic workups across diverse clinical cases.
We analyzed 100 clinical cases from , which includes comprehensive case histories and laboratory findings. For each case, DeepSeek-R1 was queried independently three times with the same three questions, covering the diagnosis, the differential diagnoses, and the diagnostic tests. Senior clinical laboratory physicians assessed the outputs for accuracy and completeness.
DeepSeek-R1 achieved an overall accuracy of 72.9% (95% CI [69.9%, 75.7%]) and completeness of 73.4% (95% CI [70.5%, 76.2%]). Performance varied by question type: the highest accuracy was observed for diagnostic hypotheses (85.7%, 95% CI [81.2%, 89.2%]) and the lowest for differential diagnoses (55.0%, 95% CI [49.3%, 60.5%]). Notable variations in performance were also seen across disease categories, with the best performance observed in genetic and obstetric diagnostics (accuracy 93.1%, 95% CI [84.0%, 97.3%]; completeness 86.1%, 95% CI [76.4%, 92.3%]).
DeepSeek-R1 demonstrates potential as a decision-support tool in clinical laboratory medicine, particularly in generating diagnostic hypotheses and recommending diagnostic workups. However, its performance in differential diagnosis and in handling specific clinical nuances remains limited. Future work should focus on expanding training data, integrating clinical ontologies, and incorporating physician feedback to improve real-world applicability. DeepSeek-R1 and the newer versions under development may be promising tools for both non-medical professionals and professionals working in medical laboratory diagnostics.