Suppr 超能文献



Similar Articles

1. Comparative analysis of the performance of the large language models ChatGPT-3.5, ChatGPT-4 and Open AI-o1 in the field of Programmed Cell Death in myeloma.
Discov Oncol. 2025 May 23;16(1):870. doi: 10.1007/s12672-025-02648-3.
2. Assessing the Accuracy of Information on Medication Abortion: A Comparative Analysis of ChatGPT and Google Bard AI.
Cureus. 2024 Jan 2;16(1):e51544. doi: 10.7759/cureus.51544. eCollection 2024 Jan.
3. Evaluating ChatGPT and Google Gemini Performance and Implications in Turkish Dental Education.
Cureus. 2025 Jan 11;17(1):e77292. doi: 10.7759/cureus.77292. eCollection 2025 Jan.
4. Benchmarking large language models' performances for myopia care: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, and Google Bard.
EBioMedicine. 2023 Sep;95:104770. doi: 10.1016/j.ebiom.2023.104770. Epub 2023 Aug 23.
5. Artificial intelligence performance in answering multiple-choice oral pathology questions: a comparative analysis.
BMC Oral Health. 2025 Apr 15;25(1):573. doi: 10.1186/s12903-025-05926-2.
6. Evaluating the Effectiveness of Large Language Models in Providing Patient Education for Chinese Patients With Ocular Myasthenia Gravis: Mixed Methods Study.
J Med Internet Res. 2025 Apr 10;27:e67883. doi: 10.2196/67883.
7. Evaluating the reference accuracy of large language models in radiology: a comparative study across subspecialties.
Diagn Interv Radiol. 2025 May 12. doi: 10.4274/dir.2025.253101.
8. Assessment of ChatGPT-3.5's Knowledge in Oncology: Comparative Study with ASCO-SEP Benchmarks.
JMIR AI. 2024 Jan 12;3:e50442. doi: 10.2196/50442.
9. Performance of ChatGPT on the Chinese Postgraduate Examination for Clinical Medicine: Survey Study.
JMIR Med Educ. 2024 Feb 9;10:e48514. doi: 10.2196/48514.
10. DeepSeek vs ChatGPT: a comparison study of their performance in answering prostate cancer radiotherapy questions in multiple languages.
Am J Clin Exp Urol. 2025 Apr 25;13(2):176-185. doi: 10.62347/UIAP7979. eCollection 2025.



Comparative analysis of the performance of the large language models ChatGPT-3.5, ChatGPT-4 and Open AI-o1 in the field of Programmed Cell Death in myeloma.

Author Information

Kun Wu, Bo Tao, Yuntao Li, Shenju Cheng, Yanhong Li, Shan Luo, Yun Zeng, Bo Nie, Mingxia Shi

Affiliations

Yunnan Key Laboratory of Laboratory Medicine, Yunnan Province Clinical Research Center for Laboratory Medicine, Department of Clinical Laboratory, The First Affiliated Hospital of Kunming Medical University, Kunming, 650032, China.

Information Center, The First Affiliated Hospital of Kunming Medical University, Kunming, 650032, China.

Publication Information

Discov Oncol. 2025 May 23;16(1):870. doi: 10.1007/s12672-025-02648-3.

DOI: 10.1007/s12672-025-02648-3
PMID: 40407967
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12102019/
Abstract

OBJECTIVE

This study aimed to compare the performance of three large language models (LLMs)-ChatGPT-3.5, ChatGPT-4, and Open AI-o1-in addressing clinical questions related to Programmed Cell Death in multiple myeloma. By evaluating each model's accuracy, comprehensiveness, and self-correcting capabilities, the investigation sought to determine the most effective tool for supporting clinical decision-making in this specialized oncological context.

METHODS

A comprehensive set of forty clinical questions was curated from recent high-impact oncology journals, International Myeloma Working Group (IMWG) guidelines, and reputable medical databases, covering various aspects of Programmed Cell Death in multiple myeloma. These questions were refined and validated by a panel of four hematologists-oncologists with expertise in the field. Each question was individually posed to ChatGPT-3.5, ChatGPT-4, and Open AI-o1 in controlled sessions. Responses were anonymized and evaluated by the same panel using a five-point Likert scale assessing accuracy, depth, and completeness. Responses were categorized as "excellent," "satisfactory," or "insufficient" based on cumulative scores. Additionally, the models' self-correcting abilities were assessed by providing feedback on initially insufficient responses and re-evaluating the revised answers. Interrater reliability was measured using Cohen's Kappa coefficients.
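The interrater-reliability measure named above, Cohen's kappa, compares the observed agreement between two raters with the agreement expected by chance from each rater's label frequencies. A minimal stand-alone sketch (not the authors' code; the rater labels below are invented examples using the study's three response categories):

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels
    (e.g. "excellent" / "satisfactory" / "insufficient") to the same items."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: expected overlap given each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:  # both raters used a single identical label throughout
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings of six model responses by two experts:
a = ["excellent", "excellent", "satisfactory", "insufficient", "excellent", "satisfactory"]
b = ["excellent", "satisfactory", "satisfactory", "insufficient", "excellent", "excellent"]
print(round(cohen_kappa(a, b), 3))  # → 0.455
```

Kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which is why the study reports the coefficient rather than raw percent agreement.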

RESULTS

Open AI-o1 consistently generated the most extensive and detailed responses, achieving significantly higher total scores across all domains compared to ChatGPT-3.5 and ChatGPT-4. It demonstrated the lowest proportion of "insufficient" responses and the highest percentage of "excellent" answers, particularly excelling in guideline-based questions. Open AI-o1 also exhibited superior self-correcting capacity, effectively enhancing its responses upon receiving feedback. The highest Cohen's Kappa coefficient among the models indicated greater consistency in evaluations by clinical experts. User satisfaction surveys revealed that 85% of hematologists-oncologists rated Open AI-o1 as "highly satisfactory," compared to 60% for ChatGPT-4 and 45% for ChatGPT-3.5.

CONCLUSION

Open AI-o1 outperforms ChatGPT-3.5 and ChatGPT-4 in accuracy, depth, and reliability when addressing complex clinical questions related to Programmed Cell Death in multiple myeloma. Its advanced "thinking" ability facilitates comprehensive and evidence-based responses, making it a more dependable tool for clinical decision support. These findings suggest that Open AI-o1 holds significant potential for enhancing clinical practices in specialized oncological fields, though ongoing validation and integration with human expertise remain essential.
