Suppr 超能文献



Similar Articles

1. Comparative analysis of the performance of the large language models ChatGPT-3.5, ChatGPT-4 and Open AI-o1 in the field of Programmed Cell Death in myeloma.
Discov Oncol. 2025 May 23;16(1):870. doi: 10.1007/s12672-025-02648-3.
2. Assessing the Accuracy of Information on Medication Abortion: A Comparative Analysis of ChatGPT and Google Bard AI.
Cureus. 2024 Jan 2;16(1):e51544. doi: 10.7759/cureus.51544. eCollection 2024 Jan.
3. Evaluating ChatGPT and Google Gemini Performance and Implications in Turkish Dental Education.
Cureus. 2025 Jan 11;17(1):e77292. doi: 10.7759/cureus.77292. eCollection 2025 Jan.
4. Benchmarking large language models' performances for myopia care: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, and Google Bard.
EBioMedicine. 2023 Sep;95:104770. doi: 10.1016/j.ebiom.2023.104770. Epub 2023 Aug 23.
5. Artificial intelligence performance in answering multiple-choice oral pathology questions: a comparative analysis.
BMC Oral Health. 2025 Apr 15;25(1):573. doi: 10.1186/s12903-025-05926-2.
6. Evaluating the Effectiveness of Large Language Models in Providing Patient Education for Chinese Patients With Ocular Myasthenia Gravis: Mixed Methods Study.
J Med Internet Res. 2025 Apr 10;27:e67883. doi: 10.2196/67883.
7. Evaluating the reference accuracy of large language models in radiology: a comparative study across subspecialties.
Diagn Interv Radiol. 2025 May 12. doi: 10.4274/dir.2025.253101.
8. Assessment of ChatGPT-3.5's Knowledge in Oncology: Comparative Study with ASCO-SEP Benchmarks.
JMIR AI. 2024 Jan 12;3:e50442. doi: 10.2196/50442.
9. Performance of ChatGPT on the Chinese Postgraduate Examination for Clinical Medicine: Survey Study.
JMIR Med Educ. 2024 Feb 9;10:e48514. doi: 10.2196/48514.
10. DeepSeek vs ChatGPT: a comparison study of their performance in answering prostate cancer radiotherapy questions in multiple languages.
Am J Clin Exp Urol. 2025 Apr 25;13(2):176-185. doi: 10.62347/UIAP7979. eCollection 2025.



Comparative analysis of the performance of the large language models ChatGPT-3.5, ChatGPT-4 and Open AI-o1 in the field of Programmed Cell Death in myeloma.

Author Information

Kun Wu, Bo Tao, Yuntao Li, Shenju Cheng, Yanhong Li, Shan Luo, Yun Zeng, Bo Nie, Mingxia Shi

Affiliations

Yunnan Key Laboratory of Laboratory Medicine, Yunnan Province Clinical Research Center for Laboratory Medicine, Department of Clinical Laboratory, The First Affiliated Hospital of Kunming Medical University, Kunming, 650032, China.

Information Center, The First Affiliated Hospital of Kunming Medical University, Kunming, 650032, China.

Publication Information

Discov Oncol. 2025 May 23;16(1):870. doi: 10.1007/s12672-025-02648-3.

DOI: 10.1007/s12672-025-02648-3
PMID: 40407967
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12102019/
Abstract

OBJECTIVE

This study aimed to compare the performance of three large language models (LLMs)-ChatGPT-3.5, ChatGPT-4, and Open AI-o1-in addressing clinical questions related to Programmed Cell Death in multiple myeloma. By evaluating each model's accuracy, comprehensiveness, and self-correcting capabilities, the investigation sought to determine the most effective tool for supporting clinical decision-making in this specialized oncological context.

METHODS

A comprehensive set of forty clinical questions was curated from recent high-impact oncology journals, International Myeloma Working Group (IMWG) guidelines, and reputable medical databases, covering various aspects of Programmed Cell Death in multiple myeloma. These questions were refined and validated by a panel of four hematologists-oncologists with expertise in the field. Each question was individually posed to ChatGPT-3.5, ChatGPT-4, and Open AI-o1 in controlled sessions. Responses were anonymized and evaluated by the same panel using a five-point Likert scale assessing accuracy, depth, and completeness. Responses were categorized as "excellent," "satisfactory," or "insufficient" based on cumulative scores. Additionally, the models' self-correcting abilities were assessed by providing feedback on initially insufficient responses and re-evaluating the revised answers. Interrater reliability was measured using Cohen's Kappa coefficients.
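The interrater-reliability measure named above, Cohen's kappa, compares the observed agreement between two raters with the agreement expected by chance from each rater's label frequencies. A minimal stand-alone sketch (not the authors' code; the rater labels below are invented examples using the study's three response categories):

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels
    (e.g. "excellent" / "satisfactory" / "insufficient") to the same items."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: expected overlap given each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:  # both raters used a single identical label throughout
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings of six model responses by two experts:
a = ["excellent", "excellent", "satisfactory", "insufficient", "excellent", "satisfactory"]
b = ["excellent", "satisfactory", "satisfactory", "insufficient", "excellent", "excellent"]
print(round(cohen_kappa(a, b), 3))  # → 0.455
```

Kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which is why the study reports the coefficient rather than raw percent agreement.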

RESULTS

Open AI-o1 consistently generated the most extensive and detailed responses, achieving significantly higher total scores across all domains compared to ChatGPT-3.5 and ChatGPT-4. It demonstrated the lowest proportion of "insufficient" responses and the highest percentage of "excellent" answers, particularly excelling in guideline-based questions. Open AI-o1 also exhibited superior self-correcting capacity, effectively enhancing its responses upon receiving feedback. The highest Cohen's Kappa coefficient among the models indicated greater consistency in evaluations by clinical experts. User satisfaction surveys revealed that 85% of hematologists-oncologists rated Open AI-o1 as "highly satisfactory," compared to 60% for ChatGPT-4 and 45% for ChatGPT-3.5.

CONCLUSION

Open AI-o1 outperforms ChatGPT-3.5 and ChatGPT-4 in accuracy, depth, and reliability when addressing complex clinical questions related to Programmed Cell Death in multiple myeloma. Its advanced "thinking" ability facilitates comprehensive and evidence-based responses, making it a more dependable tool for clinical decision support. These findings suggest that Open AI-o1 holds significant potential for enhancing clinical practices in specialized oncological fields, though ongoing validation and integration with human expertise remain essential.
