Assessment of Large Language Models (LLMs) in decision-making support for gynecologic oncology.

Authors

Gumilar Khanisyah Erza, Indraprasta Birama R, Faridzi Ach Salman, Wibowo Bagus M, Herlambang Aditya, Rahestyningtyas Eccita, Irawan Budi, Tambunan Zulkarnain, Bustomi Ahmad Fadhli, Brahmantara Bagus Ngurah, Yu Zih-Ying, Hsu Yu-Cheng, Pramuditya Herlangga, Putra Very Great E, Nugroho Hari, Mulawardhana Pungky, Tjokroprawiro Brahmana A, Hedianto Tri, Ibrahim Ibrahim H, Huang Jingshan, Li Dongqi, Lu Chien-Hsing, Yang Jer-Yen, Liao Li-Na, Tan Ming

Affiliations

Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan.

Department of Obstetrics and Gynecology, Hospital of Universitas Airlangga - Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia.

Publication

Comput Struct Biotechnol J. 2024 Oct 31;23:4019-4026. doi: 10.1016/j.csbj.2024.10.050. eCollection 2024 Dec.

DOI: 10.1016/j.csbj.2024.10.050
PMID: 39610903
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11603009/
Abstract

OBJECTIVE

This study investigated the ability of large language models (LLMs) to provide accurate and consistent answers, focusing on their performance in complex gynecologic cancer cases.

BACKGROUND

LLMs are advancing rapidly and require a thorough evaluation to ensure that they can be safely and effectively used in clinical decision-making. Such evaluations are essential for confirming LLM reliability and accuracy in supporting medical professionals in casework.

STUDY DESIGN

We evaluated the accuracy, consistency, and overall performance of three prominent LLMs: ChatGPT-4 (CG-4), Gemini Advanced (GemAdv), and Copilot. We used fifteen clinical vignettes of varying difficulty and five open-ended questions based on real patient cases. The responses were coded, randomized, and evaluated blindly by six expert gynecologic oncologists, who used a 5-point Likert scale to rate relevance, clarity, depth, focus, and coherence.
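The blinded review step can be illustrated with a short Python sketch. This is not the authors' pipeline: the record layout `(model, question, text)`, the `R000`-style codes, the fixed random seed, and the helper names `blind_and_shuffle` and `mean_scores_by_model` are hypothetical choices made for illustration. The sketch shuffles responses, assigns opaque codes so raters cannot tell which model wrote an answer, and then averages the 5-point Likert ratings per model and per criterion.

```python
import random
import statistics
from collections import defaultdict

# Rating criteria named in the study design; each is scored on a 1-5 Likert scale.
CRITERIA = ("relevance", "clarity", "depth", "focus", "coherence")


def blind_and_shuffle(responses, seed=0):
    """Shuffle (model, question, text) records and replace the model with an opaque code.

    Returns (blinded, key): `blinded` is what raters see, as (code, question, text);
    `key` maps each code back to its model and is withheld until scoring is done.
    """
    rng = random.Random(seed)  # fixed seed only so the example is reproducible
    records = list(responses)
    rng.shuffle(records)
    blinded, key = [], {}
    for i, (model, question, text) in enumerate(records):
        code = f"R{i:03d}"  # codes follow the shuffled order, so they reveal nothing about the model
        key[code] = model
        blinded.append((code, question, text))
    return blinded, key


def mean_scores_by_model(ratings, key):
    """Average Likert scores per model and criterion after unblinding.

    `ratings` is a list of (code, rater, {criterion: score}) tuples.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for code, _rater, scores in ratings:
        model = key[code]  # unblind only at aggregation time
        for criterion in CRITERIA:
            score = scores[criterion]
            if not 1 <= score <= 5:
                raise ValueError(f"{criterion} score {score} is outside the 1-5 scale")
            buckets[model][criterion].append(score)
    return {
        model: {criterion: statistics.mean(values) for criterion, values in per_criterion.items()}
        for model, per_criterion in buckets.items()
    }
```

Keeping the code-to-model key separate from what the raters see is what makes the review blind; in this sketch the key is consulted only when scores are aggregated.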

RESULTS

GemAdv demonstrated higher accuracy (81.87%) than both CG-4 (61.60%) and Copilot (70.67%) across all difficulty levels. GemAdv also answered correctly more consistently, exceeding 60% accuracy on every day of the testing period. Although CG-4 showed a slight advantage in adhering to the National Comprehensive Cancer Network (NCCN) treatment guidelines, GemAdv excelled in the depth and focus of its answers, which are crucial aspects of clinical decision-making.
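The accuracy figures and the daily-consistency criterion (above 60% on every day) reduce to simple proportions. The Python sketch below assumes each model answer is recorded as a `(day, correct)` pair; that record format, the helper names `accuracy`, `daily_accuracy`, and `consistently_above`, and the toy data in the usage note are illustrative assumptions, not the study's data or scoring code.

```python
from collections import defaultdict


def accuracy(outcomes):
    """Percentage of answers judged correct; `outcomes` is an iterable of booleans."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("no outcomes to score")
    return 100.0 * sum(outcomes) / len(outcomes)


def daily_accuracy(records):
    """Group (day, correct) records by day and return {day: accuracy percent}."""
    by_day = defaultdict(list)
    for day, correct in records:
        by_day[day].append(correct)
    return {day: accuracy(values) for day, values in sorted(by_day.items())}


def consistently_above(records, threshold=60.0):
    """True if daily accuracy strictly exceeds `threshold` percent on every recorded day.

    Returns False when there are no records, rather than a vacuous True.
    """
    daily = daily_accuracy(records)
    return bool(daily) and all(pct > threshold for pct in daily.values())
```

For example, with hypothetical records in which a model answers 3 of 4 questions correctly on day 1 and 2 of 3 on day 2, `daily_accuracy` gives 75.0 and about 66.67, so `consistently_above(records)` is True at the 60% threshold but False at 70%, while overall accuracy is about 71.43%.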

CONCLUSION

LLMs, especially GemAdv, show potential in supporting clinical practice by providing accurate, consistent, and relevant information for gynecologic cancer. However, further refinement is needed for more complex scenarios. This study highlights the promise of LLMs in gynecologic oncology, emphasizing the need for ongoing development and rigorous evaluation to maximize their clinical utility and reliability.

Cited by

1
Comparative analysis of ChatGPT 3.5 and ChatGPT 4 obstetric and gynecological knowledge.
Sci Rep. 2025 Jul 1;15(1):21133. doi: 10.1038/s41598-025-08424-1.
2
Artificial intelligence-large language models (AI-LLMs) for reliable and accurate cardiotocography (CTG) interpretation in obstetric practice.
Comput Struct Biotechnol J. 2025 Mar 18;27:1140-1147. doi: 10.1016/j.csbj.2025.03.026. eCollection 2025.
3
Performance of ChatGPT in Pediatric Audiology as Rated by Students and Experts.
J Clin Med. 2025 Jan 28;14(3):875. doi: 10.3390/jcm14030875.
4
Evaluating ChatGPT, Gemini and other Large Language Models (LLMs) in orthopaedic diagnostics: A prospective clinical study.
Comput Struct Biotechnol J. 2024 Dec 26;28:9-15. doi: 10.1016/j.csbj.2024.12.013. eCollection 2025.