
Artificial intelligence-large language models (AI-LLMs) for reliable and accurate cardiotocography (CTG) interpretation in obstetric practice.

Authors

Gumilar Khanisyah Erza, Wardhana Manggala Pasca, Akbar Muhammad Ilham Aldika, Putra Agung Sunarko, Banjarnahor Dharma Putra Perjuangan, Mulyana Ryan Saktika, Fatati Ita, Yu Zih-Ying, Hsu Yu-Cheng, Dachlan Erry Gumilar, Lu Chien-Hsing, Liao Li-Na, Tan Ming

Affiliations

Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan.

Department of Obstetrics and Gynecology, Universitas Airlangga Hospital - Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia.

Publication information

Comput Struct Biotechnol J. 2025 Mar 18;27:1140-1147. doi: 10.1016/j.csbj.2025.03.026. eCollection 2025.

DOI:10.1016/j.csbj.2025.03.026
PMID:40206348
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11981782/
Abstract

BACKGROUND

Accurate cardiotocography (CTG) interpretation is vital for the monitoring of fetal well-being during pregnancy and labor. Advanced artificial intelligence (AI) tools such as AI-large language models (AI-LLMs) may enhance the accuracy of CTG interpretation, but their potential has not been extensively evaluated.

OBJECTIVE

This study aimed to assess the performance of three AI-LLMs (ChatGPT-4o, Gemini Advanced, and Copilot) in CTG image interpretation, compare their results to those of junior (JHDs) and senior human doctors (SHDs), and evaluate their reliability in clinical decision-making.

STUDY DESIGN

Seven CTG images were interpreted by the three AI-LLMs, five SHDs, and five JHDs, with the evaluations scored by five blinded maternal-fetal medicine experts using a Likert scale for five parameters (relevance, clarity, depth, focus, and coherence). The homogeneity of the expert ratings and group performances were statistically compared.
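The aggregation described above can be illustrated with a minimal sketch. The abstract does not state the scoring formula, so everything here is an assumption made for illustration: a 5-point Likert scale, normalization of each interpretation to a 0-100 range, and an unweighted mean over images. The names `image_score` and `overall_score` are hypothetical and are not taken from the paper.

```python
from statistics import mean

# The five rated parameters named in the study design.
PARAMETERS = ("relevance", "clarity", "depth", "focus", "coherence")


def image_score(ratings, scale_max=5):
    """Normalize one interpretation's ratings to a 0-100 score.

    ratings: {rater_id: {parameter: Likert value}}.
    ASSUMPTION: a 5-point scale and 0-100 normalization; the abstract
    does not specify either.
    """
    total = sum(r[p] for r in ratings.values() for p in PARAMETERS)
    maximum = len(ratings) * len(PARAMETERS) * scale_max
    return 100 * total / maximum


def overall_score(per_image_ratings, scale_max=5):
    """Average the per-image scores (ASSUMPTION: unweighted mean)."""
    return mean(image_score(r, scale_max) for r in per_image_ratings)


if __name__ == "__main__":
    # Made-up ratings from two raters for two images, for illustration only.
    all_fours = {p: 4 for p in PARAMETERS}
    all_fives = {p: 5 for p in PARAMETERS}
    images = [
        {"rater_a": all_fours, "rater_b": all_fours},
        {"rater_a": all_fives, "rater_b": all_fives},
    ]
    print(overall_score(images))
```

In the study itself, each interpreter (an AI-LLM or a doctor) would have seven such per-image rating sets, one from each of the five experts; the sample data above is invented and does not reproduce any reported score.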

RESULTS

ChatGPT-4o scored 77.86, outperforming Gemini Advanced (57.14), Copilot (47.29), and the JHDs (61.57). Its performance closely approached that of the SHDs (80.43), with no statistically significant difference between the two (p > 0.05). ChatGPT-4o excelled in the depth parameter and was only marginally inferior to the SHDs on the other parameters.

CONCLUSION

ChatGPT-4o demonstrated superior performance among the AI-LLMs, surpassed JHDs in CTG interpretation, and closely matched the performance level of SHDs. AI-LLMs, particularly ChatGPT-4o, are promising tools for assisting obstetricians, improving diagnostic accuracy, and enhancing obstetric patient care.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/597a/11981782/a51f2cbfca8d/gr4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/597a/11981782/1b0b764bc843/ga1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/597a/11981782/7cbc9abf3fbb/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/597a/11981782/7f6cafa922e2/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/597a/11981782/8c2a3257c9ed/gr3.jpg


