Artificial intelligence-large language models (AI-LLMs) for reliable and accurate cardiotocography (CTG) interpretation in obstetric practice.

Author information

Gumilar Khanisyah Erza, Wardhana Manggala Pasca, Akbar Muhammad Ilham Aldika, Putra Agung Sunarko, Banjarnahor Dharma Putra Perjuangan, Mulyana Ryan Saktika, Fatati Ita, Yu Zih-Ying, Hsu Yu-Cheng, Dachlan Erry Gumilar, Lu Chien-Hsing, Liao Li-Na, Tan Ming

Affiliations

Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan.

Department of Obstetrics and Gynecology, Universitas Airlangga Hospital - Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia.

Publication information

Comput Struct Biotechnol J. 2025 Mar 18;27:1140-1147. doi: 10.1016/j.csbj.2025.03.026. eCollection 2025.

Abstract

BACKGROUND

Accurate cardiotocography (CTG) interpretation is vital for the monitoring of fetal well-being during pregnancy and labor. Advanced artificial intelligence (AI) tools such as AI-large language models (AI-LLMs) may enhance the accuracy of CTG interpretation, but their potential has not been extensively evaluated.

OBJECTIVE

This study aimed to assess the performance of three AI-LLMs (ChatGPT-4o, Gemini Advanced, and Copilot) in CTG image interpretation, compare their results with those of junior human doctors (JHDs) and senior human doctors (SHDs), and evaluate their reliability in clinical decision-making.

STUDY DESIGN

Seven CTG images were interpreted by the three AI-LLMs, five SHDs, and five JHDs. Five blinded maternal-fetal medicine experts scored the interpretations on a Likert scale across five parameters (relevance, clarity, depth, focus, and coherence). The homogeneity of the expert ratings was assessed, and the group performances were statistically compared.

RESULTS

ChatGPT-4o scored 77.86, outperforming Gemini Advanced (57.14), Copilot (47.29), and the JHDs (61.57). Its performance closely approached that of the SHDs (80.43), with no statistically significant difference between the two (p > 0.05). ChatGPT-4o excelled in the depth parameter and was only marginally inferior to the SHDs on the other parameters.

CONCLUSION

ChatGPT-4o demonstrated superior performance among the AI-LLMs, surpassed JHDs in CTG interpretation, and closely matched the performance level of SHDs. AI-LLMs, particularly ChatGPT-4o, are promising tools for assisting obstetricians, improving diagnostic accuracy, and enhancing obstetric patient care.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/597a/11981782/1b0b764bc843/ga1.jpg
