Gumilar Khanisyah Erza, Wardhana Manggala Pasca, Akbar Muhammad Ilham Aldika, Putra Agung Sunarko, Banjarnahor Dharma Putra Perjuangan, Mulyana Ryan Saktika, Fatati Ita, Yu Zih-Ying, Hsu Yu-Cheng, Dachlan Erry Gumilar, Lu Chien-Hsing, Liao Li-Na, Tan Ming
Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan.
Department of Obstetrics and Gynecology, Universitas Airlangga Hospital - Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia.
Comput Struct Biotechnol J. 2025 Mar 18;27:1140-1147. doi: 10.1016/j.csbj.2025.03.026. eCollection 2025.
Accurate cardiotocography (CTG) interpretation is vital for the monitoring of fetal well-being during pregnancy and labor. Advanced artificial intelligence (AI) tools such as AI-large language models (AI-LLMs) may enhance the accuracy of CTG interpretation, but their potential has not been extensively evaluated.
This study aimed to assess the performance of three AI-LLMs (ChatGPT-4o, Gemini Advanced, and Copilot) in CTG image interpretation, compare their results with those of junior human doctors (JHDs) and senior human doctors (SHDs), and evaluate their reliability in clinical decision-making.
Seven CTG images were interpreted by the three AI-LLMs, five SHDs, and five JHDs. Five blinded maternal-fetal medicine experts scored the interpretations on a Likert scale for five parameters (relevance, clarity, depth, focus, and coherence). The homogeneity of the expert ratings was assessed, and the performance of the groups was compared statistically.
ChatGPT-4o scored 77.86, outperforming Gemini Advanced (57.14), Copilot (47.29), and the JHDs (61.57). Its performance closely approached that of the SHDs (80.43), with no statistically significant difference between the two (p > 0.05). ChatGPT-4o excelled in the depth parameter and was only marginally inferior to the SHDs on the other parameters.
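To make the size of these differences easier to see, the reported scores can be ranked and expressed as gaps relative to ChatGPT-4o. The sketch below is only arithmetic on the five values quoted above; the helper name `gap_to` is hypothetical, and the code does not reproduce the study's statistical tests or its scoring procedure.

```python
# Overall scores exactly as reported in the abstract.
scores = {
    "SHDs": 80.43,
    "ChatGPT-4o": 77.86,
    "JHDs": 61.57,
    "Gemini Advanced": 57.14,
    "Copilot": 47.29,
}


def gap_to(reference, scores):
    """Return each other group's score minus the reference group's score.

    Hypothetical helper for illustration; values are rounded to 2 decimals.
    """
    ref = scores[reference]
    return {
        name: round(value - ref, 2)
        for name, value in scores.items()
        if name != reference
    }


# Groups ordered from highest to lowest reported score.
ranking = sorted(scores, key=scores.get, reverse=True)

# Positive gap: the group scored above ChatGPT-4o; negative: below it.
gaps = gap_to("ChatGPT-4o", scores)

for name in ranking:
    if name == "ChatGPT-4o":
        print(f"{name:16s} {scores[name]:6.2f}  (reference)")
    else:
        print(f"{name:16s} {scores[name]:6.2f}  gap {gaps[name]:+.2f}")
```

On these figures, the SHDs sit 2.57 points above ChatGPT-4o, while the JHDs sit 16.29 points below it. A raw gap says nothing about significance on its own; the abstract reports only that the ChatGPT-4o versus SHD difference was not statistically significant.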
ChatGPT-4o demonstrated superior performance among the AI-LLMs, surpassed JHDs in CTG interpretation, and closely matched the performance level of SHDs. AI-LLMs, particularly ChatGPT-4o, are promising tools for assisting obstetricians, improving diagnostic accuracy, and enhancing obstetric patient care.