Lin Mien-Jen, Hsieh Li-Chun, Chen Chin-Kuo
Department of Medical Education, Chang Gung Memorial Hospital, Taoyuan 33305, Taiwan.
Department of Otolaryngology-Head and Neck Surgery, Mackay Memorial Hospital, Taipei City 10449, Taiwan.
Diagnostics (Basel). 2025 Aug 11;15(16):2006. doi: 10.3390/diagnostics15162006.
Background/Objectives: Generative AI (GenAI) models such as ChatGPT have gained significant attention in recent years for their potential applications in healthcare. This study evaluated the concordance of responses generated by ChatGPT (versions 3.5 and 4.0) with the key action statements (KAS) of the American Academy of Otolaryngology-Head and Neck Surgery (AAO-HNS) clinical practice guideline (CPG) for Ménière's disease (MD), translated into Chinese. Methods: Seventeen questions derived from the KAS were translated into Chinese and posed to ChatGPT versions 3.5 and 4.0. Responses were categorized as correct, partially correct, incorrect, or non-answers. Concordance with the guideline was evaluated, and Fisher's exact test assessed statistical differences, with significance set at p < 0.05. A comparative analysis between ChatGPT 3.5 and 4.0 was performed. Results: ChatGPT 3.5 demonstrated an 82.4% correctness rate (14 correct, 2 partially correct, 1 non-answer), while ChatGPT 4.0 achieved 94.1% (16 correct, 1 partially correct). Overall, 97.1% of responses were correct or partially correct. ChatGPT 4.0 offered better citation accuracy and text clarity but occasionally included redundant details. No significant difference in correctness rates was observed between the models (p = 0.6012). Conclusions: Both ChatGPT models showed high concordance with the AAO-HNS CPG for MD, with ChatGPT 4.0 exhibiting superior text clarity and citation accuracy. These findings highlight ChatGPT's potential as a reliable assistant for improving healthcare communication and clinical operations. Future research should validate these results across broader medical topics and languages to ensure robust integration of GenAI in healthcare.
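The reported p-value can be reproduced from the counts given in the abstract. The following is a minimal sketch (not code from the paper), assuming the comparison treats each model's 17 responses as fully correct versus not fully correct (so 14 vs. 3 for ChatGPT 3.5 and 16 vs. 1 for ChatGPT 4.0) and applies a two-sided Fisher's exact test, implemented here with only the Python standard library:

```python
from math import comb

def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def p_of(x):
        # Probability of a table with x in the top-left cell,
        # given fixed row and column totals (hypergeometric).
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = p_of(a)
    lo = max(0, row1 - (n - col1))  # smallest feasible top-left cell
    hi = min(row1, col1)            # largest feasible top-left cell
    eps = 1e-12                     # tolerance for floating-point ties
    return sum(p_of(x) for x in range(lo, hi + 1) if p_of(x) <= p_obs + eps)

# Fully correct vs. not fully correct responses out of 17 questions:
# ChatGPT 3.5 -> 14 vs. 3; ChatGPT 4.0 -> 16 vs. 1.
p = fisher_exact_two_sided([[14, 3], [16, 1]])
print(round(p, 4))  # -> 0.6012
```

With this table the two-sided p-value is approximately 0.6012, matching the value reported in the abstract and consistent with the conclusion that the difference in correctness rates is not statistically significant.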