Evaluation of the reliability and readability of ChatGPT-4 responses regarding hypothyroidism during pregnancy.
Affiliation
Department of Endocrinology and Metabolic Diseases, Ankara Training and Research Hospital, Ankara, Turkey.
Publication information
Sci Rep. 2024 Jan 2;14(1):243. doi: 10.1038/s41598-023-50884-w.
Hypothyroidism is characterized by thyroid hormone deficiency and has adverse effects on both pregnancy and fetal health. Chat Generative Pre-trained Transformer (ChatGPT) is a large language model trained on a very large body of data from many sources. Our study aimed to evaluate the reliability and readability of ChatGPT-4 answers about hypothyroidism in pregnancy. A total of 19 questions were created in line with the recommendations in the latest guideline of the American Thyroid Association (ATA) on hypothyroidism in pregnancy and were posed to ChatGPT-4. The reliability and quality of the responses were scored by two independent researchers using the global quality scale (GQS) and modified DISCERN (mDISCERN) tools. The readability of the ChatGPT-4 responses was assessed using the Flesch Reading Ease (FRE) score, Flesch-Kincaid Grade Level (FKGL), Gunning Fog Index (GFI), Coleman-Liau Index (CLI), and Simple Measure of Gobbledygook (SMOG) tools. No misleading information was found in any of the answers. The mean mDISCERN score of the responses was 30.26 ± 3.14; the median GQS score was 4 (2-4). In terms of reliability, most of the answers showed moderate reliability (78.9%), followed by good reliability (21.1%). In the readability analysis, the median FRE was 32.20 (13.00-37.10). The education level required to read the answers was most often university level [9 (47.3%)]. ChatGPT-4 has significant potential and can be used as an auxiliary information source for counseling, creating a bridge between patients and clinicians on hypothyroidism in pregnancy. Efforts should be made to improve the reliability and readability of ChatGPT responses.
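To make the readability measures above concrete, here is a minimal Python sketch of two of them, the Flesch Reading Ease (FRE) score and the Flesch-Kincaid Grade Level (FKGL), using their standard published formulas. The abstract does not say which calculator the authors used, so this is an illustration only: the function names are hypothetical, and the syllable counter is a rough vowel-group heuristic, so results will differ somewhat from dedicated readability tools.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (heuristic, an assumption)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Discount a trailing silent 'e' (but keep endings such as "-le").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> tuple[int, int, int]:
    """Return (sentence count, word count, syllable count) for English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables


def flesch_reading_ease(text: str) -> float:
    """FRE = 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)."""
    n_sent, n_words, n_syll = text_stats(text)
    return 206.835 - 1.015 * (n_words / n_sent) - 84.6 * (n_syll / n_words)


def flesch_kincaid_grade(text: str) -> float:
    """FKGL = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59."""
    n_sent, n_words, n_syll = text_stats(text)
    return 0.39 * (n_words / n_sent) + 11.8 * (n_syll / n_words) - 15.59


if __name__ == "__main__":
    easy = "The cat sat on the mat."
    hard = ("Levothyroxine supplementation requires individualized "
            "titration throughout gestation.")
    for sample in (easy, hard):
        print(round(flesch_reading_ease(sample), 2),
              round(flesch_kincaid_grade(sample), 2))
```

Higher FRE values indicate easier text. On the conventional Flesch interpretation scale, scores between 30 and 50 are labeled "difficult" and correspond roughly to college-level reading, which is consistent with the reported median FRE of 32.20.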