CITRIS Health, University of California Berkeley, Berkeley, CA 94720-1764, United States.
Laboratory of Vascular and Matrix Genetics, National Heart, Lung, and Blood Institute (NHLBI), Bethesda, MD 20892, United States.
J Am Med Inform Assoc. 2024 Oct 1;31(10):2271-2283. doi: 10.1093/jamia/ocae128.
To evaluate the efficacy of ChatGPT 4 (GPT-4) in delivering genetic information about BRCA1, HFE, and MLH1, building on previous findings with ChatGPT 3.5 (GPT-3.5), and to assess the utility, limitations, and ethical implications of using ChatGPT in medical settings.
A structured survey was developed to assess GPT-4's clinical value. An expert panel of genetic counselors and clinical geneticists evaluated GPT-4's responses to these questions. We also compared the results with those for GPT-3.5, using descriptive statistics and Prism 9 for data analysis.
The findings indicate improved accuracy in GPT-4 over GPT-3.5 (P < .0001). However, notable errors in accuracy remained. The relevance of GPT-4's responses varied but was generally favorable, with a mean in the "somewhat agree" range. There was no difference in performance by disease category. The 7-question subset of the Bot Usability Scale (BUS-15) showed no statistically significant difference between the groups but trended lower in the GPT-4 version.
The study underscores GPT-4's potential role in genetic education: it shows notable progress but still faces challenges such as outdated information and the need for ongoing refinement. Our results, while showing promise, emphasize the importance of balancing technological innovation with ethical responsibility in healthcare information delivery.