Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA.
Int J Pediatr Otorhinolaryngol. 2024 May;180:111957. doi: 10.1016/j.ijporl.2024.111957. Epub 2024 Apr 16.
This paper evaluates ChatGPT's accuracy and consistency in providing information on ankyloglossia, a congenital oral condition. By assessing how closely its responses align with expert consensus, the study explores the potential implications for patients who rely on AI for medical information.
Statements from the 2020 clinical consensus statement on ankyloglossia were presented to ChatGPT, and its responses were scored using a 9-point Likert scale. The study analyzed the mean and standard deviation of ChatGPT scores for each statement. Statistical analysis was conducted using Excel.
Among the 63 statements assessed, 67% of ChatGPT responses closely aligned with expert consensus mean scores. However, for 17% (11/63) of the statements, the ChatGPT mean score differed from the clinical consensus statement (CCS) mean by 2.0 or more, raising concerns about ChatGPT's potential influence in disseminating uncertain or debated medical information. For some statements, the mean scores deviated markedly from expert opinion.
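The comparison described above can be illustrated with a minimal sketch. It assumes each statement was scored several times by ChatGPT on the 9-point scale and that a sample standard deviation was used; the abstract specifies neither. The function name `summarize`, the statement labels, and all scores below are hypothetical and are not data from the study.

```python
from statistics import mean, stdev

# Threshold taken from the abstract: a ChatGPT mean that differs from the
# CCS mean by 2.0 or more counts as a notable divergence.
DIVERGENCE_THRESHOLD = 2.0


def summarize(chatgpt_scores, ccs_mean, threshold=DIVERGENCE_THRESHOLD):
    """Summarize repeated ChatGPT scores for one statement (hypothetical helper)."""
    m = mean(chatgpt_scores)
    # Sample standard deviation is an assumption; it needs at least two scores.
    sd = stdev(chatgpt_scores) if len(chatgpt_scores) > 1 else 0.0
    abs_diff = abs(m - ccs_mean)
    return {"mean": m, "sd": sd, "abs_diff": abs_diff, "divergent": abs_diff >= threshold}


# Hypothetical example data: statement id -> (ChatGPT scores, CCS mean score).
statements = {
    "S1": ([8, 8, 9], 8.1),
    "S2": ([3, 4, 5], 7.0),
    "S3": ([6, 6, 6], 5.5),
}

results = {sid: summarize(scores, ccs) for sid, (scores, ccs) in statements.items()}
divergent = [sid for sid, r in results.items() if r["divergent"]]
share_divergent = len(divergent) / len(results)
```

Applied to the study's own counts, the same proportion is 11/63, which rounds to the reported 17%.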
While ChatGPT mirrored medical viewpoints on ankyloglossia, its alignment with non-consensus statements warrants caution in relying on it for medical advice. Future research should refine AI models, address inaccuracies, and explore diverse user queries to support safe integration into medical decision-making. Despite potential benefits, ongoing examination of ChatGPT's power and limitations is crucial, considering its impact on health equity and information access.