Baylor University Medical Center, Dallas, TX, USA.
Texas A&M College of Medicine, Dallas, TX, USA.
Int J Dermatol. 2024 Nov;63(11):1592-1598. doi: 10.1111/ijd.17382. Epub 2024 Aug 9.
Artificial intelligence (AI) and large language models (LLMs) are transforming how patients inform themselves. LLMs offer potential as educational tools, but their value depends on the quality of the information they generate. The current literature examining AI as an informational tool in dermatology has been limited in its evaluation of AI's multifaceted roles and in the diversity of expert opinion. Here, we evaluate LLMs as a patient-education tool for Mohs micrographic surgery (MMS), both in and out of the clinic, using an international expert panel.
The most common patient questions about MMS were extracted from Google and entered into two LLMs and Google's search engine. Fifteen MMS surgeons evaluated the generated responses, examining their appropriateness as a patient-facing informational platform, the sufficiency of each response in a clinical environment, and the accuracy of the content generated. Validated scales were employed to assess the comprehensibility of each response.
The majority of reviewers deemed all LLM responses appropriate. 75% of responses were rated as mostly accurate or higher, with ChatGPT achieving the highest mean accuracy. However, the majority of the panel deemed only 33% of responses sufficient for clinical practice. Mean comprehensibility scores across all platforms indicated a required 10th-grade reading level.
LLM-generated responses were rated as appropriate patient informational sources and mostly accurate in their content. However, these platforms may not provide sufficient information to function in a clinical environment, and their reading complexity may represent a barrier to utilization. As the popularity of these platforms increases, it is important for dermatologists to be aware of these limitations.