Division of Plastic, Reconstructive, and Aesthetic Surgery, McGill University Health Centre, Montreal, QC, Canada.
Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA.
Aesthetic Plast Surg. 2024 Mar;48(5):953-976. doi: 10.1007/s00266-023-03819-9. Epub 2024 Jan 25.
Large language models (LLMs) have revolutionized the way humans interact with artificial intelligence (AI) technology, with marked potential for applications in aesthetic surgery. The present study evaluates the performance of Bard, a novel LLM, in identifying and managing postoperative patient concerns regarding complications following body contouring surgery.
The American Society of Plastic Surgeons' website was queried to identify and simulate all potential postoperative complications following body contouring across different acuities and severities. Bard's accuracy was assessed in providing a differential diagnosis, soliciting a history, suggesting a most-likely diagnosis, recommending an appropriate disposition, suggesting treatments/interventions to begin from home, and identifying red-flag signs/symptoms that indicate deterioration or require urgent emergency department (ED) presentation.
Twenty-two simulated body contouring complications were examined. Overall, Bard demonstrated 59% accuracy in listing relevant diagnoses on its differentials, with a 52% incidence of incorrect or misleading diagnoses. Following history-taking, Bard demonstrated an overall accuracy of 44% in identifying the most-likely diagnosis and 55% accuracy in suggesting the indicated medical disposition. Helpful treatments/interventions to begin from home were suggested with 40% accuracy, whereas red-flag signs/symptoms indicating deterioration were shared with 48% accuracy. A detailed analysis of performance, stratified according to latency of postoperative presentation (<48 hours, 48 hours to 1 month, or >1 month postoperatively) and according to acuity and indicated medical disposition, is presented herein.
Despite the promising potential of LLMs and AI in healthcare-related applications, Bard's performance in the present study falls significantly short of accepted clinical standards, indicating a need for further research and development prior to adoption.
This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors at www.springer.com/00266.
Level of Evidence: IV.