Yassa Arsany, Akhavan Arya A, Ayad Solina, Ayad Olivia
Division of Plastic and Reconstructive Surgery, Rutgers New Jersey Medical School, Newark, New Jersey.
Founder and Researcher, Arclivia, a platform for innovation and research in AI integration.
Eplasty. 2025 Jan 23;25:e3. eCollection 2025.
Online before-and-after photos commonly guide patient expectations in body contouring surgery. However, recent advances in artificial intelligence (AI) make it possible to generate lifelike "photos" of hypothetical individuals, which patients may use in their decision-making. Because AI models are trained on divergent image sets, their accuracy in depicting realistic body figures, cosmetic defects, and surgical outcomes is questionable. This study sought to evaluate the quality of these images.
We used the AI platforms GetIMG, Leonardo, and Perchance to create pre- and postoperative visuals for abdominoplasty and buttock augmentation. Board-certified plastic surgeons and plastic surgery residents assessed the images across 11 criteria, focusing on realism and clinical value. Data were analyzed with analysis of variance (ANOVA) and Tukey honestly significant difference (HSD) post hoc tests.
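For readers unfamiliar with the analysis named above, the following is a minimal sketch of the arithmetic behind a one-way ANOVA F statistic and the Tukey HSD pairwise q statistic. It is not the authors' analysis code. The ratings are invented for illustration only and are NOT data from this study, and the Tukey step assumes equal group sizes, which the abstract does not state.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def one_way_anova(groups):
    """One-way ANOVA: return (F, df_between, df_within, ms_within)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of ratings around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    ms_within = ss_within / df_within
    f_stat = (ss_between / df_between) / ms_within
    return f_stat, df_between, df_within, ms_within


def tukey_q_statistics(groups, ms_within):
    """Studentized range statistic q for every pair of groups.

    Assumption of this sketch: all groups have the same size.
    """
    sizes = {len(g) for g in groups}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes equal group sizes")
    n = sizes.pop()
    standard_error = sqrt(ms_within / n)
    return {
        (i, j): abs(mean(groups[i]) - mean(groups[j])) / standard_error
        for i, j in combinations(range(len(groups)), 2)
    }


# Hypothetical 1-5 ratings for three unnamed models; NOT data from this study.
ratings = {
    "model_a": [4, 4, 3, 5],
    "model_b": [3, 4, 3, 3],
    "model_c": [2, 3, 3, 2],
}
groups = list(ratings.values())
f_stat, df_b, df_w, ms_w = one_way_anova(groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")  # F(2, 9) = 5.40
q = tukey_q_statistics(groups, ms_w)
```

Each q value would then be compared with a critical value of the studentized range distribution for k groups and df_within degrees of freedom. In practice, a statistics package such as SciPy (`scipy.stats.f_oneway`, `scipy.stats.tukey_hsd`) also reports the corresponding P values.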
Realism and clinical value scores (mean ± standard deviation) did not differ significantly among the AI models, indicating comparable performance (GetIMG 3.83 ± 0.81, Leonardo 3.30 ± 0.69, Perchance 2.68 ± 0.77; P > .05). Perchance significantly underperformed in size and volume accuracy (P = .02) and in pathological feature recognition (P = .01 and .03). No single metric underperformed consistently across the evaluations. The "uncanny valley" phenomenon was also observed.
Despite some realistic and accurate surgical predictions, most AI-generated images were anatomically unrealistic, depicted inaccurate postoperative results, and evoked the "uncanny valley" effect. Given this uniformly poor performance, patients should avoid using these images to make surgical decisions, because they may create unrealistic expectations. Surgeons are advised to use real patient photographs during consultations. Future research aims to compare AI-generated images with actual before-and-after photographs and to include a larger pool of expert evaluators.