Department of Surgery, Division of Plastic and Reconstructive Surgery, Hagey Laboratory for Pediatric Regenerative Medicine, Stanford University School of Medicine, Stanford, California.
J Reconstr Microsurg. 2024 Nov;40(9):657-664. doi: 10.1055/a-2273-4163. Epub 2024 Feb 21.
BACKGROUND: With the growing relevance of artificial intelligence (AI)-based patient-facing information, microsurgery-specific online information provided by professional organizations was compared with that of ChatGPT (Chat Generative Pre-Trained Transformer) and assessed for accuracy, comprehensiveness, clarity, and readability.

METHODS: Six plastic and reconstructive surgeons blindly assessed responses to 10 microsurgery-related medical questions written either by the American Society of Reconstructive Microsurgery (ASRM) or by ChatGPT for accuracy, comprehensiveness, and clarity. Surgeons were asked to choose which source provided the overall highest-quality microsurgical patient-facing information. Additionally, 30 individuals with no medical background (ages 18-81, mean 49.8) were asked to indicate a preference when blindly comparing the materials. Readability scores were calculated, and all numerical scores were analyzed using the following seven readability formulas: Flesch-Kincaid Grade Level, Flesch Reading Ease, Gunning Fog Index, Simple Measure of Gobbledygook Index, Coleman-Liau Index, Linsear Write Formula, and Automated Readability Index. Statistical analysis of the microsurgery-specific online sources was conducted using paired t-tests.

RESULTS: Statistically significant differences in comprehensiveness and clarity were seen in favor of ChatGPT. Surgeons blindly chose ChatGPT as the source providing the overall highest-quality microsurgical patient-facing information 70.7% of the time, and nonmedical individuals selected the AI-generated microsurgical materials 55.9% of the time. Neither the ChatGPT- nor the ASRM-generated materials were found to contain inaccuracies. Readability scores for both ChatGPT and ASRM materials exceeded recommended levels for patient proficiency across all seven readability formulas, with the AI-based material scoring as more complex.

CONCLUSION: AI-generated patient-facing materials were preferred by surgeons for comprehensiveness and clarity when blindly compared with online material provided by ASRM, and the studied AI-generated material was not found to contain inaccuracies. Surgeons and nonmedical individuals alike consistently indicated an overall preference for the AI-generated material. Readability analysis indicated that materials sourced from both ChatGPT and ASRM surpassed recommended reading levels on all seven readability formulas.
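As context for the METHODS above, the sketch below shows, under stated assumptions, how two of the named readability formulas (Flesch-Kincaid Grade Level and Flesch Reading Ease) can be computed from a text, and how a paired t-test compares matched per-question ratings from the two sources. The heuristic syllable counter, the example sentence, and the rating arrays are illustrative placeholders rather than the study's data or tooling; scipy.stats.ttest_rel is used as one standard paired t-test implementation.

```python
# Minimal sketch of two of the named readability formulas and a paired
# t-test, with hypothetical inputs; the abstract does not specify the
# software the study actually used.
import re
from scipy import stats


def count_syllables(word: str) -> int:
    """Rough heuristic: count runs of consecutive vowels (min. 1)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def readability(text: str) -> dict:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    wps = len(words) / sentences                      # words per sentence
    spw = sum(count_syllables(w) for w in words) / len(words)  # syllables per word
    return {
        # Flesch-Kincaid Grade Level: 0.39*wps + 11.8*spw - 15.59
        "fk_grade_level": 0.39 * wps + 11.8 * spw - 15.59,
        # Flesch Reading Ease: 206.835 - 1.015*wps - 84.6*spw
        "flesch_reading_ease": 206.835 - 1.015 * wps - 84.6 * spw,
    }


# Hypothetical per-question clarity ratings for the same 10 questions,
# compared with a paired t-test as described in METHODS.
chatgpt_scores = [4.5, 4.2, 4.8, 4.0, 4.6, 4.3, 4.7, 4.4, 4.1, 4.5]
asrm_scores = [3.9, 4.0, 4.1, 3.8, 4.2, 3.7, 4.0, 4.3, 3.6, 3.9]
t_stat, p_value = stats.ttest_rel(chatgpt_scores, asrm_scores)

print(readability("Microsurgery repairs small vessels and nerves under a microscope."))
print(f"paired t-test: t = {t_stat:.2f}, p = {p_value:.4f}")
```

Higher Flesch-Kincaid Grade Level (and lower Flesch Reading Ease) values indicate more complex text, which is how both sources were judged to exceed recommended patient reading levels.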