Karakash William J, Avetisian Henry, Ragheb Jonathan M, Wang Jeffrey C, Hah Raymond J, Alluri Ram K
Department of Orthopaedic Surgery, Keck School of Medicine at the University of Southern California, Los Angeles, CA, USA.
Department of Orthopaedic Surgery, Kaiser Permanente Bernard J. Tyson School of Medicine, Pasadena, CA, USA.
Global Spine J. 2025 May 20:21925682251344248. doi: 10.1177/21925682251344248.
Study Design
A comparative analysis of AI-generated vs human-authored personal statements for spine surgery fellowship applications.
Objective
To assess whether evaluators could differentiate between ChatGPT- and human-authored personal statements, and to determine whether AI-generated statements could outperform human-authored ones on quality metrics.
Summary of Background Data
Personal statements are key in fellowship admissions, but the rise of AI tools such as ChatGPT raises concerns about their use. While previous studies have examined AI-generated residency statements, their role in spine fellowship applications remains unexplored.
Methods
Nine personal statements (4 ChatGPT-generated, 5 human-authored) were evaluated by 8 blinded reviewers (6 attending spine surgeons and 2 fellows). ChatGPT-4o was prompted to create statements focused on 4 unique experiences. Evaluators rated each statement for readability, originality, quality, and authenticity (0-100 scale), determined AI authorship, and indicated interview recommendations.
Results
ChatGPT-authored statements scored higher in readability (65.69 vs 56.40, P = 0.016) and quality (63.00 vs 51.80, P = 0.004) but showed no differences in originality (P = 0.339) or authenticity (P = 0.256). Reviewers could not reliably distinguish AI from human authorship (P = 1.000). Interview recommendations favored ChatGPT-generated statements (84.4% vs 62.5%, OR: 3.24 [1.08-11.17], P = 0.045).
Conclusion
ChatGPT can produce high-quality, indistinguishable spine fellowship personal statements that increase interview likelihood. These findings highlight the need for nuanced guidelines regarding AI use in application processes, particularly considering its potential role in expanding access to high-quality writing assistance and editing.