Nathani Karim Rizwan, Nathani Ali-Muhammad, Delawan Maliya, Safdar Aleeza, Bydon Mohamad
1Neuro-Informatics Laboratory, Department of Neurologic Surgery, Mayo Clinic, Rochester.
2Department of Neurologic Surgery, Mayo Clinic, Rochester.
J Neurosurg Spine. 2025 Aug 22:1-6. doi: 10.3171/2025.4.SPINE25519.
Artificial intelligence (AI) is increasingly capable of academic writing, with large language models such as ChatGPT showing potential to assist or even generate scientific manuscripts. However, concerns remain regarding the quality, reliability, and interpretive capabilities of AI-generated content. The authors' study aimed to compare the quality of a human-written versus an AI-generated scientific manuscript to evaluate the strengths and limitations of AI in the context of academic publishing.
Two manuscripts were developed from identical titles, abstracts, and tables of a simulated analysis: one authored by a physician with multiple publications, and the other generated by ChatGPT-4o. Three independent, blinded reviewers (two human and one AI) assessed each manuscript across five domains: clarity and readability, coherence and flow, technical accuracy, depth, and conciseness and precision. Each domain was scored on a 10-point scale, and qualitative feedback was collected to highlight specific strengths and weaknesses. Additionally, all reviewers were asked to deduce the authorship of each manuscript.
The AI-generated manuscript scored higher in clarity and readability (mean 9.0 vs 7.2) but lower in technical accuracy (mean 6.3 vs 9.3) and depth (mean 5.5 vs 7.5). In their qualitative feedback, reviewers likewise noted that the AI version lacked depth, critical analysis, and contextual interpretation. All reviewers correctly identified the authorship of each manuscript and tended to rate a manuscript more favorably when its origin matched their own (human or AI): human reviewers assigned higher scores to the human-written manuscript, while the AI reviewer scored the AI-generated manuscript higher.
Although AI models can improve some aspects of scientific writing, particularly clarity and readability, they fall short in critical reasoning and contextual understanding. This reinforces the importance of human authorship and oversight in maintaining the critical analysis and scientific accuracy essential for academic publishing. AI may be used as a complementary tool to support, rather than replace, human-led scientific writing.