Giavina Bianchi Mara, D'Adario Andrew, Giavina Bianchi Pedro, Machado Birajara Soares
Big Data Department, Faculdade Israelita de Ciências da Saúde Albert Einstein, São Paulo, Brazil.
Department of Clinical Immunology and Allergy, Universidade de São Paulo, São Paulo, Brazil.
J Allergy Clin Immunol Glob. 2024 Nov 26;4(1):100373. doi: 10.1016/j.jacig.2024.100373. eCollection 2025 Feb.
The use of artificial intelligence (AI) in scientific writing is rapidly increasing, raising concerns about authorship identification, content quality, and writing efficiency.
This study investigates the real-world impact of ChatGPT, a large language model, on those aspects in a simulated publication scenario.
Forty-eight individuals representing 3 medical expertise levels (medical students, residents, and experts in allergy or dermatology) evaluated 3 blinded versions of an atopic dermatitis case report: one human-written (HUM), one AI-generated (AI), and one written in combination (COM). The survey assessed authorship identification, ranked preference among the versions, and graded each text on 13 quality criteria. The time taken to generate each manuscript was also recorded.
Authorship identification accuracy matched chance (33%). Experts (50.9%) were significantly more accurate than residents (27.7%) and students (19.6%; P < .001). Participants preferred the AI-assisted versions (AI and COM) over HUM (P < .001), with COM receiving the highest quality scores. Compared to HUM, COM and AI reduced writing time by 83.8% and 84.3%, respectively, while improving quality by 13.9% (P < .001) and 11.1% (P < .001), respectively. However, experts assigned the lowest score to the references of the AI manuscript, a flaw that could hinder its publication.
AI can deceptively mimic human writing, particularly for less experienced readers. Although AI-assisted writing is appealing and offers significant time savings, human oversight remains crucial to ensure accuracy, ethical considerations, and optimal quality. These findings underscore the need for transparency in AI use and highlight the potential of human-AI collaboration in the future of scientific writing.