Department of Nursing, Mackay Medical College, Taipei, Taiwan.
Department of Psychiatry, Taipei Veterans General Hospital, Taipei, Taiwan.
J Med Internet Res. 2023 Dec 25;25:e51229. doi: 10.2196/51229.
ChatGPT may act as a research assistant, helping to organize the direction of thinking and summarize research findings. However, few studies have examined the quality, similarity (ie, how closely generated abstracts resemble the originals), and accuracy of abstracts generated by ChatGPT when researchers provide the full text of basic research papers.
We aimed to assess the applicability of an artificial intelligence (AI) model in generating abstracts for basic preclinical research.
We selected 30 basic research papers from Nature, Genome Biology, and Biological Psychiatry. Excluding the abstracts, we input the full text of each paper into ChatPDF, an application of a language model based on ChatGPT, and prompted it to generate an abstract in the same style as the original paper. A total of 8 experts were invited to evaluate the quality of these abstracts (on a Likert scale of 0-10) and to identify, in a blinded manner, which abstracts were generated by ChatPDF. The generated abstracts were also evaluated for their similarity to the original abstracts and for the accuracy of the AI-generated content.
The quality of the ChatGPT-generated abstracts was lower than that of the actual abstracts (10-point Likert scale: mean 4.72, SD 2.09 vs mean 8.09, SD 1.03; P<.001). The quality gap was larger for the unstructured format (mean difference -4.33; 95% CI -4.79 to -3.86; P<.001) than for the 4-subheading structured format (mean difference -2.33; 95% CI -2.79 to -1.86). Among the 30 ChatGPT-generated abstracts, 3 presented wrong conclusions, and 10 were flagged as AI-generated content. The mean percentage of similarity between the original and generated abstracts was low (2.10%-4.40%). The blinded reviewers achieved a 93% (224/240) accuracy rate in identifying which abstracts were written using ChatGPT.
Using ChatGPT to generate a scientific abstract may not raise similarity concerns when real, human-written full texts are used as input. However, the quality of the ChatGPT-generated abstracts was suboptimal, and their accuracy was not 100%.