Comparing new tools of artificial intelligence to the authentic intelligence of our global health students.

Author information

Thandla Shilpa R, Armstrong Grace Q, Menon Adil, Shah Aashna, Gueye David L, Harb Clara, Hernandez Estefania, Iyer Yasaswini, Hotchner Abigail R, Modi Riddhi, Mudigonda Anusha, Prokos Maria A, Rao Tharun M, Thomas Olivia R, Beltran Camilo A, Guerrieri Taylor, LeBlanc Sydney, Moorthy Skanda, Yacoub Sara G, Gardner Jacob E, Greenberg Benjamin M, Hubal Alyssa, Lapina Yuliana P, Moran Jacqueline, O'Brien Joseph P, Winnicki Anna C, Yoka Christina, Zhang Junwei, Zimmerman Peter A

Affiliations

Master of Public Health Program, Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA.

School of Medicine, Case Western Reserve University, Cleveland, OH, USA.

Publication information

BioData Min. 2024 Dec 18;17(1):58. doi: 10.1186/s13040-024-00408-7.

Abstract

INTRODUCTION

The transformative feature of artificial intelligence (AI) is its massive capacity for interpreting unstructured data and transforming it into a coherent and meaningful context. In general, the potential for AI to alter traditional approaches to student research and its evaluation appears to be significant. In global health research, it is important for students and research experts to assess the strengths and limitations of generative AI (GenAI). Thus, the goal of our research was to evaluate the information literacy of GenAI against the expectations that graduate students meet in writing research papers.

METHODS

After the course Fundamentals of Global Health (INTH 401) at Case Western Reserve University (CWRU) ended, graduate students who had successfully completed the required research paper were recruited to compare their original papers with a paper they generated with ChatGPT-4o using the original assignment prompt. Students also completed a Google Forms survey to evaluate different sections of both the AI-generated paper (e.g., adherence to Introduction guidelines, presentation of three perspectives, Conclusion) and their original papers, and to rate their overall satisfaction with the AI work. Comparing the original student papers with the ChatGPT-4o papers also enabled evaluation of narrative elements and references.

RESULTS

Of the 54 students who completed the required research paper, 28 (51.8%) agreed to collaborate in the comparison project. A summary of the survey responses suggested that students evaluated the AI-generated paper as inferior or similar to their own paper (overall satisfaction average = 2.39 (1.61-3.17); Likert scale: 1 to 5, with lower scores indicating inferiority). Averaging each student's responses across the 5 Likert item queries showed that 17 scores were < 2.9, 7 scores were between 3.0 and 3.9, and 4 scores were ≥ 4.0, consistent with inferiority of the AI-generated paper. Evaluation of reference selection by ChatGPT-4o (n = 729 total references) showed that 54% (n = 396) were authentic and 46% (n = 333) did not exist. Of the authentic references, 26.5% (105/396) were relevant to the paper narrative, which corresponds to 14.4% of the 729 total references.
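
The reference proportions reported above can be re-derived from the stated counts. The following is a minimal sketch, not part of the original study: the variable names and the `pct` helper are illustrative assumptions, and only the four counts (729, 396, 333, 105) come from the abstract.

```python
# Counts as reported in the RESULTS paragraph; all names are illustrative.
total_refs = 729    # references generated by ChatGPT-4o
authentic = 396     # references that exist
nonexistent = 333   # references that do not exist
relevant = 105      # authentic references relevant to the paper narrative


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# The two categories should account for every generated reference.
assert authentic + nonexistent == total_refs

summary = {
    "authentic_of_total": pct(authentic, total_refs),
    "nonexistent_of_total": pct(nonexistent, total_refs),
    "relevant_of_authentic": pct(relevant, authentic),
    "relevant_of_total": pct(relevant, total_refs),
}

for name, value in summary.items():
    print(f"{name}: {value}%")
```

Rounded to whole numbers, the first two values match the reported 54% and 46%; the last two match the reported 26.5% and 14.4%.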

DISCUSSION

Our findings reveal strengths and limitations of AI tools in assisting with understanding the complexities of global health topics. Strengths mentioned by students included the ability of ChatGPT-4o to produce content very quickly and to suggest topics that they had not considered in the 3-perspective sections of their papers. Consistently presenting up-to-date facts and references, as well as further examining or summarizing the complexities of global health topics, appears to be a current limitation of ChatGPT-4o. Because ChatGPT-4o generated nonexistent references attributed to highly credible biomedical research journals, we conclude that ChatGPT-4o failed an important component of using information effectively. Moreover, misrepresenting trusted sources of public health information is highly concerning, particularly given experiences from the COVID-19 pandemic and, more recently, in reporting on the impact of, and response to, natural disasters. This is a significant limitation of GenAI's ability to meet the information literacy standards expected of graduate students.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4b2a/11656723/9d738cef18f4/13040_2024_408_Fig1_HTML.jpg
