Gandhi Aravind P, Joesph Felista Karen, Rajagopal Vineeth, Aparnavi P, Katkuri Sushma, Dayama Sonal, Satapathy Prakasini, Khatib Mahalaqua Nazli, Gaidhane Shilpa, Zahiruddin Quazi Syed, Behera Ashish
Department of Community Medicine, All India Institute of Medical Sciences, Nagpur, Maharashtra, India.
Melmaruvathur Adhiparasakthi Institute of Medical Sciences and Research, Melmaruvathur, India.
JMIR Form Res. 2024 Mar 25;8:e49964. doi: 10.2196/49964.
Medical students may increasingly use large language models (LLMs) in their learning. ChatGPT is an LLM at the forefront of this new development in medical education with the capacity to respond to multidisciplinary questions.
The aim of this study was to evaluate the ability of ChatGPT 3.5 to complete the Indian undergraduate medical examination in the subject of community medicine. We further compared ChatGPT scores with the scores obtained by the students.
The study was conducted at a publicly funded medical college in Hyderabad, India. The study was based on the internal assessment examination conducted in January 2023 for students in the Bachelor of Medicine and Bachelor of Surgery Final Year-Part I program; the examination of focus included 40 questions (divided between two papers) from the community medicine subject syllabus. Each paper had three sections with a different weightage of marks for each section: section 1 had 2 long essay-type questions worth 15 marks each, section 2 had 8 short essay-type questions worth 5 marks each, and section 3 had 10 short-answer questions worth 3 marks each. The same questions were administered as prompts to ChatGPT 3.5, and the responses were recorded. Apart from scoring the ChatGPT responses, two independent evaluators examined the responses to each question to further analyze their quality with regard to three subdomains: relevance, coherence, and completeness. Each question was scored in these subdomains on a Likert scale of 1-5. The average of the two evaluators' scores was taken as the subdomain score for the question. The proportion of questions with a score of at least 50% of the maximum score (5) in each subdomain was calculated.
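To make the subdomain scoring concrete, the following is a minimal sketch in Python of how the evaluator averaging and the at-least-50%-of-maximum proportion could be computed. The ratings, variable names, and list sizes here are hypothetical placeholders for illustration, not data from the study.

```python
# Minimal sketch of the subdomain scoring described above.
# The ratings below are hypothetical placeholders, not study data.

from statistics import mean

MAX_SCORE = 5                  # Likert scale runs 1-5
THRESHOLD = 0.5 * MAX_SCORE    # satisfactory cutoff: >= 50% of maximum (2.5)

# Hypothetical ratings: for each question, the two independent
# evaluators' Likert scores in one subdomain (e.g., relevance).
evaluator_1 = [5, 4, 3, 5, 2]
evaluator_2 = [4, 4, 2, 5, 3]

# Subdomain score per question = average of the two evaluators' scores.
subdomain_scores = [mean(pair) for pair in zip(evaluator_1, evaluator_2)]

# Proportion of questions scoring at least 50% of the maximum.
satisfactory = sum(score >= THRESHOLD for score in subdomain_scores)
proportion = satisfactory / len(subdomain_scores)

print(f"Subdomain scores: {subdomain_scores}")
print(f"Proportion satisfactory: {proportion:.0%}")
```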
ChatGPT 3.5 scored 72.3% on paper 1 and 61% on paper 2. The mean score of the 94 students was 43% on paper 1 and 45% on paper 2. The responses of ChatGPT 3.5 were also rated as satisfactorily relevant, coherent, and complete for most of the questions (>80%).
ChatGPT 3.5 appears to have substantial and sufficient knowledge to understand and answer the Indian undergraduate medical examination in the subject of community medicine. ChatGPT may be introduced to students in pilot mode to enable self-directed learning of community medicine. However, faculty oversight will be required, as ChatGPT is still in the early stages of development; its potential and the reliability of its medical content in the Indian context need to be explored further and comprehensively.