Finding the Needle in the Haystack: Can Natural Language Processing of Students' Evaluations of Teachers Identify Teaching Concerns?

Authors

Dine C Jessica, Shea Judy A, Clancy Caitlin B, Heath Janae K, Pluta William, Kogan Jennifer R

Affiliation

Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA.

Publication

J Gen Intern Med. 2025 Jan;40(1):119-123. doi: 10.1007/s11606-024-08990-6. Epub 2024 Aug 21.

Abstract

BACKGROUND

Institutions rely on student evaluations of teaching (SET) to ascertain teaching quality. Manual review of narrative comments can identify faculty with teaching concerns but can be resource- and time-intensive.

AIM

To determine if natural language processing (NLP) of SET comments completed by learners on clinical rotations can identify teaching quality concerns.

SETTING AND PARTICIPANTS

Single-institution retrospective cohort analysis of SET (n = 11,850) from clinical rotations between July 1, 2017, and June 30, 2018.

PROGRAM DESCRIPTION

The performance of three NLP dictionaries created by the research team was compared to an off-the-shelf Sentiment Dictionary.
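A dictionary-based approach of this kind can be sketched as follows. This is a minimal illustration only: the abstract does not describe the contents of the study's dictionaries or its matching rules, so the term list, the tokenizer, and the function names below are all hypothetical.

```python
import re

# Hypothetical concern terms; the study's actual dictionaries are not
# given in the abstract.
CONCERN_TERMS = {"disorganized", "unprepared", "dismissive"}


def tokenize(comment):
    """Lowercase a comment and split it into simple word tokens."""
    return re.findall(r"[a-z']+", comment.lower())


def flags_concern(comment, terms=CONCERN_TERMS):
    """Return True if any token of the comment appears in the dictionary."""
    return any(token in terms for token in tokenize(comment))
```

Under these assumptions, a comment such as "The attending was often unprepared for rounds." would be flagged for review, while "Great teacher, very supportive." would not.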

PROGRAM EVALUATION

The Expert Dictionary had an accuracy of 0.90, a precision of 0.62, and a recall of 0.50. The Qualifier Dictionary had lower accuracy (0.65) and precision (0.16) but similar recall (0.67). The Text Mining Dictionary had an accuracy of 0.78 and a recall of 0.24. The Sentiment plus Qualifier Dictionary had good accuracy (0.86) and recall (0.77) with a precision of 0.37.
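For readers less familiar with these metrics, the sketch below shows how accuracy, precision, and recall follow from the four cells of a confusion matrix. The counts in the usage note are invented for illustration and are not the study's data; the class and function names are likewise assumptions.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # flagged by the dictionary, true teaching concern
    fp: int  # flagged by the dictionary, no true concern
    fn: int  # not flagged, true concern missed
    tn: int  # not flagged, no true concern


def metrics(c):
    """Compute accuracy, precision, and recall from confusion-matrix counts."""
    total = c.tp + c.fp + c.fn + c.tn
    flagged = c.tp + c.fp
    actual = c.tp + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total if total else 0.0,
        "precision": c.tp / flagged if flagged else 0.0,
        "recall": c.tp / actual if actual else 0.0,
    }
```

With hypothetical counts such as Confusion(tp=40, fp=60, fn=10, tn=890), accuracy is 0.93 and recall is 0.80 while precision is only 0.40. When true concerns are rare, the many correctly unflagged comments keep accuracy high even though most flagged comments are false alarms.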

DISCUSSION

NLP methods can identify teaching quality concerns with good accuracy and reasonable recall, but relatively low precision. An existing, free NLP sentiment analysis dictionary can perform nearly as well as dictionaries requiring expert coding or manual creation.
