Finding the Needle in the Haystack: Can Natural Language Processing of Students' Evaluations of Teachers Identify Teaching Concerns?

Author information

Dine C Jessica, Shea Judy A, Clancy Caitlin B, Heath Janae K, Pluta William, Kogan Jennifer R

Affiliation

Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA.

Publication information

J Gen Intern Med. 2025 Jan;40(1):119-123. doi: 10.1007/s11606-024-08990-6. Epub 2024 Aug 21.

DOI:10.1007/s11606-024-08990-6

PMID:39167336

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11780028/

Abstract

BACKGROUND

Institutions rely on student evaluations of teaching (SET) to ascertain teaching quality. Manual review of narrative comments can identify faculty with teaching concerns but can be resource- and time-intensive.

AIM

To determine if natural language processing (NLP) of SET comments completed by learners on clinical rotations can identify teaching quality concerns.

SETTING AND PARTICIPANTS

Single institution retrospective cohort analysis of SET (n = 11,850) from clinical rotations between July 1, 2017, and June 30, 2018.

PROGRAM DESCRIPTION

The performance of three NLP dictionaries created by the research team was compared to an off-the-shelf Sentiment Dictionary.
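As a rough illustration of what dictionary-based flagging of a comment can look like, the following minimal sketch marks a comment as a potential concern when any of its words appears in a term list. The terms, the `tokenize` and `flag_comment` helpers, and the matching rule are all hypothetical and chosen only for illustration; the abstract does not describe the contents of the study's dictionaries or how matching was implemented.

```python
import re

# Hypothetical concern terms for illustration only; these are NOT the
# study's Expert, Qualifier, Text Mining, or Sentiment dictionaries.
CONCERN_TERMS = frozenset({"dismissive", "disorganized", "unavailable", "rude"})


def tokenize(comment: str) -> list[str]:
    """Lowercase a comment and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", comment.lower())


def flag_comment(comment: str, dictionary: frozenset[str] = CONCERN_TERMS) -> bool:
    """Return True if any token of the comment appears in the dictionary."""
    return any(token in dictionary for token in tokenize(comment))
```

A comment such as "The attending was often dismissive of questions" would be flagged under this toy term list, while "Great teacher, very supportive" would not.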

PROGRAM EVALUATION

The Expert Dictionary had an accuracy of 0.90, a precision of 0.62, and a recall of 0.50. The Qualifier Dictionary had lower accuracy (0.65) and precision (0.16) but similar recall (0.67). The Text Mining Dictionary had an accuracy of 0.78 and a recall of 0.24. The Sentiment plus Qualifier Dictionary had good accuracy (0.86) and recall (0.77) with a precision of 0.37.
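For readers less familiar with these metrics, the sketch below computes accuracy, precision, and recall from the four cells of a confusion matrix using the standard definitions. The function name and the example counts are assumptions made up for illustration; they are not data from the study.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute accuracy, precision, and recall from confusion-matrix counts.

    tp: flagged comments that truly contain a concern
    fp: flagged comments that do not contain a concern
    fn: unflagged comments that do contain a concern
    tn: unflagged comments that do not contain a concern
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up counts (not from the study): concerns are rare, so many correct
# "no concern" calls keep accuracy high even though most flags are false alarms.
example = classification_metrics(tp=10, fp=20, fn=5, tn=165)
```

With these invented counts, accuracy is 175/200 while precision is only 10/30, which shows how high accuracy and low precision can coexist when true concerns are uncommon.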

DISCUSSION

NLP methods can identify teaching quality concerns with good accuracy and reasonable recall, but relatively low precision. An existing, free, NLP sentiment analysis dictionary can perform nearly as well as dictionaries requiring expert coding or manual creation.
