Finding the Needle in the Haystack: Can Natural Language Processing of Students' Evaluations of Teachers Identify Teaching Concerns?

Authors

Dine C Jessica, Shea Judy A, Clancy Caitlin B, Heath Janae K, Pluta William, Kogan Jennifer R

Affiliation

Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA.

Publication information

J Gen Intern Med. 2025 Jan;40(1):119-123. doi: 10.1007/s11606-024-08990-6. Epub 2024 Aug 21.

DOI: 10.1007/s11606-024-08990-6
PMID: 39167336
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11780028/

Abstract

BACKGROUND

Institutions rely on student evaluations of teaching (SET) to ascertain teaching quality. Manual review of narrative comments can identify faculty with teaching concerns but can be resource and time-intensive.

AIM

To determine if natural language processing (NLP) of SET comments completed by learners on clinical rotations can identify teaching quality concerns.

SETTING AND PARTICIPANTS

Single institution retrospective cohort analysis of SET (n = 11,850) from clinical rotations between July 1, 2017, and June 30, 2018.

PROGRAM DESCRIPTION

The performance of three NLP dictionaries created by the research team was compared to an off-the-shelf Sentiment Dictionary.
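As a rough illustration of what a dictionary-based approach to SET comments might look like, the sketch below flags a comment when it contains any term from a word list. The terms in `CONCERN_TERMS` and the function names are hypothetical placeholders, not the dictionaries or code used in the study, and real dictionaries would likely also handle phrases, negation, and qualifiers.

```python
import re

# Hypothetical word list for illustration only; the study's dictionaries
# are not reproduced in this abstract.
CONCERN_TERMS = {"dismissive", "unprofessional", "disorganized", "rude"}


def tokenize(comment):
    """Lowercase a comment and split it into simple word tokens."""
    return re.findall(r"[a-z']+", comment.lower())


def flags_concern(comment, dictionary=CONCERN_TERMS):
    """Return True if any token of the comment appears in the dictionary."""
    return any(token in dictionary for token in tokenize(comment))


if __name__ == "__main__":
    print(flags_concern("The attending was rude and dismissive."))
    print(flags_concern("Great teacher, very supportive."))
```

A flagged comment here is only a candidate for human review; the matching rule is deliberately simplistic.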

PROGRAM EVALUATION

The Expert Dictionary had an accuracy of 0.90, a precision of 0.62, and a recall of 0.50. The Qualifier Dictionary had lower accuracy (0.65) and precision (0.16) but similar recall (0.67). The Text Mining Dictionary had an accuracy of 0.78 and a recall of 0.24. The Sentiment plus Qualifier Dictionary had good accuracy (0.86) and recall (0.77) with a precision of 0.37.
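For readers less familiar with these metrics, the sketch below shows how accuracy, precision, and recall are computed from a confusion matrix. The counts in the usage example are invented for illustration and are not the study's data; the function name is likewise an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall from confusion-matrix counts.

    tp: comments correctly flagged as a teaching concern
    fp: comments flagged although no concern was present
    fn: concerns the dictionary missed
    tn: comments correctly left unflagged
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


if __name__ == "__main__":
    # Made-up counts, chosen only to show the arithmetic.
    print(classification_metrics(tp=50, fp=30, fn=50, tn=870))
```

When concerns are rare, accuracy can stay high even if many flagged comments are false positives, which is why precision and recall are reported alongside it.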

DISCUSSION

NLP methods can identify teaching quality concerns with good accuracy and reasonable recall, but relatively low precision. An existing, free, NLP sentiment analysis dictionary can perform nearly as well as dictionaries requiring expert coding or manual creation.

Similar articles

1. Finding the Needle in the Haystack: Can Natural Language Processing of Students' Evaluations of Teachers Identify Teaching Concerns?
   J Gen Intern Med. 2025 Jan;40(1):119-123. doi: 10.1007/s11606-024-08990-6. Epub 2024 Aug 21.
2. Natural Language Processing of Learners' Evaluations of Attendings to Identify Professionalism Lapses.
   Eval Health Prof. 2023 Sep;46(3):225-232. doi: 10.1177/01632787231158128. Epub 2023 Feb 24.
3. Evaluation of bias and gender/racial concordance based on sentiment analysis of narrative evaluations of clinical clerkships using natural language processing.
   BMC Med Educ. 2024 Mar 15;24(1):295. doi: 10.1186/s12909-024-05271-y.
4. Public Health Discussions on Social Media: Evaluating Automated Sentiment Analysis Methods.
   JMIR Form Res. 2025 Jan 8;9:e57395. doi: 10.2196/57395.
5. Towards teaching analytics: a contextual model for analysis of students' evaluation of teaching through text mining and machine learning classification.
   Educ Inf Technol (Dordr). 2022;27(3):3891-3933. doi: 10.1007/s10639-021-10751-5. Epub 2021 Oct 11.
6. Validating dental and medical students' evaluations of faculty teaching in an integrated, multi-instructor course.
   J Dent Educ. 2005 Jun;69(6):663-70.
7. Risk prediction using natural language processing of electronic mental health records in an inpatient forensic psychiatry setting.
   J Biomed Inform. 2018 Oct;86:49-58. doi: 10.1016/j.jbi.2018.08.007. Epub 2018 Aug 14.
8. Assessment of Gender-Based Linguistic Differences in Physician Trainee Evaluations of Medical Faculty Using Automated Text Mining.
   JAMA Netw Open. 2019 May 3;2(5):e193520. doi: 10.1001/jamanetworkopen.2019.3520.
9. Improving teaching on an inpatient pediatrics service: a retrospective analysis of a program change.
   BMC Med Educ. 2012 Oct 1;12:92. doi: 10.1186/1472-6920-12-92.
10. Automatically Detecting Failures in Natural Language Processing Tools for Online Community Text.
    J Med Internet Res. 2015 Aug 31;17(8):e212. doi: 10.2196/jmir.4612.
