University of California, San Francisco School of Medicine, San Francisco, CA, USA.
Division of Hospital Medicine, University of California, San Francisco, School of Medicine, San Francisco, CA, USA.
J Gen Intern Med. 2019 May;34(5):684-691. doi: 10.1007/s11606-019-04889-9.
In varied educational settings, narrative evaluations have revealed systematic and deleterious differences in the language used to describe women and those underrepresented in their fields. In medicine, a limited number of qualitative studies show differences in narrative language by gender and underrepresented minority (URM) status.
To identify and enumerate text descriptors in a database of medical student evaluations using natural language processing, and to identify differences in those descriptions by gender and URM status.
An observational study of core clerkship evaluations of third-year medical students, including data on student gender, URM status, clerkship grade, and specialty.
A total of 87,922 clerkship evaluations from core clinical rotations at two medical schools in different geographic areas.
We employed natural language processing to identify differences in the text of evaluations for women compared to men and for URM compared to non-URM students.
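As a rough illustration of this kind of comparison, the sketch below counts how many evaluations in each group contain a given word and flags words whose rates differ between groups. It is not the study's pipeline: the abstract does not name the tokenizer or the statistical test, so the regex tokenizer, the pooled two-proportion z-test, and the function names `doc_frequencies`, `two_proportion_p_value`, and `compare_groups` are all assumptions made for illustration. A real analysis would also need a correction for multiple comparisons and, to compare students receiving the same grade, stratification by grade.

```python
import math
import re
from collections import Counter


def doc_frequencies(texts):
    """Count, for each word, how many evaluations contain it at least once."""
    counts = Counter()
    for text in texts:
        # Assumed tokenizer: lowercase runs of letters and apostrophes.
        counts.update(set(re.findall(r"[a-z']+", text.lower())))
    return counts


def two_proportion_p_value(k1, n1, k2, n2):
    """Two-sided p-value of a pooled two-proportion z-test (assumed test)."""
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # The word appears in every evaluation or in none: no difference.
        return 1.0
    z = (k1 / n1 - k2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))


def compare_groups(group_a, group_b, alpha=0.001):
    """Return words whose per-evaluation rate differs between two groups.

    Maps each flagged word to (rate in group_a, rate in group_b, p-value).
    """
    freq_a, freq_b = doc_frequencies(group_a), doc_frequencies(group_b)
    n_a, n_b = len(group_a), len(group_b)
    flagged = {}
    for word in set(freq_a) | set(freq_b):
        p = two_proportion_p_value(freq_a[word], n_a, freq_b[word], n_b)
        if p < alpha:
            flagged[word] = (freq_a[word] / n_a, freq_b[word] / n_b, p)
    return flagged
```

Calling `compare_groups(evals_group_a, evals_group_b)` on two lists of evaluation strings returns only the words that pass the threshold; words used equally often in both groups are left out.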
We found that of the ten most common words, such as "energetic" and "dependable," none differed by gender or URM status. Of the 37 words that differed by gender, 62% represented personal attributes, such as "lovely" appearing more frequently in evaluations of women (p < 0.001), while 19% represented competency-related behaviors, such as "scientific" appearing more frequently in evaluations of men (p < 0.001). Of the 53 words that differed by URM status, 30% represented personal attributes, such as "pleasant" appearing more frequently in evaluations of URM students (p < 0.001), and 28% represented competency-related behaviors, such as "knowledgeable" appearing more frequently in evaluations of non-URM students (p < 0.001).
Many words and phrases reflected students' personal attributes rather than competency-related behaviors, suggesting a gap in implementing competency-based evaluation of students. We observed a significant difference in narrative evaluations associated with gender and URM status, even among students receiving the same grade. This finding raises concern for implicit bias in narrative evaluation, consistent with prior studies, and suggests opportunities for improvement.