Department of Educational Sciences at Ghent University, Belgium.
Language and Translation Technology Team at Ghent University, Belgium.
Perspect Med Educ. 2023 Dec 18;12(1):540-549. doi: 10.5334/pme.1056. eCollection 2023.
Manually analysing the quality of large numbers of written feedback comments is time-consuming and demands extensive resources and human effort. This study therefore explored whether a state-of-the-art large language model (LLM) could be fine-tuned to identify, in written feedback comments, the presence of four literature-derived feedback quality criteria and of the seven CanMEDS roles.
A set of 2,349 labelled feedback comments from five healthcare education programmes in Flanders, Belgium (specialist medicine, general practice, midwifery, speech therapy and occupational therapy), was split into 12,452 sentences to create two datasets for machine learning analysis. The Dutch BERT models BERTje and RobBERT were used to train four multiclass, multilabel classification models: two to identify the four feedback quality criteria and two to identify the seven CanMEDS roles.
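In a multilabel setting a single sentence can mention several CanMEDS roles at once, so each label is encoded and predicted independently rather than as one exclusive class. The sketch below illustrates that idea in plain Python, assuming the common setup of one sigmoid output per label with a 0.5 threshold; the threshold, the helper names (`to_multi_hot`, `decode`) and the exact label strings (taken from the CanMEDS 2015 role names) are assumptions, not the study's reported configuration, and the fine-tuning of BERTje or RobBERT itself is not shown.

```python
import math

# Assumed label strings (CanMEDS 2015 role names); the study's exact labels may differ.
ROLES = ["Medical Expert", "Communicator", "Collaborator", "Leader",
         "Health Advocate", "Scholar", "Professional"]


def to_multi_hot(labels, label_set=ROLES):
    """Encode the roles attached to one sentence as a 0/1 vector (one slot per role)."""
    unknown = set(labels) - set(label_set)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in labels else 0 for name in label_set]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode(logits, label_set=ROLES, threshold=0.5):
    """Map one logit per label to the (possibly empty) list of predicted labels.

    Each label gets its own sigmoid, so several roles can be predicted for the
    same sentence; a softmax would instead force a single winning class.
    """
    if len(logits) != len(label_set):
        raise ValueError("expected one logit per label")
    return [name for name, z in zip(label_set, logits) if sigmoid(z) >= threshold]
```

For example, a sentence tagged with Communicator and Professional encodes as `[0, 1, 0, 0, 0, 0, 1]`, and hypothetical logits `[-2.0, 1.5, -0.3, -4.0, -1.0, -3.0, 0.8]` decode back to `["Communicator", "Professional"]`.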
The classification models trained with BERTje and RobBERT to predict the presence of the four feedback quality criteria attained macro-average F1-scores of 0.73 and 0.76, respectively. For the models predicting the presence of the CanMEDS roles, the F1-score was 0.71 with BERTje and 0.72 with RobBERT.
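A macro-average F1-score computes F1 separately for each label and then takes the unweighted mean, so a rarely occurring criterion or role counts as much as a frequent one. The plain-Python sketch below shows that calculation on multi-hot label matrices; the toy data are invented for illustration and are not from the study, and scoring a label that is neither present nor predicted as 0.0 is an assumed convention.

```python
def f1_per_label(y_true, y_pred):
    """Per-label F1 for multi-hot matrices given as lists of 0/1 rows."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        # Assumed convention: a label never present nor predicted scores 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-label F1-scores."""
    scores = f1_per_label(y_true, y_pred)
    return sum(scores) / len(scores)


# Toy example: four sentences, two labels (invented data).
y_true = [[1, 0], [1, 1], [0, 1], [0, 0]]
y_pred = [[1, 0], [0, 1], [0, 1], [1, 0]]
print(f1_per_label(y_true, y_pred))  # [0.5, 1.0]
print(macro_f1(y_true, y_pred))      # 0.75
```

Here the first label has one true positive, one false positive and one false negative (F1 = 0.5) while the second is predicted perfectly (F1 = 1.0), giving a macro average of 0.75. scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same quantity on multi-hot arrays.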
The results showed that a state-of-the-art LLM is able to identify the presence of the four feedback quality criteria and the CanMEDS roles in written feedback comments. This implies that the quality analysis of written feedback comments can be automated using an LLM, leading to savings of time and resources.