Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA.
Independent Scholar, Hamilton, Canada.
Med Educ. 2020 Dec;54(12):1159-1170. doi: 10.1111/medu.14347. Epub 2020 Sep 10.
Objective Structured Clinical Examinations (OSCEs) allow assessment of, and provide feedback to, medical students. Clinical examiners and standardised patients (SPs) typically complete itemised checklists and global rating scales, both of which have known shortcomings. In this study, we applied machine learning (ML) to label communication skills and interview content in OSCE transcripts, and compared several ML methodologies on performance and transferability.
One hundred and twenty-one transcripts of two OSCE scenarios were manually annotated per utterance across 19 communication skill and content labels. Utterances were converted to two types of numeric sentence vector representation and paired with three types of ML algorithm. First, ML models (MLMs) were evaluated with five-fold cross-validation on all transcripts in one scenario, generating precision, recall and their harmonic mean, the F1 score. Second, MLMs were trained on all 101 scenario 1 transcripts and tested for transferability on the 20 scenario 2 transcripts.
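The per-label evaluation described above can be illustrated with a short sketch. This is not the authors' code: the F1 computation is the standard binary definition, and the fold splitter is a simplified contiguous split (the study would likely have randomised fold assignment).

```python
# Sketch of the evaluation metrics: per-label precision, recall and F1,
# plus a simplified five-fold index split. Hypothetical helper names.

def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall and their harmonic mean (F1) for one label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def k_fold_indices(n_items, k=5):
    """Split item indices into k near-equal folds; each fold serves once
    as the held-out test set while the rest train the model."""
    fold_size, rem = divmod(n_items, k)
    folds, start = [], 0
    for i in range(k):
        end = start + fold_size + (1 if i < rem else 0)
        folds.append(list(range(start, end)))
        start = end
    return folds
```

In practice these metrics would be computed per label across all 19 annotation categories, then summarised as the median and range reported in the results.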
Performance testing under cross-validation demonstrated relatively high mean F1 scores: median 0.87 (range 0.53-0.98) across all 19 labels. Transferability testing was also successful: median F1 0.76 (range 0.46-0.97). Pairing a bi-directional long short-term memory (biLSTM) neural network with GenSen numeric sentence vector representations was associated with higher F1 scores in both performance and transferability testing (P < .005).
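The best-performing pairing, a biLSTM over sentence vectors, can be sketched as follows. This is a toy NumPy forward pass, not the authors' implementation: real GenSen vectors are roughly 2048-dimensional and would be fed to a trained network, whereas the dimensions and random weights here are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step; gates stacked as [input, forget, cell, output]."""
    z = W @ x + U @ h + b
    H = h.shape[0]
    i = 1 / (1 + np.exp(-z[:H]))          # input gate
    f = 1 / (1 + np.exp(-z[H:2*H]))       # forget gate
    g = np.tanh(z[2*H:3*H])               # candidate cell state
    o = 1 / (1 + np.exp(-z[3*H:]))        # output gate
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def bilstm(seq, params_fwd, params_bwd, hidden):
    """Run the sequence forwards and backwards; concatenate per-step states,
    so each utterance's representation sees both preceding and following
    context in the interview."""
    def run(vectors, params):
        W, U, b = params
        h, c = np.zeros(hidden), np.zeros(hidden)
        out = []
        for x in vectors:
            h, c = lstm_step(x, h, c, W, U, b)
            out.append(h)
        return out
    fwd = run(seq, params_fwd)
    bwd = run(seq[::-1], params_bwd)[::-1]
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

# Toy sizes: D-dim sentence vectors, H hidden units, T utterances.
D, H, T = 8, 4, 5
make_params = lambda: (rng.normal(size=(4 * H, D)) * 0.1,
                       rng.normal(size=(4 * H, H)) * 0.1,
                       np.zeros(4 * H))
utterance_vectors = [rng.normal(size=D) for _ in range(T)]
states = bilstm(utterance_vectors, make_params(), make_params(), H)
print(len(states), states[0].shape)  # 5 (8,)
```

In the study's setting, each concatenated per-utterance state would feed a classification layer predicting the 19 skill and content labels; a deep-learning framework would be used rather than hand-rolled NumPy.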
We report the first application of ML to student-SP OSCEs. Several MLMs automatically labelled OSCE transcripts for a range of interview content and some clinical communication skills, and some achieved greater performance and transferability than others. Optimised MLMs could provide automated, accurate assessment of OSCEs, with the potential to track student progress and identify areas for further practice.