White Andrew A, King Ann M, D'Addario Angelo E, Brigham Karen Berg, Dintzis Suzanne, Fay Emily E, Gallagher Thomas H, Mazor Kathleen M
Department of Medicine, University of Washington School of Medicine, Seattle, WA, United States.
National Board of Medical Examiners, Philadelphia, PA, United States.
JMIR Med Educ. 2022 Apr 29;8(2):e30988. doi: 10.2196/30988.
Residents may benefit from simulated practice with personalized feedback to prepare for high-stakes disclosure conversations with patients after harmful errors and to meet Accreditation Council for Graduate Medical Education mandates. Ideally, feedback would come from patients who have experienced communication after medical harm, but medical researchers and leaders have found it difficult to reach this community, which has made this approach impractical at scale. The Video-Based Communication Assessment app is designed to engage crowdsourced laypeople to rate physician communication skills but has not been evaluated for use with medical harm scenarios.
We aimed to compare the reliability of 2 assessment groups (crowdsourced laypeople and patient advocates) in rating physician error disclosure communication skills using the Video-Based Communication Assessment app.
Internal medicine residents used the Video-Based Communication Assessment app; the case, which consisted of 3 sequential vignettes, depicted a delayed diagnosis of breast cancer. Panels of patient advocates who had experienced a harmful medical error, either personally or through a family member, and of crowdsourced laypeople rated the residents' audio-recorded responses on 6 error disclosure communication skill items using a 5-point scale. Ratings were aggregated across items and vignettes to create a numerical communication score for each physician. We used analysis of variance to compare rating stringency, and the Pearson correlation between patient advocate and layperson scores to determine whether the rank order of physicians was preserved across groups. We used generalizability theory to examine the difference in assessment reliability between patient advocates and laypeople.
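As an illustration of the analytic steps described above, the following minimal Python sketch (with a hypothetical data layout and column names, not the authors' code) aggregates item- and vignette-level ratings into one communication score per physician and per rater group, compares stringency with a one-way ANOVA, and checks rank-order preservation with a Pearson correlation.

# Minimal sketch of the aggregation and comparison steps; file name and
# column labels ("physician", "rater_group", "vignette", "item", "rating")
# are assumptions for illustration only.
import pandas as pd
from scipy.stats import f_oneway, pearsonr

ratings = pd.read_csv("vca_ratings.csv")  # hypothetical table of 1-5 ratings

# Aggregate across the 6 items and the vignettes: one score per physician per rater group.
scores = (ratings
          .groupby(["physician", "rater_group"])["rating"]
          .mean()
          .unstack("rater_group"))  # assumed columns: "advocate", "layperson"

# Compare rating stringency between the two groups (one-way ANOVA).
f_stat, p_anova = f_oneway(scores["advocate"], scores["layperson"])

# Check whether the rank order of physicians is preserved across groups.
r, p_corr = pearsonr(scores["advocate"], scores["layperson"])
print(f"ANOVA p={p_anova:.3f}; Pearson r={r:.2f} (p={p_corr:.3f})")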
Internal medicine residents (n=20) used the Video-Based Communication Assessment app. All 8 patient advocates and 42 of the 59 crowdsourced laypeople recruited provided complete, high-quality ratings. Patient advocates rated communication more stringently than crowdsourced laypeople (patient advocates: mean 3.19, SD 0.55; laypeople: mean 3.55, SD 0.40; P<.001), but the two groups' ratings of physicians were highly correlated (r=0.82, P<.001). Reliability for 8 raters and 6 vignettes was acceptable (patient advocates: G coefficient 0.82; crowdsourced laypeople: G coefficient 0.65). Decision studies estimated that 12 crowdsourced layperson raters and 9 vignettes would yield an acceptable G coefficient of 0.75.
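For context, a decision (D) study in generalizability theory projects the G coefficient for alternative panel sizes from the estimated variance components. Assuming the standard fully crossed person (p) x rater (r) x vignette (v) design (the abstract does not report the variance components themselves), the projected coefficient for n_r raters and n_v vignettes is

E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{pr}/n_r + \sigma^2_{pv}/n_v + \sigma^2_{prv,e}/(n_r n_v)}

which is the form of calculation behind the estimate that 12 layperson raters and 9 vignettes would yield a G coefficient of 0.75.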
Crowdsourced laypeople may represent a sustainable source of reliable assessments of physician error disclosure skills. For a simulated case involving delayed diagnosis of breast cancer, laypeople correctly identified high and low performers. However, at least 12 raters and 9 vignettes would be required to ensure adequate reliability, and future studies are warranted. Crowdsourced laypeople rate less stringently than raters who have experienced harm. Future research should examine the value of the Video-Based Communication Assessment app for formative assessment, summative assessment, and just-in-time coaching of error disclosure communication skills.