Department of Radiology, Director of Global Health, University of Washington, 1959 N.E. Pacific Street, Box 357115, Seattle, WA, 98195, USA.
Department of Epidemiology, University of Washington, Seattle, WA, USA.
BMC Med Educ. 2024 Sep 5;24(1):969. doi: 10.1186/s12909-024-05899-w.
Diagnostic radiology residents in low- and middle-income countries (LMICs) may have to make significant contributions to the clinical workload before completing their residency training. Because of the time constraints inherent in delivering acute care, some of the most clinically consequential diagnostic radiology errors arise from the use of computed tomography (CT) in the management of acutely ill patients. It is therefore paramount to ensure that radiology trainees reach adequate skill levels before assuming independent on-call responsibilities. We partnered with the radiology residency program at the Aga Khan University Hospital in Nairobi, Kenya (AKUHN) to evaluate a novel cloud-based testing method that provides an authentic radiology viewing and interpretation environment. The method is built on Lifetrack, a unique Google Chrome-based Picture Archiving and Communication System that enables a complete viewing environment for any scan and provides a novel report generation tool based on Active Templates, a patented structured reporting method. We applied it to evaluate the skills of AKUHN trainees on entire CT scans representing the spectrum of acute non-trauma abdominal pathology encountered in a typical on-call setting. We aimed to demonstrate the feasibility of remotely testing the authentic practice of radiology and to show that such a Lifetrack-based testing approach yields important observations about the radiology skills of an individual practitioner or a cohort of trainees.
A total of 13 anonymized trainees, with experience ranging from 12 months to over 4 years, took part in the study. Individually accessing the Lifetrack tool, they were tested on 37 abdominal CT scans (including one normal scan) over six 2-hour sessions on consecutive days. All cases carried the same clinical history of acute abdominal pain. During each session the trainees accessed the corresponding Lifetrack test set on clinical workstations, reviewed the CT scans, and formulated an opinion on the acute diagnosis, any secondary pathology, and incidental findings on the scan. Their scan interpretations were composed with the Lifetrack report generation system based on Active Templates, in which segments of text are selected to assemble a detailed report. All reports generated by the trainees were scored on four interpretive components: (a) acute diagnosis, (b) unrelated secondary diagnosis, (c) number of missed incidental findings, and (d) number of overcalls. A 3-score aggregate was defined from the first three interpretive components, and a cumulative score modified the 3-score aggregate to account for the negative effect of interpretive overcalls.
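The abstract does not specify how the component scores are combined. As a rough illustration only, the sketch below assumes the 3-score aggregate is the mean of the available 0-1 component scores rescaled to 0-100, and that each overcall subtracts a fixed penalty; both the weighting and the penalty value are assumptions, not the authors' published formula.

```python
def three_score_aggregate(acute, secondary, incidental):
    """Hypothetical aggregate: mean of the available 0-1 component
    scores, rescaled to 0-100. Components that do not apply to a case
    (e.g. no secondary diagnosis present) are passed as None and
    excluded. The study's actual weighting may differ."""
    components = [s for s in (acute, secondary, incidental) if s is not None]
    return 100.0 * sum(components) / len(components)

def cumulative_score(aggregate, n_overcalls, penalty=10.0):
    """Hypothetical cumulative score: the 3-score aggregate reduced by
    a fixed penalty per overcall (the penalty value is an assumption).
    The result can be negative, consistent with the reported minimum
    cumulative score of -30."""
    return aggregate - penalty * n_overcalls
```

Under these assumptions a perfect interpretation with no overcalls scores 100, and a low aggregate with several overcalls can fall below zero, matching the score ranges reported in the Results.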
A total of 436 scan interpretations and scores were available from the 13 trainees tested on 37 cases. The acute diagnosis score (436 scores) ranged from 0 to 1, with a mean of 0.68 ± 0.36 and a median of 0.78 (IQR: 0.5-1). An unrelated secondary diagnosis was present in 11 cases, yielding 130 secondary diagnosis scores; this score ranged from 0 to 1, with a mean of 0.48 ± 0.46 and a median of 0.5 (IQR: 0-1). There were 32 cases with incidental findings, yielding 390 incidental findings scores. The number of missed incidental findings ranged from 0 to 5, with a median of 1 (IQR: 1-2). The incidental findings score ranged from 0 to 1, with a mean of 0.4 ± 0.38 and a median of 0.33 (IQR: 0-0.66). The number of overcalls ranged from 0 to 3, with a median of 0 (IQR: 0-1) and a mean of 0.36 ± 0.63. The 3-score aggregate ranged from 0 to 100, with a mean of 65.5 ± 32.5 and a median of 77.3 (IQR: 45.0, 92.5). The cumulative score ranged from -30 to 100, with a mean of 61.9 ± 35.5 and a median of 71.4 (IQR: 37.4, 92.0). The mean acute diagnosis scores (± SD) by training period were 0.62 ± 0.03, 0.80 ± 0.05, 0.71 ± 0.05, 0.58 ± 0.07, and 0.66 ± 0.05 for trainees with ≤ 12 months, 12-24 months, 24-36 months, 36-48 months, and > 48 months of training, respectively. On ANOVA with Tukey testing, the only statistically significant difference was the higher mean acute diagnosis score of the 12-24 months group compared with the ≤ 12 months group (p = 0.0002). The distributions of 3-score aggregates and cumulative scores showed a similar trend. There were no significant associations when the training period was dichotomized as less than versus more than 2 years. Examining the distribution of the 3-score aggregate versus the number of overcalls by trainee, we found that the 3-score aggregate was inversely related to the number of overcalls. Heatmaps and raincloud plots provided an illustrative means of visualizing the relative performance of trainees across cases.
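The group comparison reported above uses a one-way ANOVA followed by Tukey testing. As a minimal sketch of the ANOVA step only, the function below computes the F statistic from scratch on lists of per-trainee scores grouped by training period; the data in the usage note are made up for illustration, not the study's, and in practice the post-hoc Tukey HSD comparisons would follow (e.g. via `scipy.stats.tukey_hsd`).

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic over a list of score groups.

    groups: list of lists of numeric scores (one list per training-period
    group). Returns (F, df_between, df_within). Assumes at least two
    groups and nonzero within-group variance.
    """
    k = len(groups)                              # number of groups
    n = sum(len(g) for g in groups)              # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: group sizes times squared deviation
    # of each group mean from the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Within-group sum of squares: squared deviations from group means.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

For example, `one_way_anova_f([[1, 2, 3], [4, 5, 6]])` returns an F statistic of 13.5 on (1, 4) degrees of freedom.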
We demonstrated the feasibility of remotely testing the authentic practice of radiology and showed that our Lifetrack-based testing approach yields important observations about the radiology skills of an individual or a cohort. Targeted teaching can be directed at the observed areas of weakness, and retesting could reveal its impact. This methodology can be customized to different LMIC environments and extended to board certification examinations.