Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, United Kingdom.
Nuffield Department of Women's and Reproductive Health, University of Oxford, Oxford, United Kingdom.
Med Image Anal. 2021 Apr;69:101973. doi: 10.1016/j.media.2021.101973. Epub 2021 Jan 23.
Ultrasound is a widely used imaging modality, yet it is well known that scanning can be highly operator-dependent and difficult to perform, which limits its wider use in clinical practice. The literature on what makes clinical sonography hard to learn, and on how sonography varies in the field, is sparse, restricted to small-scale studies on the effectiveness of ultrasound training schemes, the role of ultrasound simulation in training, and the effect of introducing scanning guidelines and standards on diagnostic image quality. The Big Data era, and the recent and rapid emergence of machine learning as a mainstream large-scale data analysis technique, present a fresh opportunity to study sonography in the field at scale for the first time. Large-scale analysis of video recordings of full-length routine fetal ultrasound scans offers the potential to characterise differences between the scanning proficiency of experts and trainees that would be tedious and time-consuming to identify manually, given the vast amounts of data involved. Such research would help us better understand operator clinical workflow during ultrasound scanning, in order to support skills training, optimise scan times, and inform the design of better user-machine interfaces. This paper is, to our knowledge, the first to address sonography data science, which we consider in the context of second-trimester fetal sonography screening. Specifically, we present a fully automatic framework to analyse operator clinical workflow solely from full-length routine second-trimester fetal ultrasound scan videos. An ultrasound video dataset containing more than 200 hours of scan recordings was generated for this study. We developed an original deep learning method to temporally segment the ultrasound video into semantically meaningful segments (the video description). The resulting semantic annotation was then used to depict operator clinical workflow (the knowledge representation).
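The paper's video description step assigns a semantic label to each point in the scan and groups consecutive frames into segments. As a minimal sketch of that idea (not the paper's actual spatio-temporal network), the following turns per-frame class probabilities into contiguous labelled segments via argmax and run-length encoding; the label set and function names are hypothetical placeholders.

```python
import numpy as np

# Hypothetical task labels for scan segments; the paper's actual
# label set is not reproduced here.
LABELS = ["background", "head", "abdomen", "femur", "heart"]

def segment_timeline(frame_probs: np.ndarray, fps: float = 30.0):
    """Collapse per-frame class probabilities (shape T x C), e.g. the
    softmax output of a spatio-temporal network, into a list of
    (label, start_seconds, end_seconds) segments by taking the argmax
    per frame and run-length encoding consecutive identical labels."""
    frame_labels = frame_probs.argmax(axis=1)
    segments = []
    start = 0
    for t in range(1, len(frame_labels) + 1):
        if t == len(frame_labels) or frame_labels[t] != frame_labels[start]:
            segments.append((LABELS[frame_labels[start]], start / fps, t / fps))
            start = t
    return segments

# Toy example: 6 frames, 2 classes, 1 frame per second.
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7],
                  [0.2, 0.8], [0.1, 0.9], [0.95, 0.05]])
print(segment_timeline(probs, fps=1.0))
# -> [('background', 0.0, 2.0), ('head', 2.0, 5.0), ('background', 5.0, 6.0)]
```

In practice the per-frame probabilities would come from a learned model and the raw argmax sequence would typically be smoothed before encoding, but the segment representation itself is this simple.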
Machine learning was applied to the knowledge representation to characterise operator skills and assess operator variability. For video description, our best-performing deep spatio-temporal network shows favourable results in cross-validation (accuracy: 91.7%), statistical analysis (correlation: 0.98, p < 0.05), and retrospective manual validation (accuracy: 76.4%). For knowledge representation of operator clinical workflow, a three-level abstraction scheme consisting of a Subject-specific Timeline Model (STM), a Summary of Timeline Features (STF), and an Operator Graph Model (OGM) was introduced, leading to a significant reduction in dimensionality and computational complexity compared to the raw video data. The workflow representations were then used to learn to discriminate between operator skill levels, where a proposed convolutional neural network-based model showed the most promising performance (cross-validation accuracy: 98.5%; accuracy on unseen operators: 76.9%). They were further used to derive operator-specific scanning signatures and to characterise operator variability in terms of the type, order, and time distribution of constituent tasks.
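The abstract's three-level abstraction can be illustrated with a small sketch: a subject-specific timeline (STM) as a list of labelled, timed segments, reduced to fixed-size statistics (one plausible reading of the STF), and to a directed transition-count graph over consecutive tasks (one plausible reading of the OGM). The task labels, data layout, and exact features are assumptions for illustration, not the paper's definitions.

```python
from collections import Counter

# Hypothetical subject-specific timeline (STM): (task, start_s, end_s)
# segments produced by the video description step.
timeline = [("head", 0.0, 40.0), ("abdomen", 40.0, 70.0),
            ("head", 70.0, 90.0), ("femur", 90.0, 120.0)]

def timeline_features(timeline):
    """STF sketch: reduce a variable-length timeline to fixed-size
    statistics -- fraction of scan time per task and segment counts."""
    total = sum(end - start for _, start, end in timeline)
    time_frac, counts = Counter(), Counter()
    for label, start, end in timeline:
        time_frac[label] += (end - start) / total
        counts[label] += 1
    return dict(time_frac), dict(counts)

def transition_graph(timeline):
    """OGM sketch: a directed graph of transition counts between
    consecutive tasks, capturing the order of constituent tasks."""
    edges = Counter()
    for (a, *_), (b, *_) in zip(timeline, timeline[1:]):
        edges[(a, b)] += 1
    return dict(edges)

frac, counts = timeline_features(timeline)
print(frac)                       # time fraction spent on each task
print(transition_graph(timeline)) # task-to-task transition counts
```

Representations of this kind are orders of magnitude smaller than the raw video, which is what makes downstream skill classification across many operators tractable.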