Global Center for Medical Engineering and Informatics, Osaka University, Osaka, Japan.
Academic Center for Computing and Media Studies, Kyoto University, Kyoto, Japan.
J Med Internet Res. 2021 May 10;23(5):e25218. doi: 10.2196/25218.
The study of doctor-patient-computer interactions is a key research area for examining doctor-patient relationships; however, studying these interactions is costly and obtrusive because researchers typically set up complex mechanisms or intrude on consultations to collect the data, which they then analyze manually.
We aimed to facilitate human-computer and human-human interaction research in clinics by providing a computational ethnography tool: an unobtrusive automatic classifier of screen gaze and dialogue combinations in doctor-patient-computer interactions.
The classifier's input is video taken by doctors using their computers' internal camera and microphone. By estimating the key points of the doctor's face and detecting the presence of voice activity, we estimate the type of interaction taking place. The classification output for each video segment is 1 of 4 interaction classes: (1) screen gaze and dialogue, wherein the doctor is gazing at the computer screen while conversing with the patient; (2) dialogue, wherein the doctor is gazing away from the computer screen while conversing with the patient; (3) screen gaze, wherein the doctor is gazing at the computer screen without conversing with the patient; and (4) other, wherein neither screen gaze nor dialogue is detected. We evaluated the classifier using 30 minutes of video provided by 5 doctors simulating consultations in their clinics, in both semi-inclusive and fully inclusive layouts.
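To make the classification scheme concrete, the following is a minimal Python sketch of the segment-level decision logic, assuming the two upstream estimates (screen gaze from facial key points, dialogue from voice activity) have already been reduced to booleans for a given segment; the names Interaction and classify_segment are illustrative stand-ins, not the authors' implementation.

from enum import Enum

class Interaction(Enum):
    SCREEN_GAZE_AND_DIALOGUE = 1  # gazing at the screen while conversing
    DIALOGUE = 2                  # gazing away from the screen while conversing
    SCREEN_GAZE = 3               # gazing at the screen, no conversation
    OTHER = 4                     # neither screen gaze nor dialogue detected

def classify_segment(screen_gaze: bool, dialogue: bool) -> Interaction:
    # Combine the two binary estimates into 1 of the 4 interaction classes.
    if screen_gaze and dialogue:
        return Interaction.SCREEN_GAZE_AND_DIALOGUE
    if dialogue:
        return Interaction.DIALOGUE
    if screen_gaze:
        return Interaction.SCREEN_GAZE
    return Interaction.OTHER

# Example: the doctor is speaking while looking away from the screen.
print(classify_segment(screen_gaze=False, dialogue=True))  # Interaction.DIALOGUE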
The classifier achieved an overall accuracy of 0.83, performing on par with a human coder. Like the human coder, the classifier was more accurate in fully inclusive layouts than in semi-inclusive layouts.
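One plausible way to compute this kind of segment-level agreement, assuming the classifier's labels and the human coder's labels are aligned per segment, is a standard accuracy computation; accuracy_score and cohen_kappa_score are scikit-learn functions used here for illustration, not necessarily the authors' tooling, and the label lists below are made-up toy data.

from sklearn.metrics import accuracy_score, cohen_kappa_score

# Hypothetical per-segment labels (human coder vs classifier output).
human = ["screen_gaze_and_dialogue", "dialogue", "screen_gaze", "other", "dialogue"]
machine = ["screen_gaze_and_dialogue", "dialogue", "screen_gaze", "dialogue", "dialogue"]

print(accuracy_score(human, machine))     # overall accuracy (0.8 on this toy data)
print(cohen_kappa_score(human, machine))  # chance-corrected agreement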
The proposed classifier can be used by researchers, care providers, designers, medical educators, and others who are interested in exploring and answering questions related to screen gaze and dialogue in doctor-patient-computer interactions.