IEEE Trans Biomed Eng. 2024 Jan;71(1):237-246. doi: 10.1109/TBME.2023.3296489. Epub 2023 Dec 22.
Autism spectrum disorder (ASD) is characterized by impairments in joint attention (JA), which comprises two components: responding to JA (RJA) and initiating JA (IJA). RJA and IJA are considered two interrelated aspects of JA that relate to different stages of infant development. While recent technologies have been used to characterize RJA, which emerges earlier in childhood, only a limited number of studies have attempted to explore IJA, which progressively becomes evident as a hallmark of ASD. This study aims to recognize both RJA and IJA automatically and comprehensively, using a multi-modal framework built on vision-based human behavior perception.
The first three layers of the framework perform localization, feature extraction, and activity recognition. On this basis, three critical activities in JA are recognized: attention estimation, spontaneous pointing, and showing actions. The fourth layer, semantic interpretation, then links these behaviors to model the JA event. The proposed framework is evaluated in experiments with four groups: 7 children with ASD, 5 children with mental retardation (MR), 5 children with developmental language disorder (DLD), and 3 typically developing (TD) children.
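To make the role of the fourth layer concrete, the following is a minimal sketch of how per-frame activity labels might be linked into JA events. It is not the authors' method: the `Frame` fields, the `detect_ja_events` function, the 3-second response window, and the rules themselves (attention following an adult cue counts as RJA; the onset of pointing or showing counts as IJA) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical per-frame output of the activity-recognition layer."""
    t: float                            # timestamp in seconds
    adult_cue: bool = False             # adult directs attention to a target
    child_attends_target: bool = False  # child's attention is on that target
    child_points: bool = False          # spontaneous pointing by the child
    child_shows: bool = False           # showing action by the child


def detect_ja_events(frames, window=3.0):
    """Link frame-level behaviors into (label, time) events.

    Assumed rules, not taken from the paper:
    - RJA: the child attends to the target within `window` seconds
      of the onset of an adult cue.
    - IJA: the onset of a pointing or showing action by the child.
    """
    events = []
    cue_time = None        # onset time of the latest unanswered adult cue
    prev_cue = False
    prev_initiating = False
    for f in frames:
        # Start a response window only at the onset of a cue.
        if f.adult_cue and not prev_cue:
            cue_time = f.t
        # Drop the cue once the response window has elapsed.
        if cue_time is not None and f.t - cue_time > window:
            cue_time = None
        if cue_time is not None and f.child_attends_target:
            events.append(("RJA", f.t))
            cue_time = None
        # Report IJA once per action, at its first frame.
        initiating = f.child_points or f.child_shows
        if initiating and not prev_initiating:
            events.append(("IJA", f.t))
        prev_cue = f.adult_cue
        prev_initiating = initiating
    return events
```

Counting the resulting events per child would give the kind of quantitative measure that can be compared with human coding, though the rules a real system needs are likely richer than these two.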
Comparison of the experimental results with human coding demonstrates recognition reliability, with an intraclass correlation coefficient (ICC) of 0.959. In addition, statistical analysis suggests significant group differences and correlations.
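For readers unfamiliar with the reliability measure, the sketch below shows one common way to compute an intraclass correlation coefficient between automatic and human scores. The abstract does not state which ICC form was used; this example assumes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1), and the function name `icc_2_1` and the sample scores are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one per subject, each holding one score
    per rater (for example: [framework_score, human_score]).
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)            # between-subject mean square
    ms_cols = ss_cols / (k - 1)            # between-rater mean square
    ms_err = ss_err / ((n - 1) * (k - 1))  # residual mean square

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical event counts per child: [framework, human coder].
scores = [[4, 5], [7, 7], [2, 3], [9, 8], [6, 6]]
print(round(icc_2_1(scores), 3))
```

A value near 1 indicates that the two sets of scores agree closely in both ranking and absolute level; the reported 0.959 falls in that range.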
The multi-modal human behavior perception-based framework is a feasible solution for the recognition of joint attention in unconstrained environments.
Thus, the proposed approach has the potential to improve the clinical diagnosis of autism by offering quantitative monitoring and statistical analysis.