Department of Cognitive Science, Johns Hopkins University, Baltimore, MD 21218.
Department of Psychology, University of Minnesota, Minneapolis, MN 55455.
Proc Natl Acad Sci U S A. 2024 Jun 11;121(24):e2317707121. doi: 10.1073/pnas.2317707121. Epub 2024 Jun 3.
Human pose, defined as the spatial relationships between body parts, carries information that is instrumental to understanding a person's motion and actions. A substantial body of previous work has identified cortical areas responsive to images of bodies and different body parts. However, the neural basis of the visual perception of body part relationships has received less attention. To broaden our understanding of body perception, we analyzed high-resolution fMRI responses to a wide range of poses from over 4,000 complex natural scenes. Using ground-truth annotations and three-dimensional (3D) pose reconstruction algorithms, we compared similarity patterns of cortical activity with similarity patterns built from human pose models that differ in depth availability and viewpoint dependency. Addressing the challenge of explaining variance in responses to complex natural images with interpretable models, we found statistically significant correlations between pose models and cortical activity patterns, although performance remained substantially below the noise ceiling. Compared with two-dimensional (2D) models, the 3D view-independent pose model better captured activation in distinct cortical areas, including the right posterior superior temporal sulcus (pSTS). Together with other pose-selective regions in the lateral occipitotemporal cortex (LOTC), these areas form a broader, distributed cortical network, with greater view tolerance in more anterior patches. We interpret these findings in light of the computational complexity of natural body images, the wide range of visual tasks supported by pose structures, and possible shared principles for view-invariant processing between articulated objects and ordinary, rigid objects.
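Comparing similarity patterns of cortical activity with similarity patterns built from models is a form of representational similarity analysis (RSA). The code below is a minimal sketch of that comparison on purely synthetic data; it is not the authors' pipeline, data, or results. Several choices are assumptions made only for illustration. Dissimilarity is Euclidean distance, and model and neural dissimilarities are compared with Spearman correlation. The skeleton has 17 joints, and viewpoint change is a rotation about the vertical axis. The "view-independent" 3D model is the pose in a canonical body frame, while the 2D model is the rotated pose projected onto the image plane. The "neural" data come from a hypothetical view-invariant region simulated as a noisy linear readout of the canonical 3D pose. All helper names (`euclidean_rdm`, `average_ranks`, `rotate_about_vertical`, etc.) are hypothetical.

```python
import numpy as np

# Hypothetical sizes: 60 images, 17 joints (a common 3D-pose skeleton size), 200 voxels.
N_IMAGES, N_JOINTS, N_VOXELS = 60, 17, 200
rng = np.random.default_rng(0)


def euclidean_rdm(patterns):
    """Pairwise Euclidean distances between rows (one row per image)."""
    diff = patterns[:, None, :] - patterns[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def upper_triangle(rdm):
    """Entries above the diagonal, i.e. each image pair counted once."""
    i, j = np.triu_indices(rdm.shape[0], k=1)
    return rdm[i, j]


def average_ranks(x):
    """Ranks starting at 1; tied values receive their mean rank."""
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    return (np.bincount(inverse, weights=ranks) / counts)[inverse]


def spearman(a, b):
    """Spearman correlation = Pearson correlation of the ranks."""
    return np.corrcoef(average_ranks(a), average_ranks(b))[0, 1]


def rotate_about_vertical(joints, angle):
    """Rotate (J, 3) joint coordinates about the y (vertical) axis."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return joints @ rot.T


# Synthetic canonical 3D poses: a shared base skeleton plus 3 latent pose factors.
base = rng.normal(size=(N_JOINTS, 3))
factors = rng.normal(scale=0.5, size=(3, N_JOINTS * 3))
coeffs = rng.normal(size=(N_IMAGES, 3))
poses_3d = base + (coeffs @ factors).reshape(N_IMAGES, N_JOINTS, 3)

# Each image shows its pose from a random viewpoint; the 2D model keeps only (x, y).
angles = rng.uniform(0.0, 2.0 * np.pi, size=N_IMAGES)
poses_2d = np.stack([rotate_about_vertical(p, a)[:, :2] for p, a in zip(poses_3d, angles)])

# Hypothetical view-invariant region: noisy linear readout of the canonical 3D pose.
weights = rng.normal(size=(N_JOINTS * 3, N_VOXELS))
neural = poses_3d.reshape(N_IMAGES, -1) @ weights + rng.normal(size=(N_IMAGES, N_VOXELS))

# RSA: correlate each model's pairwise dissimilarities with the neural ones.
neural_vec = upper_triangle(euclidean_rdm(neural))
r_3d = spearman(upper_triangle(euclidean_rdm(poses_3d.reshape(N_IMAGES, -1))), neural_vec)
r_2d = spearman(upper_triangle(euclidean_rdm(poses_2d.reshape(N_IMAGES, -1))), neural_vec)
print(f"3D view-independent model: rho = {r_3d:.2f}")
print(f"2D view-dependent model:   rho = {r_2d:.2f}")
```

Because the simulated region is built from the canonical pose, the 3D model should track its dissimilarities more closely than the 2D model, whose distances are partly driven by viewpoint differences. With real fMRI data, model correlations would also be judged against a noise ceiling, which this sketch omits.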