Li Yan
School of Foreign Languages and Cultures, Jilin University, Changchun 130012, China.
Behav Sci (Basel). 2025 Apr 17;15(4):542. doi: 10.3390/bs15040542.
This study investigates how Chinese EFL (English as a foreign language) learners of low and high proficiency allocate attention between captions and audio while watching videos, and how visual complexity (single- vs. multi-speaker content) influences caption reliance. The study employed a novel paused transcription method to assess real-time processing. A total of 64 participants (31 low-proficiency [A1-A2] and 33 high-proficiency [C1-C2] learners) viewed single- and multi-speaker videos with English captions. Misleading captions were inserted to objectively measure reliance on captions versus audio. Results revealed significant proficiency effects: low-proficiency learners prioritized captions (reading scores > listening; test statistic = -4.55, p < 0.001, effect size = 0.82), while high-proficiency learners focused on audio (listening > reading; test statistic = -5.12, p < 0.001, effect size = 0.89). Multi-speaker videos amplified caption reliance for low-proficiency learners (effect size = 0.75) and moderately increased it for high-proficiency learners (effect size = 0.52). These findings demonstrate that low-proficiency learners rely overwhelmingly on captions during video viewing, while high-proficiency learners integrate multimodal inputs. Notably, increased visual complexity amplifies caption reliance across proficiency levels. Implications are twofold. Pedagogically, educators could design tiered caption removal protocols that withdraw captions as skills improve, and could incorporate adjustable caption opacity tools. Technologically, future research could focus on developing dynamic captioning systems that use eye-tracking and AI to adapt to learners' proficiency in real time, optimizing learning experiences. Additionally, video complexity should be calibrated to learners' proficiency levels.