Bialas Ole, Lalor Edmund C
Department of Biomedical Engineering, University of Rochester, Rochester, New York, United States of America.
Del Monte Institute for Neuroscience, University of Rochester, Rochester, New York, United States of America.
PLoS One. 2025 May 23;20(5):e0323276. doi: 10.1371/journal.pone.0323276. eCollection 2025.
In recent decades, studies modeling the neural processing of continuous, naturalistic speech have provided new insights into how speech and language are represented in the brain. However, the linear encoding models commonly used in such studies assume that the underlying data are stationary, varying to a fixed degree around a constant mean. Long, continuous neural recordings may violate this assumption, leading to impaired model performance. We aimed to examine the effect of non-stationary trends in continuous neural recordings on the performance of linear speech encoding models.
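The stationarity assumption described above can be illustrated with a minimal NumPy sketch (not from the paper). Here, a hypothetical non-stationary recording is modeled as Gaussian noise plus a slow linear drift in the mean, loosely mimicking electrode drift over a long recording; comparing the means of the first and last quarter of each signal shows how the drifting signal violates the constant-mean assumption while the stationary one does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # number of samples (arbitrary for illustration)

# Stationary signal: fixed mean and variance throughout.
stationary = rng.normal(loc=0.0, scale=1.0, size=n)

# Non-stationary signal: same noise plus a slow linear drift in the mean.
drift = np.linspace(0.0, 5.0, n)
non_stationary = rng.normal(loc=0.0, scale=1.0, size=n) + drift

def edge_means(x):
    """Mean of the first and last quarter of a signal."""
    q = len(x) // 4
    return x[:q].mean(), x[-q:].mean()

# For the stationary signal both means are near 0; for the
# non-stationary signal they differ substantially.
print(edge_means(stationary))
print(edge_means(non_stationary))
```

Splitting the drifting signal into short segments and demeaning each one makes every segment approximately stationary, which is the intuition behind the segmentation approach tested in this study.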
We used temporal response functions (TRFs) to predict continuous neural responses to speech while splitting the data into segments of varying length prior to model fitting. Our hypothesis was that if the data were non-stationary, segmentation should improve model performance by making individual segments approximately stationary. To test this hypothesis under a known ground truth, we simulated and predicted stationary and non-stationary recordings; to test it on actual neural recordings, we predicted the brain activity of participants who listened to a narrated story.
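The approach above can be sketched in NumPy (a simplified illustration under assumed parameters, not the paper's actual pipeline). A synthetic "neural" response is generated by convolving a stimulus envelope with a hypothetical TRF kernel, adding noise and a slow drift; a ridge-regression TRF is then fit on a time-lagged design matrix, either treating the recording as one long segment or splitting it into short segments that are demeaned individually. The segment length, lag count, and regularization value are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
fs, n_lags = 64, 16                 # sampling rate (Hz), 250 ms of lags
n = fs * 120                        # two minutes of data

# Hypothetical ground-truth TRF kernel and stimulus envelope.
kernel = np.sin(np.linspace(0, np.pi, n_lags))
stim = np.abs(rng.normal(size=n))
resp = np.convolve(stim, kernel)[:n] + rng.normal(scale=0.5, size=n)
resp = resp + np.linspace(0.0, 10.0, n)   # slow non-stationary drift

def lagged(x, n_lags):
    """Time-lagged design matrix: column k holds the stimulus delayed by k samples."""
    X = np.zeros((len(x), n_lags))
    for k in range(n_lags):
        X[k:, k] = x[:len(x) - k]
    return X

def fit_trf(stim, resp, n_lags, lam=1.0, seg_len=None):
    """Ridge-regression TRF; if seg_len is given, demean each segment separately."""
    step = seg_len or len(stim)
    Xs, ys = [], []
    for i in range(0, len(stim), step):
        Xl = lagged(stim[i:i + step], n_lags)
        y = resp[i:i + step]
        Xs.append(Xl - Xl.mean(axis=0))   # per-segment demeaning
        ys.append(y - y.mean())
    X, y = np.vstack(Xs), np.concatenate(ys)
    w = np.linalg.solve(X.T @ X + lam * np.eye(n_lags), X.T @ y)
    r = np.corrcoef(X @ w, y)[0, 1]       # in-sample fit, for illustration only
    return w, r

_, r_long = fit_trf(stim, resp, n_lags)                    # one long segment
_, r_short = fit_trf(stim, resp, n_lags, seg_len=fs * 10)  # 10-s segments
```

In this toy example, per-segment demeaning removes most of the slow drift within each short segment, so the segmented fit (`r_short`) reaches a higher correlation than the single long-segment fit (`r_long`). A real analysis would additionally cross-validate the predictions rather than report in-sample correlations.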
Simulations showed that, for stationary data, increasing segmentation steadily decreased model performance. For non-stationary data, however, segmentation initially improved model performance. Modeling of neural recordings yielded similar results: segments of intermediate length (5-15 s) led to improved model performance compared to very short (1-2 s) and very long (30-120 s) segments.
We showed that data segmentation improves the performance of encoding models for both simulated and real neural data and that this can be explained by the fact that shorter segments approximate stationarity more closely. Thus, the common practice of applying encoding models to long continuous segments of data is suboptimal and recordings should be segmented prior to modeling.