Mixed modeling for irregularly sampled and correlated functional data: Speech science applications.

Author information

Pouplier Marianne, Cederbaum Jona, Hoole Philip, Marin Stefania, Greven Sonja

Affiliations

Institute of Phonetics and Speech Processing, Ludwig Maximilians University, Munich, Germany.

Department of Statistics, Ludwig Maximilians University, Munich, Germany.

Publication information

J Acoust Soc Am. 2017 Aug;142(2):935. doi: 10.1121/1.4998555.

DOI:10.1121/1.4998555

PMID:28863567

Abstract

The speech sciences often employ complex experimental designs requiring models with multiple covariates and crossed random effects. For curve-like data such as time-varying signals, single-time-point feature extraction is commonly used as a data reduction technique to make the data amenable to statistical hypothesis testing, thereby discarding a wealth of information. The present paper discusses the application of functional linear mixed models, a functional analogue to linear mixed models. This type of model allows for the holistic evaluation of curve dynamics for data with complex correlation structures due to repeated measures on subjects and stimulus items. The nonparametric, spline-based estimation technique allows for correlated functional data to be observed irregularly, or even sparsely. This means that information on variation in the temporal domain is preserved. Functional principal component analysis is used for parsimonious data representation and variance decomposition. The basic functionality and usage of the model are illustrated with several case studies covering different data types and experimental designs. The statistical method is broadly applicable to any type of data that consists of groups of curves, whether articulatory or acoustic time series data, or more generally any type of data that can suitably be modeled with penalized splines.
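
To make the role of functional principal component analysis more concrete, the following is a minimal sketch in Python, not the estimation procedure of the paper. It assumes curves observed on a common, regular time grid and uses a plain singular value decomposition, whereas the method described above is spline-based and handles irregular or sparse sampling and crossed random effects. The function name `fpca`, the simulated curves, and all parameter values are hypothetical and serve only as illustration.

```python
import numpy as np


def fpca(curves, n_components=2):
    """Illustrative FPCA for curves sampled on a common regular grid.

    curves: array of shape (n_curves, n_timepoints).
    Returns the mean curve, the leading discretized eigenfunctions,
    the per-curve scores, and the share of variance each one explains.
    """
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # Rows of vt are orthonormal directions of variation over time.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    eigenvalues = s**2 / (curves.shape[0] - 1)
    explained = eigenvalues / eigenvalues.sum()
    components = vt[:n_components]
    scores = centered @ components.T
    return mean_curve, components, scores, explained[:n_components]


# Simulated data (an assumption for illustration): 40 curves with two
# modes of variation around a common mean, plus small measurement noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
n_curves = 40
a = rng.normal(0.0, 2.0, size=n_curves)  # large mode of variation
b = rng.normal(0.0, 0.5, size=n_curves)  # smaller mode of variation
curves = (
    np.sin(np.pi * t)
    + a[:, None] * np.sin(2 * np.pi * t)
    + b[:, None] * np.cos(2 * np.pi * t)
    + rng.normal(0.0, 0.05, size=(n_curves, t.size))
)

mean_curve, components, scores, explained = fpca(curves, n_components=2)
```

In this sketch each curve is summarized by two scores instead of a single hand-picked time point, and `explained` gives the variance share of each component, which is the sense in which the representation is parsimonious while retaining information about the whole curve.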
