Harezlak Jaroslaw, Coull Brent A, Laird Nan M, Magari Shannon R, Christiani David C
Department of Biostatistics, Harvard School of Public Health, 655 Huntington Ave., Boston, MA 02115, USA.
Comput Stat Data Anal. 2007 Jun 15;51(10):4911-4925. doi: 10.1016/j.csda.2006.09.034.
Recent technological advances in continuous biological monitoring and personal exposure assessment have led to the collection of subject-specific functional data. A primary goal in such studies is to assess the relationship between functional predictors and functional responses. The historical functional linear model (HFLM) can be used to model the dependence of the response on the history of the predictor values. An estimation procedure for the regression coefficients is proposed that uses a variety of regularization techniques. The regression surface relating the predictor to the outcome is approximated by a finite-dimensional basis expansion, and the coefficients of neighboring basis functions are then penalized by constraining their differences to be small. Penalties based on the absolute values of the basis function coefficient differences (corresponding to the LASSO) and on the squares of these differences (corresponding to the penalized spline methodology) are studied. The fits are compared using an extension of the Akaike Information Criterion that combines the error variance estimate, the degrees of freedom of the fit, and the norm of the basis function coefficients. The performance of the proposed methods is evaluated via simulations. The LASSO penalty applied to the linearly transformed coefficients yields sparser representations of the estimated regression surface, while the quadratic penalty provides solutions with the smallest L2-norm of the basis function coefficients. Finally, the new estimation procedure is applied to the analysis of the effects of occupational particulate matter (PM) exposure on heart rate variability (HRV) in a cohort of boilermakers. Results suggest that the strongest association between PM exposure and HRV in these workers arises from point exposures to the elevated particulate matter levels that correspond to smoking breaks.
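The difference penalties described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a one-dimensional coefficient vector rather than the two-dimensional regression surface of the HFLM, and the function names (`difference_matrix`, `fit_quadratic_difference_penalty`, `l1_difference_penalty`) are hypothetical. The quadratic penalty has a closed-form solution; the absolute-value (LASSO-type) penalty is only evaluated here, since minimizing it requires an iterative solver that is not shown.

```python
import numpy as np


def difference_matrix(k, order=1):
    """Return the matrix D such that D @ theta gives the order-th
    differences of neighboring coefficients (hypothetical helper)."""
    D = np.eye(k)
    for _ in range(order):
        D = np.diff(D, axis=0)  # each pass differences adjacent rows
    return D


def fit_quadratic_difference_penalty(B, y, lam, order=1):
    """Minimize ||y - B @ theta||^2 + lam * ||D @ theta||^2.

    B is an (n, k) basis design matrix and lam >= 0 is the smoothing
    parameter. Assumes B.T @ B + lam * D.T @ D is nonsingular.
    """
    D = difference_matrix(B.shape[1], order)
    A = B.T @ B + lam * (D.T @ D)
    return np.linalg.solve(A, B.T @ y)


def l1_difference_penalty(theta, order=1):
    """Evaluate the LASSO-type penalty: sum of absolute coefficient
    differences. Only the penalty value is computed, not a fit."""
    D = difference_matrix(theta.size, order)
    return np.abs(D @ theta).sum()


# Toy usage: identity basis, so the coefficients are smoothed values of y.
y = np.array([0.0, 1.0, 0.0, 1.0, 0.0])
B = np.eye(y.size)
theta_rough = fit_quadratic_difference_penalty(B, y, lam=0.0)
theta_smooth = fit_quadratic_difference_penalty(B, y, lam=10.0)
```

In this toy setting, a larger `lam` pulls neighboring coefficients toward each other, which is the sense in which the penalty restricts the coefficient differences to be small.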