Lampa Erik, Lind Lars, Lind P Monica, Bornefalk-Hermansson Anna
Department of Medical Sciences, Occupational and Environmental Medicine, Uppsala University, 75185 Uppsala, Sweden.
Environ Health. 2014 Jul 4;13:57. doi: 10.1186/1476-069X-13-57.
There is a need to evaluate complex interaction effects on human health, such as those induced by mixtures of environmental contaminants. The usual approach is to formulate an additive statistical model and check for departures using product terms between the variables of interest. In this paper, we present an approach to search for interaction effects among several variables using boosted regression trees.
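As a minimal sketch of the usual product-term check, the Python below simulates two binary exposures and estimates the departure from additivity. In a saturated linear model y = b0 + b1*x1 + b2*x2 + b3*x1*x2, the least-squares estimate of b3 equals the difference-in-differences of the four cell means, so no regression library is needed. The function names, coefficients and sample size are illustrative assumptions and are not taken from the paper.

```python
import random
from statistics import mean


def simulate(n, b3, seed=0):
    """Simulate (x1, x2, y) rows; b3 is the product-term coefficient (assumed values)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.randint(0, 1)
        x2 = rng.randint(0, 1)
        # Additive part plus an optional product term and unit-variance noise.
        y = 1.0 + 0.5 * x1 + 0.5 * x2 + b3 * x1 * x2 + rng.gauss(0, 1)
        rows.append((x1, x2, y))
    return rows


def interaction_contrast(rows):
    """Difference-in-differences of cell means for two binary exposures.

    This equals the least-squares estimate of the product-term
    coefficient in the saturated two-exposure linear model.
    """
    cells = {(a, b): [] for a in (0, 1) for b in (0, 1)}
    for x1, x2, y in rows:
        cells[(x1, x2)].append(y)
    m = {key: mean(values) for key, values in cells.items()}
    return m[(1, 1)] - m[(1, 0)] - m[(0, 1)] + m[(0, 0)]


# Purely additive data versus data with a true product term of 1.0.
additive = interaction_contrast(simulate(20000, b3=0.0))
interacting = interaction_contrast(simulate(20000, b3=1.0))
print(round(additive, 2), round(interacting, 2))
```

A contrast near zero is consistent with additivity, while a contrast near the simulated coefficient indicates a departure. With many exposures the number of candidate product terms grows quickly, which is the difficulty that motivates a tree-based search.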
We simulate a continuous outcome from real data on 27 environmental contaminants, some of which are correlated, and test the method's ability to uncover the simulated interactions. The simulated outcome contains one four-way interaction, one non-linear effect and one interaction between a continuous variable and a binary variable. Four scenarios reflecting different strengths of association are simulated. We illustrate the method using real data.
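The simulation design can be sketched roughly as follows. Only the number of contaminants (27), the three effect types and the use of four strength scenarios come from the text. The synthetic exposures (the paper uses real contaminant data), the correlation structure, the functional forms, the binary covariate and the strength values are all assumptions made for illustration.

```python
import random

N_CONTAMINANTS = 27  # from the text; everything else below is assumed


def simulate_dataset(n, strength, seed=1):
    """Return a list of (x, binary, signal, y) tuples for one scenario."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        shared = rng.gauss(0, 1)  # latent factor inducing correlation
        x = []
        for j in range(N_CONTAMINANTS):
            noise = rng.gauss(0, 1)
            # Assumption: the first 10 contaminants are mutually correlated.
            x.append(0.7 * shared + 0.7 * noise if j < 10 else noise)
        binary = rng.randint(0, 1)  # hypothetical binary covariate
        signal = (
            x[0] * x[1] * x[2] * x[3]  # four-way interaction
            + x[4] ** 2                # non-linear effect
            + x[5] * binary            # continuous-by-binary interaction
        )
        y = strength * signal + rng.gauss(0, 1)
        data.append((x, binary, signal, y))
    return data


def pearson(a, b):
    """Plain Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


# Four scenarios with increasing strength of association (assumed values).
strengths = [0.1, 0.25, 0.5, 1.0]
correlations = []
for s in strengths:
    rows = simulate_dataset(2000, s)
    signal = [r[2] for r in rows]
    outcome = [r[3] for r in rows]
    correlations.append(pearson(signal, outcome))
print([round(c, 2) for c in correlations])
```

Each scenario reuses the same seed, so the exposures and noise are identical and only the scaling of the signal changes. A boosted regression tree model would then be fitted to each simulated data set to see whether the planted interactions are recovered.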
The method succeeded in identifying the true interactions in all scenarios except the one where the association was weakest. However, some spurious interactions were also found. The method was also able to identify interactions in the real data set.
We conclude that boosted regression trees can be used to uncover complex interaction effects in epidemiological studies.