Kundu Madan Gopal, Harezlak Jaroslaw
AbbVie, North Chicago, IL 60064.
Indiana University School of Public Health, Bloomington, IN 47405.
Biostat Epidemiol. 2019;3(1):1-22. doi: 10.1080/24709360.2018.1557797. Epub 2018 Dec 31.
Longitudinal changes in a population of interest are often heterogeneous and may be influenced by a combination of baseline factors. In such cases, traditional linear mixed effects models (Laird and Ware, 1982) that assume a common parametric form for the mean structure may not be applicable. We show that the regression tree methodology for longitudinal data can identify and characterize longitudinally homogeneous subgroups. Most of the currently available regression tree construction methods are either limited to a repeated measures scenario or combine the heterogeneity among subgroups with the random inter-subject variability. We propose a longitudinal classification and regression tree (LongCART) algorithm under the conditional inference framework (Hothorn, Hornik and Zeileis, 2006) that overcomes these limitations using a two-step approach. The LongCART algorithm first selects the partitioning variable via an instability test and then finds the optimal split for the selected partitioning variable. Because the decision to split further is type-I error controlled at each node, the algorithm guards against variable selection bias, over-fitting and spurious splitting. We have obtained the asymptotic results for the proposed instability test and examined its finite sample behavior through simulation studies. The comparative performance of the LongCART algorithm was evaluated empirically via simulation studies. Finally, we applied LongCART to study the longitudinal changes in levels among HIV-positive patients.
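The two-step split decision described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the instability test is replaced here by a simple permutation test on per-subject least-squares slopes, the split criterion is a within-group sum of squares, and every name (`ols_slope`, `permutation_pvalue`, `best_cut`, `split_node`), the Bonferroni adjustment, and the toy data are assumptions made for illustration only.

```python
import random
from statistics import mean


def ols_slope(times, values):
    """Least-squares slope of one subject's outcome on time."""
    t_bar, y_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    return sxy / sxx


def abs_corr(x, y):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Stand-in test (NOT the paper's instability test): permutation
    p-value for association between a covariate and subject slopes."""
    rng = random.Random(seed)
    observed = abs_corr(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs_corr(x, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_cut(x, y):
    """Step 2: cut point c (split x <= c vs x > c) minimizing total SSE."""
    best = None
    for cut in sorted(set(x))[:-1]:
        left = [b for a, b in zip(x, y) if a <= cut]
        right = [b for a, b in zip(x, y) if a > cut]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (cut, total)
    return best[0] if best else None


def split_node(slopes, covariates, alpha=0.05):
    """Step 1: test each covariate and keep the most significant one.
    Returns (name, cut), or None when no adjusted p-value is <= alpha."""
    pvals = {name: permutation_pvalue(x, slopes) for name, x in covariates.items()}
    name = min(pvals, key=pvals.get)
    # Bonferroni adjustment over the candidate covariates (an assumption).
    if min(1.0, pvals[name] * len(pvals)) > alpha:
        return None
    return name, best_cut(covariates[name], slopes)


# Toy data (hypothetical): trajectories rise for age <= 50 and fall otherwise.
times = [0, 1, 2, 3]
age = [30, 35, 40, 45, 48, 50, 55, 60, 62, 65, 70, 75]
sex = [0, 1] * 6
true_slopes = [(2.0 if a <= 50 else -1.0) + 0.1 * ((i % 3) - 1)
               for i, a in enumerate(age)]
slopes = [ols_slope(times, [5 + s * t for t in times]) for s in true_slopes]

result = split_node(slopes, {"age": age, "sex": sex})
print(result)
```

Separating variable selection (a significance test per covariate) from cut-point search is what allows the stopping rule to be expressed as a type-I error level; an exhaustive search over all covariates and cut points at once would tend to favor covariates with many possible splits.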