Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany.
BMC Med Res Methodol. 2012 Jan 25;12:6. doi: 10.1186/1471-2288-12-6.
Constructing prediction intervals (PIs) for the future body mass index (BMI) values of individual children, here based on a recent German birth cohort study with n = 2007 children, is problematic for standard parametric approaches because the BMI distribution in childhood is typically skewed, with the degree of skewness depending on age.
We avoid distributional assumptions by directly modelling the borders of the PIs with additive quantile regression, estimated by boosting. We highlight the concept of conditional coverage as the criterion for assessing the accuracy of PIs. Because conditional coverage can hardly be evaluated in practical applications, we first conduct a simulation study and then fit child- and covariate-specific PIs for future BMI values and BMI patterns to the present data.
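The idea of modelling the PI borders directly can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes simple linear base-learners instead of additive (smooth) ones, a median offset, a step length of 0.1, and simulated data with one informative covariate (`age`, scaled to [0, 1]) and one uninformative covariate (`noise`). All function names and the data-generating model are hypothetical. Each boosting iteration fits every base-learner to the negative gradient of the check loss and updates only the best-fitting one.

```python
import random


def neg_gradient(y, f, tau):
    """Negative gradient of the check (pinball) loss at the current fit f."""
    return [tau if yi > fi else tau - 1.0 for yi, fi in zip(y, f)]


def quantile_boost(X, y, tau, n_iter=800, nu=0.1):
    """Component-wise boosting for the tau-quantile with linear base-learners.

    Returns (intercept, coefs) of the fitted linear predictor.
    Assumes every covariate column is non-constant.
    """
    n, p = len(y), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    means = [sum(c) / n for c in cols]
    centered = [[v - m for v in c] for c, m in zip(cols, means)]
    sxx = [sum(v * v for v in c) for c in centered]

    intercept = sorted(y)[n // 2]  # assumed offset: sample median of y
    coefs = [0.0] * p
    f = [intercept] * n
    for _ in range(n_iter):
        u = neg_gradient(y, f, tau)
        u_bar = sum(u) / n
        # Least-squares slope of the gradient on each centered covariate.
        slopes = [sum(c * ui for c, ui in zip(cj, u)) / s
                  for cj, s in zip(centered, sxx)]
        # Smallest residual sum of squares = largest explained variation.
        best = max(range(p), key=lambda j: slopes[j] ** 2 * sxx[j])
        step = nu * slopes[best]
        coefs[best] += step
        intercept += nu * u_bar - step * means[best]
        f = [fi + nu * u_bar + step * c for fi, c in zip(f, centered[best])]
    return intercept, coefs


def predict(model, row):
    intercept, coefs = model
    return intercept + sum(b * v for b, v in zip(coefs, row))


def simulate(n, rng):
    """Hypothetical data: right-skewed errors whose spread grows with age."""
    X, y = [], []
    for _ in range(n):
        age, noise = rng.random(), rng.random()
        eps = rng.expovariate(1.0) - 1.0  # skewed, mean zero
        X.append([age, noise])
        y.append(1.0 + age + (0.2 + 0.6 * age) * eps)
    return X, y


rng = random.Random(1)
X_train, y_train = simulate(500, rng)
X_test, y_test = simulate(2000, rng)

# The borders of a 90% PI are the fitted 5% and 95% conditional quantiles.
lower = quantile_boost(X_train, y_train, tau=0.05)
upper = quantile_boost(X_train, y_train, tau=0.95)

covered = sum(predict(lower, r) <= yi <= predict(upper, r)
              for r, yi in zip(X_test, y_test))
coverage = covered / len(y_test)
print("coefficients (age, noise), lower border:", lower[1])
print("coefficients (age, noise), upper border:", upper[1])
print("empirical coverage of the 90% PI on new data:", coverage)
```

Because the two borders are fitted separately, the interval need not be symmetric around a central prediction, which is what allows it to follow a skewed distribution.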
The results of our simulation study suggest that PIs fitted by quantile boosting cover future observations with the predefined coverage probability and outperform the benchmark approach. For the prediction of future BMI values, quantile boosting automatically selects informative covariates and adapts to the age-specific skewness of the BMI distribution. The lengths of the estimated PIs are child-specific and increase, as expected, with the age of the child.
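The difference between overall and conditional coverage can be made concrete with a small simulation, which also shows why a setting with a known data-generating process is needed to check it. The sketch below is purely illustrative and uses hypothetical data: the spread of the response grows with a covariate called `age` (scaled to [0, 1]), and a PI of constant length is built from the marginal 5% and 95% sample quantiles, ignoring `age`.

```python
import random

rng = random.Random(7)
n = 20000
age = [rng.random() for _ in range(n)]
# Hypothetical response: the standard deviation increases with age.
y = [(0.2 + a) * rng.gauss(0.0, 1.0) for a in age]

# Constant-length 90% PI from the marginal sample quantiles (ignores age).
ys = sorted(y)
lo, hi = ys[n // 20], ys[n - n // 20 - 1]


def coverage(pairs):
    """Share of responses in `pairs` that fall inside [lo, hi]."""
    return sum(lo <= v <= hi for _, v in pairs) / len(pairs)


data = list(zip(age, y))
marginal = coverage(data)
young = coverage([d for d in data if d[0] < 0.2])
old = coverage([d for d in data if d[0] > 0.8])

print("overall coverage:       ", round(marginal, 3))
print("coverage for age < 0.2: ", round(young, 3))
print("coverage for age > 0.8: ", round(old, 3))
```

In this setup the interval attains the nominal level on average while being too wide for small values of `age` and too narrow for large ones, so correct overall coverage alone does not imply correct conditional coverage.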
Quantile boosting is a promising approach for constructing PIs with correct conditional coverage in a non-parametric way. It is particularly suitable for predicting BMI patterns that depend on covariates, since it provides an interpretable predictor structure and inherent variable selection, and can even account for longitudinal data structures.