UCL Great Ormond Street Institute of Child Health, London, UK.
Stat Methods Med Res. 2021 Feb;30(2):488-507. doi: 10.1177/0962280220958438. Epub 2020 Oct 12.
Growth reference centile charts are widely used in child health to assess weight, height and other age-varying measurements. The centiles are easy to construct from reference data, using the LMS method or GAMLSS (Generalised Additive Models for Location Scale and Shape). However, there is as yet no clear guidance on how to design such studies, and in particular how many reference data to collect, and this has led to study sizes varying widely. The paper aims to provide a theoretical framework for optimally designing growth reference studies based on cross-sectional data. Centiles for weight, height, body mass index and head circumference, in 6878 boys aged 0-21 years from the Fourth Dutch Growth Study, were fitted using GAMLSS. The effect on precision of varying the sample size and the distribution of measurement ages (sample composition) was explored by fitting a series of GAMLSS models to simulated data. Sample composition was defined as uniform on the age^λ scale, where λ was chosen to give constant precision across the age range. Precision was measured on the z-score scale, and was the same for all four measurements, with a standard error of 0.041 z-score units for the median and 0.066 for the 2nd and 98th centiles. Compared to a naïve calculation, the process of smoothing the centiles increased the notional sample size two- to threefold by 'borrowing strength'. The sample composition for estimating the median curve was optimal for λ=0.4, reflecting considerable over-sampling of infants compared to children. However, for the 2nd and 98th centiles, λ=0.75 was optimal, with less infant over-sampling. The conclusion is that both sample size and sample composition need to be optimised. The paper provides practical advice on design, and concludes that optimally designed studies need 7000-25,000 subjects per sex.
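As a rough illustration of the two design quantities discussed in the abstract, the sketch below draws measurement ages uniformly on a power-transformed age scale and evaluates a textbook normal-theory standard error for an unsmoothed centile. It is not the paper's code: the function names, the 21-year upper age limit as a default, and the use of the approximation SE = sqrt((1 + z^2/2)/n) are all assumptions made for this example, and the formula may differ from the naïve calculation used in the paper.

```python
import math
import random


def sample_ages(n, lam, max_age=21.0, seed=0):
    """Draw n ages (years) that are uniform on the age**lam scale.

    Hypothetical helper: draws u uniformly on [0, max_age**lam) and
    back-transforms with age = u**(1/lam). A smaller lam places more
    of the sample at young ages.
    """
    rng = random.Random(seed)
    upper = max_age ** lam
    return [(rng.random() * upper) ** (1.0 / lam) for _ in range(n)]


def infant_fraction(lam, max_age=21.0):
    """Expected share of the sample aged under 1 year for a given lam."""
    return (1.0 / max_age) ** lam


def naive_centile_se(n, z):
    """Textbook SE, in z-score units, of a centile at z-score z.

    Assumes n independent normal observations at a single age and a
    centile estimated as mean + z * SD, with no smoothing across age.
    """
    return math.sqrt((1.0 + z * z / 2.0) / n)


def notional_n(se, z):
    """Invert naive_centile_se: sample size implied by a given SE."""
    return (1.0 + z * z / 2.0) / se ** 2


if __name__ == "__main__":
    for lam in (0.4, 0.75, 1.0):
        ages = sample_ages(5000, lam)
        share = sum(a < 1.0 for a in ages) / len(ages)
        print(f"lam={lam}: simulated share under 1 year = {share:.3f}, "
              f"expected = {infant_fraction(lam):.3f}")
    # Sample sizes implied by the reported SEs under the textbook formula,
    # treating the 2nd and 98th centiles as z = -2 and z = +2 (an assumption).
    print(notional_n(0.041, 0.0), notional_n(0.066, 2.0))
```

Under these assumptions, lowering λ from 1 towards 0.4 shifts a larger share of the sample into infancy, which is the over-sampling pattern the abstract describes.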