Department of Nutritional Sciences, Faculty of Medicine, University of Toronto, Toronto, Canada.
Translational Medicine Program, Hospital for Sick Children, Toronto, Canada.
BMC Med Res Methodol. 2023 Oct 13;23(1):232. doi: 10.1186/s12874-023-02045-w.
Growth studies rely on longitudinal measurements, typically represented as trajectories. However, anthropometry is prone to errors that can generate outliers. While various methods are available for detecting outlier measurements, a gold standard has yet to be identified, and no established method exists for detecting outlying trajectories. Outlier types and their effects on growth pattern detection therefore still need to be investigated. This work aimed to assess the performance of six methods at detecting different types of outliers, to propose two novel methods for outlier trajectory detection, and to evaluate how outliers affect growth pattern detection.
We included 393 healthy infants from The Applied Research Group for Kids (TARGet Kids!) cohort and 1651 children with severe malnutrition from the co-trimoxazole prophylaxis clinical trial. We injected outliers of three types and six intensities and applied four outlier detection methods for measurements (model-based and World Health Organization cut-off-based) and two for trajectories. We also assessed growth pattern detection before and after outlier injection using time series clustering and latent class mixed models. Error type, intensity, and population affected method performance.
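The injection-and-detection design described above can be illustrated with a minimal sketch. It is not the study's implementation: the function names are hypothetical, the fixed limits are the commonly cited WHO flags for weight-for-age z-scores (whether the study used exactly these is an assumption), and the within-child median/MAD rule is only a simple stand-in for the model-based methods, which the abstract does not specify.

```python
import random
import statistics

# Assumed limits: commonly cited WHO flags for weight-for-age z-scores.
WAZ_LOW, WAZ_HIGH = -6.0, 5.0


def inject_outliers(zscores, rate, shift, rng):
    """Return a copy in which each point is shifted by +/- `shift` with
    probability `rate`, plus a boolean list marking the altered points."""
    noisy = list(zscores)
    injected = [False] * len(zscores)
    for i in range(len(noisy)):
        if rng.random() < rate:
            noisy[i] += shift * rng.choice([-1, 1])
            injected[i] = True
    return noisy, injected


def flag_fixed_cutoff(zscores, low=WAZ_LOW, high=WAZ_HIGH):
    """Flag values that fall outside fixed z-score limits."""
    return [z < low or z > high for z in zscores]


def flag_within_child(zscores, k=3.0):
    """Flag values far from the child's own median, measured in robust
    (median absolute deviation) units. Hypothetical stand-in method."""
    med = statistics.median(zscores)
    mad = statistics.median([abs(z - med) for z in zscores])
    scale = 1.4826 * mad
    if scale == 0:
        scale = 1e-9  # guard against a perfectly flat trajectory
    return [abs(z - med) / scale > k for z in zscores]


# One synthetic trajectory with a moderate error (+3 z) at index 3.
trajectory = [-0.4, -0.3, -0.5, 2.6, -0.4, -0.3, -0.1, -0.2]
print(flag_fixed_cutoff(trajectory))   # no point is flagged
print(flag_within_child(trajectory))   # only index 3 is flagged

# Random injection, as in a simulation loop (seed chosen arbitrarily).
noisy, injected = inject_outliers(trajectory, rate=0.2, shift=3.0,
                                  rng=random.Random(0))
```

In this toy case the fixed limits miss the moderate error because 2.6 is still a plausible z-score at population level, while the within-child rule catches it because it is far from that child's other measurements.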
Model-based outlier detection methods performed best for measurements, with precision ranging from 5.72% to 99.89%, especially for low and moderate error intensities. The clustering-based outlier trajectory method had a precision of 14.93-99.12%. Combining methods improved the detection rate to 21.82% in outlier measurements. Finally, when comparing growth groups with and without outliers, the outliers were shown to alter group membership by 57.9-79.04%.
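For readers unfamiliar with the metrics, the sketch below shows how precision and a detection rate could be computed when the injected outliers are known, and how flags from several methods could be combined. It assumes that "detection rate" means sensitivity (the share of injected outliers that are flagged) and that methods are combined by taking the union of their flags; the abstract states neither, and the function names are hypothetical.

```python
def precision_and_detection_rate(injected, flagged):
    """Compare known injected outliers with a method's flags.

    precision      = true positives / all flagged points
    detection rate = true positives / all injected points (assumed meaning)
    """
    pairs = list(zip(injected, flagged))
    tp = sum(1 for i, f in pairs if i and f)
    fp = sum(1 for i, f in pairs if not i and f)
    fn = sum(1 for i, f in pairs if i and not f)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    detection_rate = tp / (tp + fn) if tp + fn else float("nan")
    return precision, detection_rate


def combine_flags(*flag_lists):
    """Flag a point if any method flags it (union of methods, assumed)."""
    return [any(flags) for flags in zip(*flag_lists)]


# Toy data: three injected outliers among five measurements.
injected = [True, True, False, False, True]
method_a = [True, False, False, False, False]
method_b = [False, True, True, False, False]

print(precision_and_detection_rate(injected, method_a))
print(precision_and_detection_rate(injected, combine_flags(method_a, method_b)))
```

In the toy data, combining the two methods raises the detection rate from one in three to two in three, at the cost of one false positive. A union of flags can trade precision for sensitivity in this way.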
World Health Organization cut-off-based techniques performed well only in a few very particular cases (extreme errors of high intensity), while model-based techniques performed well especially for moderate errors of low intensity. Clustering-based outlier trajectory detection performed exceptionally well across all types and intensities of errors, indicating a potential strategic change in how outliers in growth data are viewed. Finally, comparing the results of growth group detection with and without outliers demonstrated the importance of detecting outliers, given their impact on child growth studies.