Lu Miao, Zhou Jianhui, Naylor Caitlin, Kirkpatrick Beth D, Haque Rashidul, Petri William A, Ma Jennie Z
Department of Statistics, University of Virginia, Charlottesville, USA.
Division of Infectious Diseases, School of Medicine, University of Virginia, Charlottesville, USA.
Biomark Res. 2017 Mar 9;5:9. doi: 10.1186/s40364-017-0089-4. eCollection 2017.
Environmental Enteropathy (EE) is a subclinical condition caused by constant fecal-oral contamination that results in blunting of intestinal villi and intestinal inflammation. The primary clinical research interest is to evaluate the association between non-invasive EE biomarkers and malnutrition in a cohort of Bangladeshi children. The challenges are that the number of biomarkers/covariates is relatively large and that some of them are highly correlated.
Many variable selection methods are available in the literature, but which are most appropriate for EE biomarker selection remains unclear. In this study, different variable selection approaches were applied and the performance of these methods was assessed numerically through simulation studies, assuming the correlations among covariates were similar to those in the Bangladesh cohort. The suggested methods from simulations were applied to the Bangladesh cohort to select the most relevant biomarkers for the growth response, and bootstrapping methods were used to evaluate the consistency of selection results.
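The bootstrap consistency check described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names, the toy data, and the correlation-threshold selector are all hypothetical stand-ins, and in practice `select` would wrap whichever penalized regression method is being evaluated.

```python
import random
from collections import Counter


def selection_frequency(rows, select, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which each variable is selected.

    rows   : list of observations
    select : callable taking a list of rows and returning the names of
             the selected variables (hypothetical interface)
    """
    rng = random.Random(seed)
    n = len(rows)
    counts = Counter()
    for _ in range(n_boot):
        # Resample n rows with replacement, then rerun the selector.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        counts.update(set(select(sample)))
    return {name: c / n_boot for name, c in counts.items()}


def correlation(xs, ys):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def toy_select(sample, threshold=0.5):
    """Toy selector (assumption): keep variables with |corr with y| > threshold."""
    ys = [row["y"] for row in sample]
    return [name for name in ("x1", "x2")
            if abs(correlation([row[name] for row in sample], ys)) > threshold]


# Toy data (assumption): x1 is proportional to y, x2 just alternates in sign.
rows = [{"x1": 2.0 * i, "x2": (-1.0) ** i, "y": float(i)} for i in range(20)]
freq = selection_frequency(rows, toy_select)
```

A variable that is selected in nearly every resample is a stable choice; one that appears only occasionally is likely sensitive to the particular sample.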
Based on the simulation studies, SCAD (Smoothly Clipped Absolute Deviation), Adaptive LASSO (Least Absolute Shrinkage and Selection Operator), and MCP (Minimax Concave Penalty) are suggested as variable selection methods over traditional stepwise regression. In the Bangladesh data, predictors such as mother's weight, height-for-age z-score (HAZ) at week 18, and inflammation markers (myeloperoxidase (MPO) at week 12 and soluble CD14 at week 18) are informative biomarkers associated with children's growth.
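To make the difference between these penalties concrete, the sketch below evaluates the standard per-coefficient penalty functions for a single coefficient `t`. The tuning values are assumptions rather than values taken from the study: `a=3.7` is the commonly cited SCAD default, `gamma=3.0` for MCP is illustrative, and the adaptive LASSO weight is taken as `1 / |init| ** power` for a hypothetical non-zero initial estimate `init`.

```python
def lasso_penalty(t, lam):
    """LASSO: the penalty grows linearly with |t| without bound."""
    return lam * abs(t)


def adaptive_lasso_penalty(t, lam, init, power=1.0):
    """Adaptive LASSO: L1 penalty weighted by 1 / |init|**power.

    `init` is a non-zero initial estimate of the coefficient (assumption).
    """
    return lam * abs(t) / abs(init) ** power


def scad_penalty(t, lam, a=3.7):
    """SCAD: linear near zero, quadratic transition, then constant."""
    t = abs(t)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def mcp_penalty(t, lam, gamma=3.0):
    """MCP: penalty rate decreases linearly to zero at |t| = gamma * lam."""
    t = abs(t)
    if t <= gamma * lam:
        return lam * t - t * t / (2 * gamma)
    return gamma * lam * lam / 2
```

For large coefficients the SCAD and MCP penalties level off while the LASSO penalty keeps growing, which is why the non-convex penalties shrink strong signals less. Adaptive LASSO reaches a similar effect by giving smaller weights to coefficients with large initial estimates.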
Penalized linear regression methods are plausible alternatives to traditional variable selection methods, and the suggested methods are applicable to other biomedical studies. The selected early-stage biomarkers offer a potential explanation for the burden of malnutrition in low-income countries, allow early identification of infants at risk, and suggest pathways for intervention.
This study was retrospectively registered with ClinicalTrials.gov, number NCT01375647, on June 3, 2011.