Institute for Quantitative Biomedical Sciences, Dartmouth College, Hanover, NH.
Department of Biomedical Data Science, Geisel School of Medicine, Lebanon, NH.
Comput Biol Chem. 2021 Feb;90:107425. doi: 10.1016/j.compbiolchem.2020.107425. Epub 2020 Dec 8.
Birth weight is a key consequence of environmental exposures and metabolic alterations and can influence lifelong health. While a number of methods have been used to examine associations of trace element concentrations (including essential nutrients and toxic metals) or metabolite concentrations with health outcomes such as birth weight, studies evaluating how the coexistence of these factors affects birth weight remain extremely limited. Here, we present a novel algorithm, NETwork Clusters (NET-C), which improves outcome prediction by accounting for interactions among features in a network, and we apply it to predict birth weight by jointly modelling trace element and cord blood metabolite data. Specifically, we use trace element and/or metabolite subnetworks as groups and apply group lasso to estimate birth weight. We conducted statistical simulation studies to examine how sample size and the correlations between grouped features and the outcome affect prediction performance. In terms of prediction error, our proposed method outperformed (a) group lasso with groups defined by hierarchical clustering, (b) random forest regression and (c) neural networks. We applied our method to data on trace elements, metabolites and birth outcomes collected as part of the New Hampshire Birth Cohort Study, adjusting for covariates such as maternal body mass index (BMI) and enrollment age. Our proposed method can be applied to a variety of similarly structured high-dimensional datasets to predict health outcomes.
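The general idea of using network-derived feature groups in a group lasso can be sketched as follows. This is a minimal illustration under stated assumptions, not the NET-C implementation: the abstract does not say how subnetworks are identified, so the sketch assumes groups are the connected components of a thresholded absolute-correlation graph, and it fits the group lasso with a plain proximal gradient loop. The function names `correlation_groups` and `group_lasso`, the threshold, the penalty weight `lam`, and the synthetic data are all hypothetical.

```python
import numpy as np

def correlation_groups(X, threshold=0.5):
    """Label each feature by its connected component in a graph that links
    features whose absolute correlation is at least `threshold` (an assumed
    stand-in for subnetwork detection)."""
    p = X.shape[1]
    adj = np.abs(np.corrcoef(X, rowvar=False)) >= threshold
    labels = -np.ones(p, dtype=int)
    current = 0
    for start in range(p):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over the thresholded graph
            node = stack.pop()
            for nb in np.flatnonzero(adj[node]):
                if labels[nb] < 0:
                    labels[nb] = current
                    stack.append(nb)
        current += 1
    return labels

def group_lasso(X, y, groups, lam=0.1, n_iter=500):
    """Minimise (1/2n)||y - Xb||^2 + lam * sum_g sqrt(p_g) * ||b_g||_2
    by proximal gradient descent with a fixed step size."""
    n, p = X.shape
    beta = np.zeros(p)
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y) / n)
        for g in np.unique(groups):
            idx = groups == g
            norm = np.linalg.norm(z[idx])
            thresh = step * lam * np.sqrt(idx.sum())
            # Block soft-thresholding: a whole group is shrunk or set to zero.
            z[idx] = 0.0 if norm <= thresh else (1.0 - thresh / norm) * z[idx]
        beta = z
    return beta

# Synthetic example (not cohort data): two correlated feature blocks,
# with the outcome driven only by the first block.
rng = np.random.default_rng(0)
n = 200
latent_a = rng.normal(size=(n, 1))
latent_b = rng.normal(size=(n, 1))
X = np.hstack([latent_a + 0.3 * rng.normal(size=(n, 3)),
               latent_b + 0.3 * rng.normal(size=(n, 3))])
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = 2.0 * latent_a[:, 0] + 0.1 * rng.normal(size=n)
y = y - y.mean()

labels = correlation_groups(X, threshold=0.5)
beta = group_lasso(X, y, labels, lam=0.1)
```

Because the penalty acts on the Euclidean norm of each group's coefficients, features in the same subnetwork enter or leave the model together, which is how group structure can carry information about feature interactions into the prediction. Covariates such as maternal BMI would need their own handling (for example, singleton groups or an unpenalised block), which this sketch does not attempt.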