Kelly Rachel S, McGeachie Michael J, Lee-Sarwar Kathleen A, Kachroo Priyadarshini, Chu Su H, Virkud Yamini V, Huang Mengna, Litonjua Augusto A, Weiss Scott T, Lasky-Su Jessica
Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA.
Harvard Medical School, Boston, MA 02115, USA.
Metabolites. 2018 Oct 23;8(4):68. doi: 10.3390/metabo8040068.
To explore novel methods for the analysis of metabolomics data, we compared the ability of Partial Least Squares Discriminant Analysis (PLS-DA) and Bayesian networks (BN) to build predictive plasma metabolite models of age three asthma status in 411 three-year-olds (n = 59 cases and 352 controls) from the Vitamin D Antenatal Asthma Reduction Trial (VDAART) study. The standard PLS-DA approach had impressive accuracy for the prediction of age three asthma, with an Area Under the Curve Convex Hull (AUCCH) of 81%. However, a permutation test indicated the possibility of overfitting. In contrast, a predictive Bayesian network including 42 metabolites had a significantly higher AUCCH of 92.1% (p for difference < 0.001), with no evidence that this accuracy was due to overfitting. Both models provided biologically informative insights into asthma; in particular, they implicated dysregulated arginine metabolism and several exogenous metabolites that deserve further investigation as potential causative agents. As the BN model outperformed the PLS-DA model in both accuracy and decreased risk of overfitting, it may represent a viable alternative to typical analytical approaches for the investigation of metabolomics data.
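Both models are compared by AUCCH, the area under the convex hull of the ROC curve rather than under the raw empirical curve. The sketch below shows one common way to compute that quantity from model scores and case/control labels: sweep the threshold to get (FPR, TPR) points, keep the upper convex hull, and integrate it with the trapezoidal rule. It is an illustrative sketch only, not the authors' code; the function names are hypothetical and the exact AUCCH procedure used in the study may differ.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Point]:
    """Empirical ROC points from sweeping the threshold over the scores (label 1 = case)."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one case and one control")
    # Highest scores first; tied scores are handled as a single threshold step.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points: List[Point] = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); negative means a clockwise (right) turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_hull(points: Sequence[Point]) -> List[Point]:
    """Upper convex hull via Andrew's monotone chain, scanning left to right."""
    hull: List[Point] = []
    for p in sorted(set(points)):
        # Drop the last hull point while it does not make a right turn (it lies on or below the chord).
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def trapezoid_area(points: Sequence[Point]) -> float:
    """Trapezoidal area under a polyline whose points are sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_convex_hull(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC convex hull (an AUCCH-style metric)."""
    return trapezoid_area(upper_hull(roc_points(scores, labels)))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6]
    labels = [1, 0, 1, 0]
    print(trapezoid_area(roc_points(scores, labels)))  # raw empirical AUC: 0.75
    print(auc_convex_hull(scores, labels))             # convex-hull AUC: 0.875
```

Because the hull lies on or above every empirical ROC point, the convex-hull area is never smaller than the raw AUC. This is one reason a high AUCCH alone does not rule out overfitting, and why a complementary check such as the permutation test mentioned above is useful.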