Kalia Vrinda, Walker Douglas I, Krasnodemski Katherine M, Jones Dean P, Miller Gary W, Kioumourtzoglou Marianthi-Anna
Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, NY 10032.
Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029.
Curr Opin Environ Sci Health. 2020 Jun;15:32-38. doi: 10.1016/j.coesh.2020.05.001. Epub 2020 May 19.
Progress in measuring the exposome has advanced our understanding of how the environment affects human health. High-resolution mass spectrometry (HRMS) has made it possible to measure small molecules across a large dynamic range, allowing researchers to study the role of low-abundance environmental toxicants in human disease. HRMS data are high-dimensional (the number of predictors far exceeds the number of observations): they report the abundance of many chemical features (predictors), which may be highly correlated. Unsupervised dimension reduction techniques can condense these features into a smaller set of components that capture the main patterns of variability in the exposome dataset. We illustrate and discuss the relevance of three unsupervised dimension reduction techniques: principal component analysis, factor analysis, and non-negative matrix factorization. We focus on the utility of each method for understanding the relationship between the exposome and a disease outcome, and we describe their strengths and limitations. While the utility of these methods is context-specific, it remains important to focus on the interpretability of the results from each method.
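As a rough illustration of the three techniques named in the abstract, the sketch below applies each to a small simulated dataset with far more features than samples. It is not the authors' analysis: the simulated data, the sample and feature counts, the choice of three components, the scikit-learn estimators, and the helper names `simulate_features` and `reduce_dimensions` are all assumptions made for this example.

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis, NMF
from sklearn.preprocessing import StandardScaler


def simulate_features(n_samples=40, n_features=300, n_sources=3, seed=0):
    """Simulate non-negative, correlated feature intensities with p >> n.

    Hypothetical stand-in for an HRMS feature table: each sample mixes a few
    latent "sources", so many features end up highly correlated.
    """
    rng = np.random.default_rng(seed)
    scores = rng.gamma(shape=2.0, scale=1.0, size=(n_samples, n_sources))
    loadings = rng.gamma(shape=1.0, scale=1.0, size=(n_sources, n_features))
    noise = rng.gamma(shape=1.0, scale=0.1, size=(n_samples, n_features))
    return scores @ loadings + noise


def reduce_dimensions(X, n_components=3, seed=0):
    """Fit PCA, factor analysis, and NMF; return scores and loadings."""
    # PCA and factor analysis are fit on standardized features (an assumption
    # of this sketch); NMF requires non-negative input, so it uses raw X.
    X_std = StandardScaler().fit_transform(X)

    pca = PCA(n_components=n_components).fit(X_std)
    fa = FactorAnalysis(n_components=n_components, random_state=seed).fit(X_std)
    nmf = NMF(n_components=n_components, init="nndsvda",
              max_iter=1000, random_state=seed).fit(X)

    return {
        "pca_scores": pca.transform(X_std),
        "pca_explained": pca.explained_variance_ratio_,
        "fa_scores": fa.transform(X_std),
        "fa_loadings": fa.components_,
        "nmf_scores": nmf.transform(X),
        "nmf_loadings": nmf.components_,
    }


X = simulate_features()
result = reduce_dimensions(X)
for name, value in result.items():
    print(name, value.shape)
```

Each method returns a samples-by-components score matrix that could serve as a low-dimensional exposure summary in a downstream outcome model. The NMF scores and loadings are non-negative by construction, which is one reason that method is often considered easier to interpret for abundance data.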