Huck Institutes of the Life Sciences, The Pennsylvania State University, 512 Wartik, University Park, PA 16802, USA
† Morris A. Aguilar was supported by the PSU/NIDDK-funded Integrative Analysis of Metabolic Phenotypes (IAMP) Predoctoral Training Program (T32DK120509). Research reported in this publication was supported by the National Cancer Institute of the National Institutes of Health under Award Number R01CA239256. This work was also supported by the USDA National Institute of Food and Agriculture and Hatch Appropriations under Project #PEN04275 and Accession #1018544, the Huck Institutes of the Life Sciences, the Penn State Cancer Institute, and the Dr. Frances Keesler Graham Early Career Professorship.
Pac Symp Biocomput. 2021;26:316-327.
The pathophysiology of environmental exposures such as smoking can yield metabolic changes that are difficult to describe in a biologically informative way with manual, proprietary software. Nuclear magnetic resonance (NMR) spectroscopy detects compounds found in biofluids, yielding a metabolic snapshot. We applied our semi-automated NMR pipeline to a secondary analysis of a smoking study (MTBLS374 from the MetaboLights repository; n = 112). The pipeline comprised quality control (in the form of data preprocessing), automated metabolite quantification, and analysis. With this approach we putatively identified 79 metabolites that were previously unreported in the dataset. The quantified metabolites were used for metabolic pathway enrichment analysis, which replicated 1 enriched pathway from the original study and identified 3 previously unreported pathways. Our pipeline also generated a new random forest (RF) classifier between smoking classes that revealed several combinations of compounds. This study broadens our metabolomic understanding of smoking exposure by 1) notably increasing the number of quantified metabolites with our analytic pipeline, 2) suggesting, on the basis of random forest modeling, that smoking exposure may lead to heterogeneous metabolic responses, and 3) modeling how newly quantified individual metabolites can determine smoking status. Our approach can be applied to other NMR studies to characterize environmental risk factors, allowing for the discovery of new biomarkers of disease and exposure status.
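The abstract does not say which statistical test underlies the pathway enrichment analysis. As an illustration only, the sketch below shows one common approach, over-representation analysis with a hypergeometric test. The function name `hypergeom_enrichment_p` and all counts are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_enrichment_p(total, in_pathway, selected, overlap):
    """Upper-tail hypergeometric p-value for pathway over-representation.

    total      -- number of metabolites in the background set
    in_pathway -- background metabolites annotated to the pathway
    selected   -- metabolites of interest (e.g. those quantified)
    overlap    -- metabolites of interest that fall in the pathway

    Returns P(X >= overlap) when `selected` metabolites are drawn
    without replacement from the background.
    """
    upper = min(in_pathway, selected)
    # Sum the probability mass for every overlap at least as large as observed.
    numerator = sum(
        comb(in_pathway, k) * comb(total - in_pathway, selected - k)
        for k in range(overlap, upper + 1)
    )
    return numerator / comb(total, selected)


# Hypothetical counts, chosen only to exercise the function.
p = hypergeom_enrichment_p(total=100, in_pathway=10, selected=20, overlap=5)
print(f"enrichment p-value: {p:.4g}")
```

In practice the p-values from all tested pathways would also need a multiple-testing correction. Which correction, if any, the study used is not stated in the abstract.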