Department of Statistics, Iowa State University, 2438 Osborn Dr, Ames, IA, 50011, USA.
Department of Veterinary Diagnostic and Production Animal Medicine, Iowa State University, 2203 Lloyd Veterinary Medical Center, Ames, IA, 50011, USA.
BMC Bioinformatics. 2021 Jul 6;22(1):362. doi: 10.1186/s12859-021-04232-2.
Microbiome studies have uncovered associations between microbes and human, animal, and plant health outcomes. This has led to interest in developing microbial interventions for treating disease and optimizing crop yields, which requires identifying the microbiome features that affect the outcome in the population of interest. That task is challenging because of the high dimensionality of microbiome data and the confounding that results from the complex and dynamic interactions among host, environment, and microbiome. In the presence of such confounding, variable selection and estimation procedures may perform poorly in identifying microbial features with an effect on the outcome.
In this manuscript, we aim to estimate population-level effects of individual microbiome features while controlling for confounding by a categorical variable. Due to the high dimensionality and confounding-induced correlation between features, we propose feature screening, selection, and estimation conditional on each stratum of the confounder followed by a standardization approach to estimation of population-level effects of individual features. Comprehensive simulation studies demonstrate the advantages of our approach in recovering relevant features. Utilizing a potential-outcomes framework, we outline assumptions required to ascribe causal, rather than associational, interpretations to the identified microbiome effects. We conducted an agricultural study of the rhizosphere microbiome of sorghum in which nitrogen fertilizer application is a confounding variable. In this study, the proposed approach identified microbial taxa that are consistent with biological understanding of potential plant-microbe interactions.
Standardization enables more accurate identification of individual microbiome features with an effect on the outcome of interest compared to other variable selection and estimation procedures when there is confounding by a categorical variable.
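To make the standardization idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single microbiome feature, a binary confounder, and a linear outcome model, and it omits the screening and selection steps entirely. The function names (`slope`, `standardized_effect`, `simulate`) and all simulation parameters are hypothetical choices made for illustration. The effect is estimated separately within each stratum of the confounder, and the stratum-specific estimates are then averaged using the observed stratum proportions as weights.

```python
import random
from collections import defaultdict


def slope(x, y):
    """Least-squares slope of y on x (simple linear regression)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


def standardized_effect(x, y, z):
    """Average the stratum-specific slopes, weighted by stratum proportion."""
    groups = defaultdict(lambda: ([], []))
    for xi, yi, zi in zip(x, y, z):
        groups[zi][0].append(xi)
        groups[zi][1].append(yi)
    n = len(x)
    return sum(len(xs) / n * slope(xs, ys) for xs, ys in groups.values())


def simulate(n=2000, seed=0):
    """Hypothetical data: z shifts both the feature x and the outcome y."""
    rng = random.Random(seed)
    x, y, z = [], [], []
    for _ in range(n):
        zi = rng.randint(0, 1)                # categorical confounder
        xi = rng.gauss(2.0 * zi, 1.0)         # feature abundance depends on z
        yi = 3.0 * zi + 0.5 * xi + rng.gauss(0.0, 1.0)  # true effect of x is 0.5
        x.append(xi)
        y.append(yi)
        z.append(zi)
    return x, y, z


x, y, z = simulate()
pooled = slope(x, y)                          # ignores the confounder
standardized = standardized_effect(x, y, z)   # conditions on z, then averages
print(f"pooled slope:       {pooled:.3f}")
print(f"standardized slope: {standardized:.3f}")
```

In this simulated setting the pooled slope absorbs part of the confounder's effect on the outcome, whereas the standardized estimate should land near the simulated value of 0.5. With many correlated features, each stratum-specific fit would be preceded by a screening and selection step, as the abstract describes.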