Goldman Miriam, Zhao Chunyu, Pollard Katherine S
Department of Epidemiology & Biostatistics, University of California San Francisco, San Francisco, California, United States of America.
Institute of Data Science & Biotechnology, Gladstone Institutes, San Francisco, California, United States of America.
PLoS Comput Biol. 2025 May 27;21(5):e1012277. doi: 10.1371/journal.pcbi.1012277. eCollection 2025 May.
Microbiome association studies typically link host disease or other traits to summary statistics measured in metagenomics data, such as diversity or taxonomic composition. But identifying disease-associated species based on their relative abundance does not provide insight into why these microbes act as disease markers, and it overlooks cases where disease risk is related to specific strains with unique biological functions. To bridge this knowledge gap, we developed microSLAM, a mixed-effects model and an R package that performs association tests that connect host traits to the presence/absence of genes within each microbiome species, while accounting for strain genetic relatedness across hosts. Traits can be quantitative or binary (such as case/control). MicroSLAM is fit in three steps for each species. The first step estimates population structure across hosts. Step two calculates the association between population structure and the trait, enabling detection of species for which a subset of related strains confers risk. To identify specific genes whose presence/absence across diverse strains is associated with the trait, step three models the trait as a function of gene occurrence plus random effects estimated in step two. Applying microSLAM to 710 gut metagenomes from inflammatory bowel disease (IBD) samples, we discovered 56 species whose population structure correlates with IBD, meaning that different lineages are found in cases versus controls. After controlling for population structure, 20 species had genes significantly associated with IBD. Twenty-one of these genes were more common in IBD patients, while 32 genes were enriched in healthy controls, including a seven-gene operon in Faecalibacterium prausnitzii that is involved in utilization of fructoselysine from the gut environment. The vast majority of species detected by microSLAM were not significantly associated with IBD using standard relative abundance tests. These findings highlight the importance of accounting for within-species genetic variation in microbiome studies.
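The three-step fit described above can be illustrated with a minimal Python sketch for a quantitative trait. This is not the microSLAM R package or its API. The function names (`relatedness`, `random_effects`, `gene_scores`), the similarity measure (one minus the fraction of genes that differ between two hosts), the fixed variance ratio `tau`, and the simulated data are all assumptions made for illustration. The published method estimates the variance component from the data and tests it, whereas this sketch fixes `tau` and only shows how the three pieces relate to one another.

```python
import numpy as np


def relatedness(genes):
    """Step 1 (assumed form): host-by-host similarity from gene presence/absence.

    Similarity = 1 - fraction of genes whose presence differs between two hosts.
    """
    genes = np.asarray(genes, dtype=float)
    diff = np.abs(genes[:, None, :] - genes[None, :, :]).mean(axis=2)
    return 1.0 - diff


def random_effects(grm, y, tau=1.0):
    """Step 2 stand-in: predicted per-host random effects (BLUP).

    tau is a FIXED, assumed variance ratio; a real fit would estimate it.
    """
    n = len(y)
    resid = y - y.mean()
    return tau * grm @ np.linalg.solve(tau * grm + np.eye(n), resid)


def gene_scores(genes, y, b):
    """Step 3 stand-in: one z-like statistic per gene.

    Each gene is compared with the trait after removing the mean and the
    random effects b, so shared population structure is partly accounted for.
    """
    genes = np.asarray(genes, dtype=float)
    resid = y - y.mean() - b
    centered = genes - genes.mean(axis=0)
    sigma = resid.std(ddof=1)
    norm = np.sqrt((centered ** 2).sum(axis=0))
    norm[norm == 0] = np.inf  # genes with no variation get a score of 0
    return centered.T @ resid / (sigma * norm)


# Simulated toy data (hypothetical): two lineages with different gene content.
rng = np.random.default_rng(0)
n_hosts, n_genes = 60, 40
lineage = rng.integers(0, 2, n_hosts)
presence_prob = np.where(lineage[:, None] == 1, 0.7, 0.3)
genes = (rng.random((n_hosts, n_genes)) < presence_prob).astype(int)
# Trait depends on lineage (population structure) and on gene 0.
y = 1.0 * lineage + 0.8 * genes[:, 0] + rng.normal(0.0, 0.5, n_hosts)

grm = relatedness(genes)        # step 1: population structure
b = random_effects(grm, y)      # step 2: structure-linked trait component
scores = gene_scores(genes, y, b)  # step 3: per-gene association
```

Ranking genes by `np.abs(scores)` gives a rough ordering of candidate associations in the toy data; turning these statistics into calibrated p-values would require the full mixed-model machinery that the sketch omits.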