Microbiome data integration via shared dictionary learning.

Authors

Yuan Bo, Wang Shulei

Affiliation

Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL, USA.

Publication

Nat Commun. 2025 Sep 1;16(1):8147. doi: 10.1038/s41467-025-63425-y.

Abstract

Data integration is a powerful tool for facilitating a comprehensive and generalizable understanding of microbial communities and their association with outcomes of interest. However, integrating data sets from different studies remains a challenging problem because of severe batch effects, unobserved confounding variables, and high heterogeneity across data sets. We propose a new data integration method called MetaDICT, which first estimates the batch effects using weighting methods from the causal inference literature and then refines the estimate via a novel shared dictionary learning step. Compared with existing methods, MetaDICT can better avoid overcorrection of batch effects and preserve biological variation when there are unobserved confounding variables, when data sets are highly heterogeneous across studies, or when the batch is completely confounded with some covariates. Furthermore, MetaDICT can generate comparable embeddings at both the taxa and sample levels, which can be used to unravel the hidden structure of the integrated data and improve the integrative analysis. Applications to synthetic and real microbiome data sets demonstrate the robustness and effectiveness of MetaDICT in integrative analysis. Using MetaDICT, we characterize microbial interactions, identify generalizable microbial signatures, and enhance the accuracy of outcome prediction in two real integrative studies: an integrative analysis of colorectal cancer metagenomics studies and a meta-analysis of immunotherapy microbiome studies.

