Microbiome data integration via shared dictionary learning.

Author information

Yuan Bo, Wang Shulei

Affiliation

Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL, USA.

Publication information

Nat Commun. 2025 Sep 1;16(1):8147. doi: 10.1038/s41467-025-63425-y.

DOI: 10.1038/s41467-025-63425-y

PMID: 40890119

Abstract

Data integration is a powerful tool for facilitating a comprehensive and generalizable understanding of microbial communities and their association with outcomes of interest. However, integrating data sets from different studies remains a challenging problem because of severe batch effects, unobserved confounding variables, and high heterogeneity across data sets. We propose a new data integration method called MetaDICT, which initially estimates the batch effects by weighting methods in causal inference literature and then refines the estimation via novel shared dictionary learning. Compared with existing methods, MetaDICT can better avoid the overcorrection of batch effects and preserve biological variation when there exist unobserved confounding variables, data sets are highly heterogeneous across studies, or the batch is completely confounded with some covariates. Furthermore, MetaDICT can generate comparable embedding at both taxa and sample levels that can be used to unravel the hidden structure of the integrated data and improve the integrative analysis. Applications to synthetic and real microbiome data sets demonstrate the robustness and effectiveness of MetaDICT in integrative analysis. Using MetaDICT, we characterize microbial interaction, identify generalizable microbial signatures, and enhance the accuracy of outcome prediction in two real integrative studies, including an integrative analysis of colorectal cancer metagenomics studies and a meta-analysis of immunotherapy microbiome studies.
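The abstract does not spell out the MetaDICT algorithm, so the following is only a generic illustration of the shared-dictionary idea it mentions, not the authors' method. The sketch assumes two noise-free toy batches (taxa by samples) generated from the same set of atoms, uses a truncated SVD of the pooled data as a stand-in for dictionary learning, and omits the batch-effect estimation step entirely. All names (`shared_dictionary`, `embed`, `batch_a`, `batch_b`) are hypothetical.

```python
import numpy as np

def shared_dictionary(batches, k):
    """Learn k shared atoms (taxa x k) from the pooled samples.

    Stand-in technique: truncated SVD of the column-stacked batches.
    This is NOT the MetaDICT estimator, only a simple shared basis.
    """
    pooled = np.hstack(batches)  # taxa x (total number of samples)
    u, _, _ = np.linalg.svd(pooled, full_matrices=False)
    return u[:, :k]

def embed(batch, dictionary):
    """Least-squares coordinates of each sample in the shared dictionary."""
    coef, *_ = np.linalg.lstsq(dictionary, batch, rcond=None)
    return coef  # k x samples

# Toy data (assumption): both batches are mixtures of the same 3 atoms.
rng = np.random.default_rng(0)
taxa, k = 30, 3
true_atoms = rng.random((taxa, k))
batch_a = true_atoms @ rng.random((k, 20))  # study A, 20 samples
batch_b = true_atoms @ rng.random((k, 15))  # study B, 15 samples

D = shared_dictionary([batch_a, batch_b], k)
emb_a = embed(batch_a, D)  # sample-level embedding for study A
emb_b = embed(batch_b, D)  # same coordinate system for study B
```

Because both batches are expressed in the same dictionary `D`, the columns of `emb_a` and `emb_b` live in one coordinate system and can be compared directly, which is the sense in which a shared dictionary yields comparable sample-level embeddings. Real microbiome counts would additionally need batch-effect correction and handling of sparsity and compositionality, none of which this sketch attempts.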
