Methodology for biomarker discovery with reproducibility in microbiome data using machine learning.

Affiliations

Division of Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, University of Utrecht, Utrecht, The Netherlands.

Department of Data Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands.

Publication information

BMC Bioinformatics. 2024 Jan 15;25(1):26. doi: 10.1186/s12859-024-05639-3.

DOI:10.1186/s12859-024-05639-3

PMID:38225565

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10789030/

Abstract

BACKGROUND

In recent years, human microbiome studies have received increasing attention because the field is considered a potential source of clinical applications. With advances in omics technologies and AI, research on discovering potential biomarkers in the human microbiome using machine learning tools has produced positive outcomes. Despite these promising results, several issues remain in such studies, including datasets with small numbers of samples, inconsistent results, a lack of uniform processing and methodologies, and other factors that lead to a lack of reproducibility in biomedical research. In this work, we propose a methodology that combines the DADA2 pipeline for 16S rRNA sequence processing with Recursive Ensemble Feature Selection (REFS) across multiple datasets to increase reproducibility and obtain robust and reliable results in biomedical research.
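The abstract names REFS but does not describe its internals, so the following is only a simplified stand-in that illustrates the general idea of recursive elimination driven by an ensemble of rankers. The three scorers, the 20% drop fraction, the function names, and the synthetic data are all assumptions made for illustration; none of them is taken from the authors' implementation.

```python
import math
import random

def standardize(X):
    """Scale every column to zero mean and unit variance (constant columns stay at 0)."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]

def logistic_importance(X, y, epochs=200, lr=0.1):
    """|weight| per feature of a logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for row, t in zip(X, y):
            z = b + sum(wi * xi for wi, xi in zip(w, row))
            z = max(-30.0, min(30.0, z))  # keep exp() well inside float range
            err = 1.0 / (1.0 + math.exp(-z)) - t
            grad_b += err
            for j, xi in enumerate(row):
                grad_w[j] += err * xi
        b -= lr * grad_b / n
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
    return [abs(wi) for wi in w]

def centroid_gap(X, y):
    """Absolute difference between the two class means, per feature."""
    pos = [r for r, t in zip(X, y) if t == 1]
    neg = [r for r, t in zip(X, y) if t == 0]
    return [abs(sum(p) / len(pos) - sum(q) / len(neg))
            for p, q in zip(zip(*pos), zip(*neg))]

def auc_distance(X, y):
    """|AUC - 0.5| of each feature when used alone as a score."""
    out = []
    for col in zip(*X):
        pos = [v for v, t in zip(col, y) if t == 1]
        neg = [v for v, t in zip(col, y) if t == 0]
        wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
        out.append(abs(wins / (len(pos) * len(neg)) - 0.5))
    return out

def recursive_ensemble_selection(X, y, n_keep, drop_fraction=0.2, scorers=None):
    """Repeatedly drop the features with the lowest summed rank across all scorers."""
    scorers = scorers or (logistic_importance, centroid_gap, auc_distance)
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        sub = standardize([[row[j] for j in active] for row in X])
        rank_sum = [0] * len(active)
        for scorer in scorers:
            scores = scorer(sub, y)
            order = sorted(range(len(active)), key=lambda i: scores[i])  # worst first
            for rank, i in enumerate(order):
                rank_sum[i] += rank
        n_drop = max(1, min(len(active) - n_keep, int(len(active) * drop_fraction)))
        worst = set(sorted(range(len(active)), key=lambda i: rank_sum[i])[:n_drop])
        active = [j for i, j in enumerate(active) if i not in worst]
    return active

# Synthetic data (an assumption): features 0 and 1 carry signal, features 2..9 are noise.
rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [[2 * t + rng.gauss(0, 0.3), -2 * t + rng.gauss(0, 0.3)]
     + [rng.gauss(0, 1) for _ in range(8)] for t in y]
selected = recursive_ensemble_selection(X, y, n_keep=2)
print(selected)
```

Because the logistic scorer is refitted on the surviving features in every round, the ranking can change as features are removed, which is what distinguishes recursive elimination from a single top-k cut.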

RESULTS

Three experiments were performed, analyzing microbiome data from patients/cases with Inflammatory Bowel Disease (IBD), Autism Spectrum Disorder (ASD), and Type 2 Diabetes (T2D). In each experiment, we found a biomarker signature in one dataset and applied it to two other datasets as further validation. The effectiveness of the proposed methodology was compared with that of other feature selection methods, such as K-Best with F-score, and with random selection as a baseline. The Area Under the Curve (AUC) was employed as a measure of diagnostic accuracy and as a metric for comparing the results of the proposed methodology with those of other feature selection methods. Additionally, we used the Matthews Correlation Coefficient (MCC) to evaluate the performance of the methodology and to compare it with other feature selection methods.
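For readers unfamiliar with the comparison methods and metrics named above, here is a minimal standard-library sketch of K-Best selection by ANOVA F-score, a random-selection baseline, AUC, and MCC. The function names and the toy data are assumptions for illustration; the abstract does not say which implementation the authors used.

```python
import math
import random
from statistics import mean

def f_score(values, labels):
    """One-way ANOVA F statistic of a single feature across the label groups."""
    groups = {}
    for v, t in zip(values, labels):
        groups.setdefault(t, []).append(v)
    grand = mean(values)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (len(groups) - 1)) / (within / (len(values) - len(groups)))

def k_best(X, y, k):
    """Indices of the k features with the highest F statistic, best first."""
    scores = [f_score(list(col), y) for col in zip(*X)]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

def random_selection(n_features, k, seed=0):
    """Baseline: k distinct feature indices drawn uniformly at random."""
    return random.Random(seed).sample(range(n_features), k)

def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def mcc(predicted, labels):
    """Matthews Correlation Coefficient for binary 0/1 predictions."""
    tp = sum(p == 1 and t == 1 for p, t in zip(predicted, labels))
    tn = sum(p == 0 and t == 0 for p, t in zip(predicted, labels))
    fp = sum(p == 1 and t == 0 for p, t in zip(predicted, labels))
    fn = sum(p == 0 and t == 1 for p, t in zip(predicted, labels))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy data (an assumption): feature 2 separates the classes, feature 0 does not.
X = [[0, 5, 1.0], [1, 3, 1.1], [0, 4, 5.0], [1, 6, 5.2]]
y = [0, 0, 1, 1]
print(k_best(X, y, k=2))                      # [2, 1]
print(random_selection(n_features=3, k=2))
print(auc([0.1, 0.4, 0.35, 0.8], y))          # 0.75
print(mcc([1, 1, 1, 0], [1, 1, 0, 0]))
```

AUC is computed from continuous scores, whereas MCC needs thresholded predictions and accounts for all four cells of the confusion matrix.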

CONCLUSIONS

We developed a methodology for reproducible biomarker discovery in 16S rRNA microbiome sequence analysis, addressing issues related to data dimensionality, inconsistent results, and validation across independent datasets. The findings from the three experiments, across 9 different datasets, show that the proposed methodology achieved higher accuracy than the other feature selection methods. This methodology is a first approach to increasing reproducibility and providing robust and reliable results.
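The idea of validating a signature across independent datasets can be sketched as follows: the feature indices chosen on a discovery cohort are held fixed, and each validation cohort is scored using only those columns. The nearest-centroid classifier, the leave-one-out scheme, the cohort names, and the toy numbers are assumptions for illustration and do not reproduce the authors' evaluation protocol.

```python
def centroid(rows):
    """Column-wise mean of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def predict(train_X, train_y, x):
    """Label of the class centroid closest to x (squared Euclidean distance)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        c = centroid([r for r, t in zip(train_X, train_y) if t == label])
        dist = sum((a - b) ** 2 for a, b in zip(x, c))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_accuracy(X, y, signature):
    """Leave-one-out accuracy when only the signature columns are used."""
    Z = [[row[j] for j in signature] for row in X]
    hits = 0
    for i in range(len(Z)):
        hits += predict(Z[:i] + Z[i + 1:], y[:i] + y[i + 1:], Z[i]) == y[i]
    return hits / len(Z)

def validate_signature(signature, validation_sets):
    """Score one fixed signature on every independent validation cohort."""
    return {name: loo_accuracy(X, y, signature)
            for name, (X, y) in validation_sets.items()}

# Hypothetical signature, assumed to have been found on a separate discovery cohort.
signature = [0, 2]
validation_sets = {
    "cohort_B": ([[0.1, 9, 0.2], [0.2, 1, 0.1], [0.9, 8, 1.0], [1.0, 2, 0.8]],
                 [0, 0, 1, 1]),
    "cohort_C": ([[0.0, 5, 0.3], [0.3, 5, 0.0], [0.8, 5, 0.9],
                  [0.7, 5, 1.1], [0.2, 5, 0.2], [1.0, 5, 0.7]],
                 [0, 0, 1, 1, 0, 1]),
}
results = validate_signature(signature, validation_sets)
print(results)
```

Keeping the signature fixed while changing the cohort is what separates this check from ordinary cross-validation inside a single dataset.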

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3b7/10789030/b1024e74c250/12859_2024_5639_Fig1_HTML.jpg
