Randomized feature selection based semi-supervised latent Dirichlet allocation for microbiome analysis.

Affiliations

Department of Statistics, University of Connecticut, Storrs, CT, USA.

Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA.

Publication information

Sci Rep. 2024 Apr 17;14(1):8855. doi: 10.1038/s41598-024-59682-4.

DOI:10.1038/s41598-024-59682-4

PMID:38632488

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11024186/

Abstract

Health and disease are fundamentally influenced by microbial communities and their genes (the microbiome). An in-depth analysis of microbiome structure that enables the classification of individuals based on their health can be crucial in enhancing diagnostics and treatment strategies to improve the overall well-being of an individual. In this paper, we present a novel semi-supervised methodology known as Randomized Feature Selection based Latent Dirichlet Allocation (RFSLDA) to study the impact of the gut microbiome on a subject's health status. Since the data in our study consist of fuzzy, self-reported health labels, traditional supervised learning approaches may not be suitable. As a first step, based on the similarity between documents in text analysis and gut-microbiome data, we employ Latent Dirichlet Allocation (LDA), a topic modeling approach that uses microbiome counts as features, to group subjects into relatively homogeneous clusters without invoking any knowledge of the subjects' observed health status (labels). We then leverage information from the observed health status of subjects to associate each cluster with the most similar health status, which makes the approach semi-supervised. Finally, a feature selection technique is incorporated into the model to improve the overall classification performance. The proposed method provides a semi-supervised topic modeling approach that can help handle the high dimensionality of microbiome data in association studies. Our experiments show that the semi-supervised classification algorithm is effective and efficient, achieving high classification accuracy compared with popular supervised learning approaches such as SVM and the multinomial logistic model. The RFSLDA framework is attractive because it (i) enhances clustering accuracy by identifying key bacteria types as indicators of health status, (ii) identifies key bacteria types within each group based on estimates of the proportion of bacteria types within the groups, and (iii) computes a measure of within-group similarity to identify highly similar subjects in terms of their health status.
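The two semi-supervised steps described in the abstract (associating unsupervised clusters with observed health labels, then searching for a feature subset that improves the match) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the toy count matrix, and the labels are all hypothetical, and `dominant_feature_cluster` is a deliberately trivial placeholder for the clustering step: it is not LDA. In practice one might replace it with the dominant topic per subject from a fitted topic model. The random subset search is likewise only one simple reading of "randomized feature selection".

```python
import random
from collections import Counter, defaultdict


def dominant_feature_cluster(rows):
    """Placeholder clustering (NOT LDA): cluster id = index of the largest count."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


def map_clusters_to_labels(clusters, labels):
    """Give each cluster the most frequent observed label among its members."""
    members = defaultdict(list)
    for cluster, label in zip(clusters, labels):
        members[cluster].append(label)
    return {c: Counter(ys).most_common(1)[0][0] for c, ys in members.items()}


def agreement(clusters, labels):
    """Fraction of subjects whose cluster-level label equals their own label."""
    mapping = map_clusters_to_labels(clusters, labels)
    hits = sum(mapping[c] == y for c, y in zip(clusters, labels))
    return hits / len(labels)


def select_columns(rows, subset):
    """Keep only the feature columns listed in `subset`."""
    return [[row[j] for j in subset] for row in rows]


def randomized_feature_search(rows, labels, cluster_fn, subset_size, n_trials, seed=0):
    """Try random feature subsets; keep the one with the best label agreement."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    best_subset = list(range(n_features))  # start from all features
    best_score = agreement(cluster_fn(rows), labels)
    for _ in range(n_trials):
        subset = sorted(rng.sample(range(n_features), subset_size))
        score = agreement(cluster_fn(select_columns(rows, subset)), labels)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


# Hypothetical toy data: 6 subjects x 4 taxa; taxa 2 and 3 behave like noise.
counts = [
    [9, 1, 12, 0],
    [8, 2, 0, 11],
    [7, 1, 3, 2],
    [1, 9, 13, 0],
    [2, 8, 0, 12],
    [1, 7, 2, 3],
]
labels = ["healthy"] * 3 + ["unhealthy"] * 3

baseline = agreement(dominant_feature_cluster(counts), labels)
informative = agreement(
    dominant_feature_cluster(select_columns(counts, [0, 1])), labels
)
best_subset, best_score = randomized_feature_search(
    counts, labels, dominant_feature_cluster, subset_size=2, n_trials=20
)
print("all features:", baseline)
print("taxa 0 and 1 only:", informative)
print("random search:", best_subset, best_score)
```

On this toy data, clustering on all four taxa mixes the two groups because the noisy taxa dominate some rows, while restricting to taxa 0 and 1 separates them cleanly. The search keeps a subset only when it strictly improves agreement, so its score never falls below the all-features baseline.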

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bed9/11024186/85bc5298ddc6/41598_2024_59682_Fig1_HTML.jpg

Similar articles

Topic2features: a novel framework to classify noisy and sparse textual data using LDA topic distributions.

PeerJ Comput Sci. 2021 Aug 11;7:e677. doi: 10.7717/peerj-cs.677. eCollection 2021.

Inflammatory bowel disease biomarkers of human gut microbiota selected via different feature selection methods.

PeerJ. 2022 Apr 25;10:e13205. doi: 10.7717/peerj.13205. eCollection 2022.

Revealing the microbial assemblage structure in the human gut microbiome using latent Dirichlet allocation.

Microbiome. 2020 Jun 23;8(1):95. doi: 10.1186/s40168-020-00864-3.

Comprehensive study of semi-supervised learning for DNA methylation-based supervised classification of central nervous system tumors.

BMC Bioinformatics. 2022 Jun 8;23(1):223. doi: 10.1186/s12859-022-04764-1.

An integrated clustering and BERT framework for improved topic modeling.

Int J Inf Technol. 2023;15(4):2187-2195. doi: 10.1007/s41870-023-01268-w. Epub 2023 May 6.

Gene-based microbiome representation enhances host phenotype classification.

mSystems. 2023 Aug 31;8(4):e0053123. doi: 10.1128/msystems.00531-23. Epub 2023 Jul 5.

A new approach to describe the taxonomic structure of microbiome and its application to assess the relationship between microbial niches.

BMC Bioinformatics. 2024 Feb 5;25(1):58. doi: 10.1186/s12859-023-05575-8.

A Dirichlet-Multinomial Bayes Classifier for Disease Diagnosis with Microbial Compositions.

mSphere. 2017 Dec 13;2(6). doi: 10.1128/mSphereDirect.00536-17. eCollection 2017 Nov-Dec.

Weakly Semi-supervised phenotyping using Electronic Health records.

J Biomed Inform. 2022 Oct;134:104175. doi: 10.1016/j.jbi.2022.104175. Epub 2022 Sep 5.
