
DCMD: Distance-based classification using mixture distributions on microbiome data.

Affiliations

Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, CANADA.

Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, CANADA.

Publication information

PLoS Comput Biol. 2021 Mar 12;17(3):e1008799. doi: 10.1371/journal.pcbi.1008799. eCollection 2021 Mar.

DOI: 10.1371/journal.pcbi.1008799
PMID: 33711013
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7990174/
Abstract

Current advances in next-generation sequencing techniques have allowed researchers to conduct comprehensive research on the microbiome and human diseases, with recent studies identifying associations between the human microbiome and health outcomes for a number of chronic conditions. However, microbiome data structure, characterized by sparsity and skewness, presents challenges to building effective classifiers. To address this, we present an innovative approach for distance-based classification using mixture distributions (DCMD). The method aims to improve classification performance using microbiome community data, where the predictors are composed of sparse and heterogeneous count data. This approach models the inherent uncertainty in sparse counts by estimating a mixture distribution for the sample data and representing each observation as a distribution, conditional on observed counts and the estimated mixture, which are then used as inputs for distance-based classification. The method is implemented into a k-means classification and k-nearest neighbours framework. We develop two distance metrics that produce optimal results. The performance of the model is assessed using simulated and human microbiome study data, with results compared against a number of existing machine learning and distance-based classification approaches. The proposed method is competitive when compared to the other machine learning approaches, and shows a clear improvement over commonly used distance-based classifiers, underscoring the importance of modelling sparsity for achieving optimal results. The range of applicability and robustness make the proposed method a viable alternative for classification using sparse microbiome count data. The source code is available at https://github.com/kshestop/DCMD for academic use.
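As a rough illustration of the pipeline the abstract outlines (fit a mixture per taxon, represent each observed count by its distribution over mixture components, then classify by distance), here is a minimal Python sketch. It is not the authors' implementation, which is in the repository linked above. The abstract does not name the mixture family or the two distance metrics, so the Poisson mixture, the Euclidean distance between posterior vectors, and every function name below (`posterior`, `fit_poisson_mixture`, `represent`, `knn_predict`, `simulate`, `demo`) are assumptions made for illustration only.

```python
import numpy as np


def posterior(counts, weights, rates):
    """Posterior P(component | count) under a Poisson mixture; shape (n, K).

    The log(count!) term is the same for every component, so it cancels
    when each row is normalised and is left out here.
    """
    counts = np.asarray(counts, dtype=float)[:, None]
    weights = np.asarray(weights, dtype=float)
    rates = np.asarray(rates, dtype=float)
    log_post = np.log(weights)[None, :] + counts * np.log(rates)[None, :] - rates[None, :]
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


def fit_poisson_mixture(counts, n_components=2, n_iter=200):
    """EM for a Poisson mixture on one taxon's counts (assumed model)."""
    counts = np.asarray(counts, dtype=float)
    weights = np.full(n_components, 1.0 / n_components)
    rates = np.linspace(0.1, max(counts.max(), 1.0), n_components)
    for _ in range(n_iter):
        resp = posterior(counts, weights, rates)          # E-step
        mass = resp.sum(axis=0) + 1e-12                   # avoid empty components
        weights = mass / mass.sum()                       # M-step: mixing weights
        rates = np.maximum(resp.T @ counts / mass, 1e-6)  # M-step: component rates
    return weights, rates


def represent(X, mixtures):
    """Replace each count by its posterior over components, taxon by taxon."""
    return np.hstack([posterior(X[:, j], w, r) for j, (w, r) in enumerate(mixtures)])


def knn_predict(train_repr, train_labels, test_repr, k=3):
    """k-nearest neighbours with Euclidean distance (an assumed metric)."""
    diffs = test_repr[:, None, :] - train_repr[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return np.array([np.bincount(votes).argmax() for votes in train_labels[nearest]])


def simulate(n_per_class, rng):
    """Toy sparse counts: five taxa, the first two elevated in class 1."""
    X = rng.poisson(0.2, size=(2 * n_per_class, 5))
    X[n_per_class:, :2] = rng.poisson(8.0, size=(n_per_class, 2))
    y = np.repeat([0, 1], n_per_class)
    return X, y


def demo(seed=0):
    """Fit mixtures on training data only, then report test accuracy."""
    rng = np.random.default_rng(seed)
    X_train, y_train = simulate(40, rng)
    X_test, y_test = simulate(20, rng)
    mixtures = [fit_poisson_mixture(X_train[:, j]) for j in range(X_train.shape[1])]
    pred = knn_predict(represent(X_train, mixtures), y_train, represent(X_test, mixtures))
    return float((pred == y_test).mean())


if __name__ == "__main__":
    print("toy test accuracy:", demo())
```

In this sketch a zero count is not treated as a hard zero: it becomes a probability vector over the fitted components, which is one simple way to carry the uncertainty of sparse counts into the distance computation. The mixtures are fitted on the training samples only and then reused to represent the test samples.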


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c7c6/7990174/f2f39ba3a61f/pcbi.1008799.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c7c6/7990174/9382db68f272/pcbi.1008799.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c7c6/7990174/40672d9a8a46/pcbi.1008799.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c7c6/7990174/054e5a724988/pcbi.1008799.g004.jpg


