Poisson hurdle model-based method for clustering microbiome features.

Affiliations

Department of Statistics, Iowa State University, Ames, IA 50011, USA.

Department of Energy, Joint Genome Institute, Berkeley, CA 94720, USA.

Publication information

Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btac782.

Abstract

MOTIVATION

High-throughput sequencing technologies have greatly facilitated microbiome research and have generated a large volume of microbiome data with the potential to answer key questions regarding microbiome assembly, structure and function. Cluster analysis aims to group features that behave similarly across treatments; such grouping helps to highlight the functional relationships among features and may provide biological insights into microbiome networks. However, clustering microbiome data is challenging because the data are sparse and high-dimensional.

RESULTS

We propose a model-based clustering method built on Poisson hurdle models for sparse microbiome count data. We describe an expectation-maximization (EM) algorithm, and a modified version that uses simulated annealing, to conduct the cluster analysis. We also provide algorithms for initialization and for choosing the number of clusters. Simulation results demonstrate that the proposed methods provide better clustering results than alternative methods under a variety of settings. We also apply the proposed method to a sorghum rhizosphere microbiome dataset, which yields interesting biological findings.
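As a rough illustration of the idea, and not the authors' implementation, the sketch below shows a generic Poisson hurdle log-probability and the cluster-membership step of an expectation-maximization algorithm. It assumes a simplified parameterization in which each cluster has a single zero probability and a single rate for the positive counts; the names hurdle_log_pmf, responsibilities, pi_zero and lam are hypothetical and do not come from the paper or the PHclust package.

```python
import math

def hurdle_log_pmf(y, pi_zero, lam):
    """Log-probability of a count y under a generic Poisson hurdle model.

    pi_zero: probability of observing a zero (the hurdle part).
    lam: rate of the zero-truncated Poisson that generates positive counts.
    Both parameter names are illustrative assumptions.
    """
    if y == 0:
        return math.log(pi_zero)
    # Zero-truncated Poisson: Poisson pmf divided by P(Y > 0) = 1 - exp(-lam).
    log_pois = y * math.log(lam) - lam - math.lgamma(y + 1)
    return math.log1p(-pi_zero) + log_pois - math.log1p(-math.exp(-lam))

def responsibilities(counts, weights, params):
    """E-step for one feature: posterior probability of each cluster.

    counts: observed counts for the feature across samples.
    weights: mixing proportions, one per cluster.
    params: (pi_zero, lam) pair per cluster (a simplifying assumption).
    """
    log_post = []
    for w, (pi_zero, lam) in zip(weights, params):
        loglik = sum(hurdle_log_pmf(y, pi_zero, lam) for y in counts)
        log_post.append(math.log(w) + loglik)
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_post)
    unnorm = [math.exp(v - m) for v in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# A mostly-zero feature compared against a sparse and a dense cluster.
post = responsibilities([0, 0, 0, 2, 0], [0.5, 0.5], [(0.8, 2.0), (0.1, 2.0)])
```

Here the mostly-zero feature receives a higher posterior probability for the cluster with the larger zero probability. A full method would alternate this step with parameter updates and would also have to account for treatment structure and sequencing depth, which this sketch leaves out.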

AVAILABILITY AND IMPLEMENTATION

The R package PHclust is freely available for download at https://cran.r-project.org/package=PHclust.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
