Department of Computer Programming, Recep Tayyip Erdoğan University, Ardeşen Vocational School, Rize, 53400, Turkey.
Department of Computer Sciences, Applied Mathematics and Statistics, University of Girona, Campus Montilivi, 17003 Girona, Spain.
Brief Bioinform. 2022 Sep 20;23(5). doi: 10.1093/bib/bbac328.
Statistical and machine learning techniques based on relative abundances have been used to predict health conditions and to identify microbial biomarkers. However, the high dimensionality, sparsity and compositional nature of microbiome data pose statistical challenges. Taxon grouping, on the other hand, makes it possible to summarize microbiome abundance at a coarser resolution and in a lower dimension, but it introduces new challenges when correlating taxa with a disease. In this work, we present a novel approach that groups Operational Taxonomic Units (OTUs) based only on relative abundances, as an alternative to taxon grouping. The proposed procedure accounts for the compositional nature of the data by making use of principal balances. The identified groups are called Principal Microbial Groups (PMGs). The procedure reduces the need for user-defined aggregation of OTUs and makes it possible to work with coarse groups of OTUs that are not present in a phylogenetic tree. PMGs can be used for two different goals: (1) as a dimensionality reduction method for compositional data, and (2) as an aggregation procedure that provides an alternative to taxon grouping for constructing microbial balances, which are subsequently used for disease prediction. We illustrate the procedure with data from a cirrhosis study. PMGs provide a coherent approach to data analysis in the search for biomarkers in the human microbiota. The source code and demo data for PMGs are available at: https://github.com/asliboyraz/PMGs.
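In compositional data analysis, a balance is an isometric log-ratio coordinate that contrasts two disjoint groups of parts (here, groups of OTUs). The minimal Python sketch below computes one balance for a single sample using the standard formula sqrt(r*s/(r+s)) * ln(g(numerator)/g(denominator)), where g is the geometric mean and r, s are the group sizes. It illustrates only the generic balance formula, not the PMG procedure itself (which additionally selects principal balances). The function names, the fixed pseudocount of 0.5 used to replace zero counts, and the example counts are illustrative assumptions, not taken from the authors' code.

```python
import math
from typing import List, Sequence


def replace_zeros(counts: Sequence[float], pseudocount: float = 0.5) -> List[float]:
    """Replace zero counts by a fixed pseudocount.

    Assumption: a deliberately simple zero-handling choice; log-ratios are
    undefined at zero, and real analyses often use more careful methods.
    """
    return [float(c) if c > 0 else pseudocount for c in counts]


def closure(x: Sequence[float]) -> List[float]:
    """Rescale a composition so that its parts sum to 1 (relative abundances)."""
    total = sum(x)
    return [v / total for v in x]


def geometric_mean(values: Sequence[float]) -> float:
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def balance(x: Sequence[float], numerator: Sequence[int], denominator: Sequence[int]) -> float:
    """Balance between two disjoint, non-empty groups of part indices.

    Positive values mean the numerator group is relatively more abundant.
    """
    num, den = set(numerator), set(denominator)
    if not num or not den or num & den:
        raise ValueError("groups must be non-empty and disjoint")
    r, s = len(num), len(den)
    g_num = geometric_mean([x[i] for i in num])
    g_den = geometric_mean([x[i] for i in den])
    return math.sqrt(r * s / (r + s)) * math.log(g_num / g_den)


if __name__ == "__main__":
    counts = [10, 20, 0, 40]  # hypothetical OTU counts for one sample
    x = replace_zeros(counts)
    b_counts = balance(x, [0, 1], [2, 3])
    b_props = balance(closure(x), [0, 1], [2, 3])
    print(round(b_counts, 4))  # 1.1513
    # Balances are scale-invariant: raw counts and relative abundances agree.
    print(math.isclose(b_counts, b_props))  # True
```

Because a balance depends only on ratios between parts, it gives the same value whether computed from raw counts or from relative abundances, which is why balances suit data that carry only relative information.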