Child Development and Behavior Center, The Third Affiliated Hospital, Sun Yat-sen University, Guangzhou, China.
School of Mathematics, Sun Yat-sen University, Guangzhou, China.
PLoS One. 2021 Feb 18;16(2):e0246893. doi: 10.1371/journal.pone.0246893. eCollection 2021.
Disease heterogeneity is a major concern in medical research and is commonly characterized as subtypes with different pathogeneses that exhibit distinct prognoses and treatment effects. Classifying a population into homogeneous subgroups is challenging, especially for complex diseases. Recent studies show that gut microbiome composition plays a vital role in disease development, so clustering patients according to their microbial profiles is of great interest. A variety of beta diversity measures can quantify the dissimilarity between the compositions of different samples for clustering. However, different beta diversity measures yield different clusters, and it is difficult to choose among them. Microbial compositions from 16S rRNA sequencing are presented as high-dimensional vectors with a large proportion of extremely small or even zero-valued elements. We set up three simulation experiments to mimic such microbial compositional data and evaluate the performance of different beta diversity measures in clustering. The results show that the Kullback-Leibler divergence-based beta diversity measures, including the Jensen-Shannon divergence and its square root, and the hypersphere-based beta diversity measures, including the Bhattacharyya and Hellinger distances, capture compositional changes in low-abundance elements more efficiently and perform stably. Their performance on two real datasets supports the validity of the simulation experiments.
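As an illustration of the measures named above, the following minimal sketch computes the Jensen-Shannon divergence, its square root, and the Bhattacharyya and Hellinger distances between two compositional vectors. It uses the standard textbook definitions, which are an assumption here: the abstract does not state the exact formulas or zero-handling used in the study. The function names and the two toy count vectors are hypothetical.

```python
import numpy as np


def to_composition(counts):
    """Convert non-negative counts to relative abundances that sum to 1."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q), natural log.

    Terms with p == 0 contribute 0 by convention. Assumes q > 0 wherever p > 0.
    """
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded by ln 2, tolerant of zeros."""
    m = 0.5 * (p + q)  # the mixture is positive wherever p or q is positive
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def bhattacharyya_coefficient(p, q):
    """Overlap between sqrt(p) and sqrt(q), viewed as points on the unit sphere."""
    return float(np.sum(np.sqrt(p * q)))


def bhattacharyya_distance(p, q):
    """-ln(BC); infinite when the two compositions share no taxa."""
    with np.errstate(divide="ignore"):
        return float(-np.log(bhattacharyya_coefficient(p, q)))


def hellinger_distance(p, q):
    """sqrt(1 - BC), which lies in [0, 1]."""
    return float(np.sqrt(max(0.0, 1.0 - bhattacharyya_coefficient(p, q))))


# Hypothetical toy samples: they differ only in one low-abundance taxon.
sample_a = to_composition([500, 300, 200, 0, 0])
sample_b = to_composition([500, 300, 195, 5, 0])

jsd = jensen_shannon_divergence(sample_a, sample_b)
print("JSD:           ", jsd)
print("sqrt(JSD):     ", np.sqrt(jsd))
print("Bhattacharyya: ", bhattacharyya_distance(sample_a, sample_b))
print("Hellinger:     ", hellinger_distance(sample_a, sample_b))
```

To cluster samples, one would typically compute such a measure for every pair of samples and pass the resulting dissimilarity matrix to a distance-based clustering method. The choice of clustering method is not specified in the abstract.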