Kers Jannigje Gerdien, Saccenti Edoardo
Laboratory of Microbiology, Wageningen University & Research, Wageningen, Netherlands.
Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Wageningen, Netherlands.
Front Microbiol. 2022 Mar 3;12:796025. doi: 10.3389/fmicb.2021.796025. eCollection 2021.
As sequencing techniques have become less expensive, larger sample sizes have become feasible for microbiota studies. The aim of this study is to show how, and to what extent, different diversity metrics and different microbiota compositions influence the sample size needed to detect a difference between groups. Empirical 16S rRNA amplicon sequence data obtained from animal experiments, observational human data, and simulated data were used to perform retrospective power calculations. A wide range of alpha diversity and beta diversity metrics was used to compare the different microbiota datasets and to assess the effect on sample size.
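The link between a metric's sensitivity and the required sample size can be illustrated with a textbook two-group calculation. The sketch below is not the procedure used in the study; it assumes a two-sided comparison of group means under a normal approximation, and the function name `n_per_group` and the effect sizes are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate samples per group for a two-sided, two-group comparison.

    Assumption: normal approximation with equal group sizes and equal
    variances; effect_size is the standardized mean difference (Cohen's d).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)

# Hypothetical effect sizes: a more sensitive metric yields a larger
# standardized difference and therefore needs fewer samples per group.
for d in (0.5, 0.8, 1.0):
    print(d, n_per_group(d))
```

Under these assumptions, doubling the standardized effect size cuts the required sample size roughly fourfold, which is why the choice of metric matters for study design.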
Our data showed that beta diversity metrics are more sensitive to differences between groups than alpha diversity metrics. The structure of the data influenced which alpha metrics were the most sensitive. Among beta diversity metrics, Bray-Curtis is in general the most sensitive to differences between groups, which results in a lower required sample size and a potential for publication bias.
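To make the two families of metrics concrete, the following minimal sketch computes one alpha diversity metric (the Shannon index, within a sample) and one beta diversity metric (Bray-Curtis dissimilarity, between two samples). The count vectors are made-up toy data, not data from the study, and the function names are illustrative.

```python
import math

def shannon_index(counts):
    """Shannon alpha diversity (natural log) for one sample's taxon counts."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]  # skip absent taxa
    return -sum(p * math.log(p) for p in proportions)

def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two samples over the same taxa."""
    numerator = sum(abs(a - b) for a, b in zip(counts_a, counts_b))
    denominator = sum(counts_a) + sum(counts_b)
    return numerator / denominator

# Toy counts for four taxa (hypothetical): one even sample, one dominated sample.
sample_a = [10, 10, 10, 10]
sample_b = [37, 1, 1, 1]

print(shannon_index(sample_a), shannon_index(sample_b))
print(bray_curtis(sample_a, sample_b))
```

The Shannon index summarizes each sample separately, whereas Bray-Curtis compares abundances taxon by taxon, so the two can respond very differently to the same shift in composition.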
We recommend performing power calculations and using multiple diversity metrics as outcome measures. To improve microbiota studies, awareness needs to be raised that the chosen metrics, rather than biological differences, can drive the sensitivity and bias of research outcomes. We have seen that different alpha and beta diversity metrics lead to different study power; because of this, one could be naturally tempted to try all possible metrics until one or more are found that give a statistically significant test result, i.e., p-value < α. This way of proceeding is one of the many forms of so-called p-value hacking. In our opinion, the only way to protect ourselves from (the temptation of) p-hacking would be to draw up a statistical plan before experiments are initiated, describing the outcomes of interest and the corresponding statistical analyses to be performed.
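The risk of trying metrics until one is significant can be quantified in an idealized setting. The sketch below assumes that the tests are independent and that there is no true group difference; diversity metrics computed on the same samples are in practice correlated, so the figures are an illustration rather than an estimate for any real study. The function name is hypothetical.

```python
def chance_of_false_positive(n_metrics, alpha=0.05):
    """Probability that at least one of n_metrics tests is significant
    when no true difference exists.

    Assumption: the tests are mutually independent, which overstates the
    inflation for correlated diversity metrics.
    """
    return 1 - (1 - alpha) ** n_metrics

# With a single pre-specified metric the rate stays at alpha; it grows
# quickly as more metrics are tried.
for k in (1, 5, 10):
    print(k, round(chance_of_false_positive(k), 4))
```

A statistical plan fixed in advance keeps the analysis at the first row of this table, with one pre-specified outcome per hypothesis.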