Faculty of Infocommunication Technologies, ITMO University, St. Petersburg, Russia.
Laboratory of Microbiological Monitoring and Bioremediation of Soils, All-Russia Research Institute for Agricultural Microbiology, St. Petersburg, Russia.
PeerJ. 2022 Aug 30;10:e13888. doi: 10.7717/peerj.13888. eCollection 2022.
High-throughput sequencing of amplicon libraries is the most widespread and one of the most effective ways to study the taxonomic structure of microbial communities, despite the growing accessibility of whole-metagenome sequencing. Owing to targeted amplification, the method provides unparalleled resolution of communities, but at the same time it perturbs the initial community structure, thereby reducing data robustness and compromising downstream analyses. Experimental research on these perturbations has largely been limited to comparative studies of different PCR protocols, without considering other sources of experimental variation related to the characteristics of the initial microbial composition itself. Here we analyse these sources and demonstrate how dramatically they affect the relative abundances of taxa over the course of PCR cycling. We developed a mathematical model of PCR amplification that assumes heterogeneous amplification efficiencies and accounts for the compositional nature of the data. We designed an experiment (five consecutive PCR cycle numbers, 22 to 26, with 12 replicates each, applied to one real human stool microbial sample) and estimated the dynamics of the microbial community in line with the model. We found high heterogeneity in the amplification efficiencies of taxa, which leads to non-linear and substantial (up to fivefold) changes in relative abundances during PCR. Analysis of possible sources of this heterogeneity revealed a significant association between amplification efficiencies and the energy of secondary structures of the DNA templates. Our results highlight non-trivial changes in the dynamics of real-life microbial communities that arise from their compositional nature. These effects are relevant not only to amplicon libraries but to any study of metagenome dynamics.
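The compositional effect described above can be illustrated with a minimal sketch. It assumes a simple exponential model in which each taxon i has its own constant per-cycle efficiency e_i, so its copy number grows as N_i(c) = N_i(0)(1 + e_i)^c, and only proportions are observed after sequencing. This is a simplifying assumption for illustration, not the authors' published model; the function names and the efficiency values are hypothetical.

```python
import math
from typing import List, Sequence


def relative_abundances(initial_counts: Sequence[float],
                        efficiencies: Sequence[float],
                        cycles: int) -> List[float]:
    """Relative abundances after `cycles` PCR cycles.

    Hypothetical simplified model: taxon i is copied with a constant
    per-cycle efficiency e_i in [0, 1], so N_i(c) = N_i(0) * (1 + e_i) ** c.
    Plateau effects, primer depletion and noise are ignored.
    """
    if len(initial_counts) != len(efficiencies):
        raise ValueError("need one efficiency per taxon")
    counts = [n * (1.0 + e) ** cycles
              for n, e in zip(initial_counts, efficiencies)]
    total = sum(counts)
    # Closure step: sequencing only reveals proportions, not absolute counts.
    return [x / total for x in counts]


def log_ratio(p: Sequence[float], i: int, j: int) -> float:
    """Log-ratio of taxa i and j; unaffected by the closure step."""
    return math.log(p[i] / p[j])


if __name__ == "__main__":
    # Hypothetical community: three equally abundant taxa with different,
    # made-up amplification efficiencies.
    n0 = [1000.0, 1000.0, 1000.0]
    eff = [0.95, 0.85, 0.75]
    for c in range(22, 27):  # the 22-26 cycle window used in the study
        p = relative_abundances(n0, eff, c)
        print(c, [round(x, 3) for x in p], round(log_ratio(p, 0, 2), 3))
```

Because all proportions share the same normalising total, relative abundances shift non-linearly with cycle number, while each log-ratio grows linearly by log((1 + e_i)/(1 + e_j)) per cycle under this assumed model.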