Mäklin Tommi, Kallonen Teemu, David Sophia, Boinett Christine J, Pascoe Ben, Méric Guillaume, Aanensen David M, Feil Edward J, Baker Stephen, Parkhill Julian, Sheppard Samuel K, Corander Jukka, Honkela Antti
Helsinki Institute for Information Technology HIIT, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland.
Department of Biostatistics, University of Oslo, Oslo, Norway.
Wellcome Open Res. 2021 Oct 8;5:14. doi: 10.12688/wellcomeopenres.15639.2. eCollection 2020.
Determining the composition of bacterial communities beyond the level of a genus or species is challenging because of the considerable overlap between the genomes of close relatives. Here, we present the mSWEEP pipeline for identifying bacterial lineages in plate sweeps of enrichment cultures and estimating their relative sequence abundances. mSWEEP leverages biologically grouped sequence assembly databases, applies probabilistic modelling, and provides controls for false-positive results. Using sequencing data from major pathogens, we demonstrate significant improvements in lineage quantification and detection accuracy. Our pipeline facilitates the investigation of cultures comprising mixtures of bacteria, and opens up a new field of plate sweep metagenomics.