Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA, 94305, USA.
Department of Medicine, Stanford University School of Medicine, Stanford, CA, 94305, USA.
Microbiome. 2018 Dec 17;6(1):226. doi: 10.1186/s40168-018-0605-2.
The accuracy of microbial community surveys based on marker-gene and metagenomic sequencing (MGS) suffers from the presence of contaminants, that is, DNA sequences not truly present in the sample. Contaminants come from various sources, including reagents. Appropriate laboratory practices can reduce contamination, but do not eliminate it. Here we introduce decontam (https://github.com/benjjneb/decontam), an open-source R package that implements a statistical classification procedure that identifies contaminants in MGS data based on two widely reproduced patterns: contaminants appear at higher frequencies in low-concentration samples and are often found in negative controls.
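The two patterns can be illustrated with a minimal Python sketch. It is not decontam's R implementation: the function names `frequency_residuals` and `prevalence`, the log-log model comparison, and the toy numbers are all assumptions chosen for illustration, and no statistical test or classification threshold is applied.

```python
import math


def frequency_residuals(freqs, concs):
    """Compare two simple models of one taxon's frequency across samples.

    Illustrative assumption: a contaminant's frequency is inversely
    proportional to total DNA concentration (slope -1 in log-log space),
    while a true taxon's frequency is independent of concentration
    (slope 0). Returns (ss_contaminant, ss_noncontaminant), the sums of
    squared residuals of each model after fitting its intercept.
    """
    # Keep only samples where the taxon was observed (log of 0 is undefined).
    pairs = [(math.log(f), math.log(c)) for f, c in zip(freqs, concs) if f > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two samples with non-zero frequency")
    n = len(pairs)

    # Non-contaminant model: log f = a, best a is the mean of log f.
    mean_f = sum(lf for lf, _ in pairs) / n
    ss_noncontaminant = sum((lf - mean_f) ** 2 for lf, _ in pairs)

    # Contaminant model: log f = b - log c, best b is the mean of (log f + log c).
    shifted = [lf + lc for lf, lc in pairs]
    mean_s = sum(shifted) / n
    ss_contaminant = sum((s - mean_s) ** 2 for s in shifted)

    return ss_contaminant, ss_noncontaminant


def prevalence(counts, is_control):
    """Fraction of negative controls and of true samples containing the taxon."""
    in_controls = [c > 0 for c, ctrl in zip(counts, is_control) if ctrl]
    in_samples = [c > 0 for c, ctrl in zip(counts, is_control) if not ctrl]
    if not in_controls or not in_samples:
        raise ValueError("need at least one control and one true sample")
    return sum(in_controls) / len(in_controls), sum(in_samples) / len(in_samples)


# Toy data (hypothetical): DNA concentrations of five samples.
concs = [1.0, 2.0, 4.0, 8.0, 16.0]
contaminant_like = [0.16, 0.08, 0.04, 0.02, 0.01]  # halves as concentration doubles
true_taxon_like = [0.10, 0.10, 0.10, 0.10, 0.10]   # unaffected by concentration

ss_c, ss_n = frequency_residuals(contaminant_like, concs)
print("contaminant-like taxon fits the contaminant model better:", ss_c < ss_n)

ss_c, ss_n = frequency_residuals(true_taxon_like, concs)
print("true-taxon-like taxon fits the contaminant model better:", ss_c < ss_n)

# Toy read counts (hypothetical): two negative controls followed by four samples.
ctrl_prev, sample_prev = prevalence(
    [5, 3, 0, 0, 0, 1], [True, True, False, False, False, False]
)
print("prevalence in controls vs. samples:", ctrl_prev, sample_prev)
```

A taxon whose frequencies fit the inverse-concentration model better, or that is more prevalent in negative controls than in true samples, would be a candidate contaminant under these assumptions. A real analysis would add a statistical test and a classification threshold on top of these raw quantities.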
Decontam classified amplicon sequence variants (ASVs) in a human oral dataset consistently with prior microscopic observations of the microbial taxa inhabiting that environment and previous reports of contaminant taxa. In metagenomics and marker-gene measurements of a dilution series, decontam substantially reduced technical variation arising from different sequencing protocols. The application of decontam to two recently published datasets corroborated and extended their conclusions that little evidence existed for an indigenous placenta microbiome and that some low-frequency taxa seemingly associated with preterm birth were contaminants.
Decontam improves the quality of metagenomic and marker-gene sequencing by identifying and removing contaminant DNA sequences. Decontam integrates easily with existing MGS workflows and allows researchers to generate more accurate profiles of microbial communities at little to no additional cost.