Division of Infectious Diseases, Department of Medicine 1, Medical University of Vienna, Vienna, Austria.
University of Veterinary Medicine, Department of Virology, Vienna, Austria.
Sci Rep. 2019 Feb 7;9(1):1652. doi: 10.1038/s41598-019-38733-1.
Background noise in metagenomic studies is often of high importance, and its removal requires extensive post-analytic bioinformatic filtering. This matters because significant signals may be lost when the signal-to-noise ratio is low. The presence of plasmid residues, which frequently occur in reagents as contaminants, has not been investigated so far but may introduce a substantial bias. Here we show that plasmid sequences from different sources are omnipresent in molecular biology reagents. Using a metagenomic approach, we identified the polymerase gene (pol) of equine infectious anemia virus in human samples and traced it back to the expression plasmid used to generate a commercial reverse transcriptase. We found fragments of multiple other expression plasmids in human samples as well as in commercial polymerase preparations. Sources of plasmid contamination included the production chain of molecular biology reagents as well as contamination of reagents from the environment or from human handling of samples and reagents. Retrospective analyses of published metagenomic studies revealed inaccurate differentiation between signal and noise. Hence, the plasmid sequences that seem to be omnipresent in molecular biology reagents may misguide conclusions derived from genomic and metagenomic datasets, and thus also clinical interpretations. Critical appraisal of metagenomic datasets for possible plasmid background noise is required to identify reliable and significant signals.