Lee Imchang, Chalita Mauricio, Ha Sung-Min, Na Seong-In, Yoon Seok-Hwan, Chun Jongsik
School of Biological Sciences & Institute of Molecular Biology & Genetics, Seoul National University, Seoul 151-742, Republic of Korea.
Inter-disciplinary Program in Bioinformatics, Seoul National University, Seoul 151-742, Republic of Korea.
Int J Syst Evol Microbiol. 2017 Jun;67(6):2053-2057. doi: 10.1099/ijsem.0.001872. Epub 2017 Jun 22.
Thanks to recent advances in DNA sequencing technology, the cost and time of prokaryotic genome sequencing have decreased dramatically. It has repeatedly been reported that genome sequencing using high-throughput next-generation sequencing is prone to contamination because of its high depth of sequencing coverage. Although a few bioinformatics tools are available to detect potential contamination, they have inherent limitations because they use only protein-coding genes. Here we introduce a new algorithm, called ContEst16S, that detects potential contamination using 16S rRNA genes from genome assemblies. We screened 69 745 prokaryotic genomes from the NCBI Assembly Database using ContEst16S and found that 594 were contaminated by bacterial, human and plant sequences. Of the predicted contaminated genomes, 8 % were not detected by the existing protein-coding gene-based tool, implying that the two methods can complement each other in the detection of contamination. The algorithm is available as a web-based service at www.ezbiocloud.net/tools/contest16s.
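To make the underlying idea concrete, here is a minimal sketch, not the ContEst16S algorithm itself. It assumes that the 16S rRNA gene copies found in a single assembly are usually highly similar to one another, so a copy that diverges sharply from the others may point to foreign DNA. The function names, the 97 % identity cut-off and the requirement that sequences arrive already aligned are all hypothetical choices for illustration. A real pipeline would first have to locate and align the 16S genes in the assembly, and would compare them against a curated reference database.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns that are gaps in both sequences are skipped; a gap in only
    one sequence counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # shared gap column: ignore
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def flag_divergent_16s(copies: dict[str, str],
                       threshold: float = 97.0) -> list[tuple[str, str, float]]:
    """Return every pair of 16S copies whose identity is below `threshold`.

    `copies` maps a hypothetical copy identifier (e.g. contig name) to an
    aligned 16S sequence. The 97.0 default is an illustrative assumption,
    not a value taken from the ContEst16S paper.
    """
    flagged = []
    # Sort by identifier so the output order is deterministic.
    for (id_a, seq_a), (id_b, seq_b) in combinations(sorted(copies.items()), 2):
        pid = percent_identity(seq_a, seq_b)
        if pid < threshold:
            flagged.append((id_a, id_b, pid))
    return flagged


if __name__ == "__main__":
    # Toy, made-up aligned fragments; real 16S genes are ~1,500 bp long.
    copies = {
        "contig1_16S": "ACGTACGTAC",
        "contig7_16S": "ACGTACGTAC",
        "contig9_16S": "ACGAACCTGC",
    }
    for a, b, pid in flag_divergent_16s(copies):
        print(f"{a} vs {b}: {pid:.1f}% identity, possible contamination")
```

With the toy input, the two identical copies are not reported, while each is reported against the divergent `contig9_16S` copy at 70 % identity. A pairwise check like this is independent of protein-coding genes, which is the complementary angle the abstract describes. In practice it would miss contaminants whose 16S genes were not assembled.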