National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
Genome Biol. 2024 Feb 26;25(1):60. doi: 10.1186/s13059-024-03198-7.
Assembled genome sequences are being generated at an exponential rate. Here we present FCS-GX, part of NCBI's Foreign Contamination Screen (FCS) tool suite, optimized to identify and remove contaminant sequences in new genomes. FCS-GX screens most genomes in 0.1–10 min. Testing FCS-GX on artificially fragmented genomes demonstrates high sensitivity and specificity for diverse contaminant species. We used FCS-GX to screen 1.6 million GenBank assemblies and identified 36.8 Gbp of contamination, comprising 0.16% of total bases, with half of that contamination coming from just 161 assemblies. We updated assemblies in NCBI RefSeq to reduce detected contamination to 0.01% of bases. FCS-GX is available at https://github.com/ncbi/fcs/ or https://doi.org/10.5281/zenodo.10651084.