VecScreen_plus_taxonomy: imposing a tax(onomy) increase on vector contamination screening.

Affiliation

Department of Health and Human Services, National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA.

Publication information

Bioinformatics. 2018 Mar 1;34(5):755-759. doi: 10.1093/bioinformatics/btx669.

DOI: 10.1093/bioinformatics/btx669
PMID: 29069347
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6030928/
Abstract

MOTIVATION

Nucleic acid sequences in public databases should not contain vector contamination, but many sequences in GenBank do (or did) contain vectors. The National Center for Biotechnology Information uses the program VecScreen to screen submitted sequences for contamination. Additional tools are needed to distinguish true-positive (contamination) from false-positive (not contamination) VecScreen matches.

RESULTS

A principal reason for false-positive VecScreen matches is that the sequence and the matching vector subsequence originate from closely related or identical organisms (for example, both originate in Escherichia coli). We collected information on the taxonomy of sources of vector segments in the UniVec database used by VecScreen. We used that information in two overlapping software pipelines for retrospective analysis of contamination in GenBank and for prospective analysis of contamination in new sequence submissions. Using the retrospective pipeline, we identified and corrected over 8000 contaminated sequences in the nonredundant nucleotide database. The prospective analysis pipeline has been in production use since April 2017 to evaluate some new GenBank submissions.
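The taxonomy argument above can be illustrated with a minimal sketch. This is not the authors' pipeline: the `VectorMatch` type, the `classify` function, the genus-level threshold, and the hard-coded lineages are all hypothetical, chosen only to show how comparing the taxonomic source of a query sequence with that of the matching vector segment might separate likely false positives from possible contamination.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VectorMatch:
    """A hypothetical VecScreen hit, reduced to the two lineages being compared.

    Lineages are root-to-leaf tuples of taxon names (an assumed representation).
    """
    query_lineage: tuple[str, ...]
    vector_source_lineage: tuple[str, ...]


def shared_depth(a: tuple[str, ...], b: tuple[str, ...]) -> int:
    """Count how many leading taxa two lineages have in common."""
    depth = 0
    for x, y in zip(a, b):
        if x != y:
            break
        depth += 1
    return depth


def classify(match: VectorMatch, min_shared_depth: int = 6) -> str:
    """Label a match by how closely related the two sources are.

    min_shared_depth=6 corresponds to "same genus" for the seven-level
    lineages used below; this threshold is an illustrative assumption.
    """
    depth = shared_depth(match.query_lineage, match.vector_source_lineage)
    if depth >= min_shared_depth:
        return "likely_false_positive"
    return "possible_contamination"


# Illustrative seven-level lineages (domain ... species).
ECOLI = ("Bacteria", "Pseudomonadota", "Gammaproteobacteria",
         "Enterobacterales", "Enterobacteriaceae", "Escherichia",
         "Escherichia coli")
HUMAN = ("Eukaryota", "Chordata", "Mammalia", "Primates",
         "Hominidae", "Homo", "Homo sapiens")

# An E. coli sequence matching an E. coli-derived vector segment.
same_source = classify(VectorMatch(ECOLI, ECOLI))
# A human sequence matching an E. coli-derived vector segment.
different_source = classify(VectorMatch(HUMAN, ECOLI))
```

In this sketch a match between a sequence and a vector segment from the same genus is flagged as a likely false positive, while a match between distantly related sources is left as possible contamination for further review. A real screen would draw lineages from a taxonomy database rather than hard-coded tuples.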

AVAILABILITY AND IMPLEMENTATION

Data on the sources of UniVec entries were included in release 10.0 (ftp://ftp.ncbi.nih.gov/pub/UniVec/). The main software is freely available at https://github.com/aaschaffer/vecscreen_plus_taxonomy.

CONTACT

aschaffe@helix.nih.gov.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
