TagCleaner: Identification and removal of tag sequences from genomic and metagenomic datasets.

Affiliation

Department of Computer Science, San Diego State University, CA, USA.

Publication information

BMC Bioinformatics. 2010 Jun 23;11:341. doi: 10.1186/1471-2105-11-341.

DOI:10.1186/1471-2105-11-341
PMID:20573248
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2910026/
Abstract

BACKGROUND

Sequencing metagenomes that were pre-amplified with primer-based methods requires the removal of the additional tag sequences from the datasets. The sequenced reads can contain deletions or insertions due to sequencing limitations, and the primer sequence may contain ambiguous bases. Furthermore, the tag sequence may be unavailable or incorrectly reported. Because of the potential for downstream inaccuracies introduced by unwanted sequence contaminations, it is important to use reliable tools for pre-processing sequence data.
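The two complications named above, indels in the reads and ambiguous bases in the primer, can be illustrated with a small sketch. This is not TagCleaner's actual implementation; it is a minimal example that assumes uppercase sequences, a tag at the 5' end only, and a hypothetical `max_errors` threshold. The names `base_matches` and `trim_5prime_tag` are invented for illustration.

```python
# IUPAC nucleotide codes mapped to the concrete bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def base_matches(tag_base: str, read_base: str) -> bool:
    """True if the (possibly ambiguous) tag base is compatible with the read base."""
    return read_base in IUPAC.get(tag_base, "")


def trim_5prime_tag(read: str, tag: str, max_errors: int = 2) -> str:
    """Remove `tag` from the start of `read` if it aligns with at most
    `max_errors` mismatches, insertions, or deletions (hypothetical threshold).
    The read is returned unchanged when no acceptable alignment exists."""
    # Only a short prefix of the read can contain the tag.
    window = read[: len(tag) + max_errors]
    # prev[j] holds the edit distance between tag[:i] and window[:j].
    prev = list(range(len(window) + 1))
    for i in range(1, len(tag) + 1):
        cur = [i] + [0] * len(window)
        for j in range(1, len(window) + 1):
            cost = 0 if base_matches(tag[i - 1], window[j - 1]) else 1
            cur[j] = min(
                prev[j - 1] + cost,  # match or substitution
                prev[j] + 1,         # tag base missing from the read
                cur[j - 1] + 1,      # extra base inserted in the read
            )
        prev = cur
    # Pick the read prefix that aligns best to the whole tag
    # (ties resolve to the shortest prefix, i.e. the least trimming).
    best_end = min(range(len(window) + 1), key=lambda j: prev[j])
    if prev[best_end] > max_errors:
        return read
    return read[best_end:]


if __name__ == "__main__":
    print(trim_5prime_tag("ACGTTTGACC", "ACGT"))   # exact tag
    print(trim_5prime_tag("ACGTGGG", "ACNT"))      # ambiguous base in the tag
    print(trim_5prime_tag("ACTACGGG", "ACGTAC"))   # one base deleted in the read
    print(trim_5prime_tag("TTTTTTTT", "ACGTAC"))   # no tag present
```

The dynamic-programming table is a standard edit-distance computation between the full tag and every prefix of the read, which is one simple way to tolerate insertions and deletions; a production tool would likely use a faster approximate-matching algorithm.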

RESULTS

TagCleaner is a web application developed to automatically identify and remove known or unknown tag sequences allowing insertions and deletions in the dataset. TagCleaner is designed to filter the trimmed reads for duplicates, short reads, and reads with high rates of ambiguous sequences. An additional screening for and splitting of fragment-to-fragment concatenations that gave rise to artificial concatenated sequences can increase the quality of the dataset. Users may modify the different filter parameters according to their own preferences.
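The post-trimming filters described above can be sketched in a few lines. The following is an illustrative example only, not the tool's code: the function names, the default thresholds (`min_length`, `max_ambiguous_fraction`), and the assumption that concatenated fragments are joined by an exact copy of the tag are all hypothetical simplifications.

```python
def split_concatenated(read: str, tag: str) -> list[str]:
    """Split a read at internal exact copies of the tag and drop empty pieces.
    Assumes (as a simplification) that fragment-to-fragment concatenations
    are joined by an unmodified tag sequence."""
    return [fragment for fragment in read.split(tag) if fragment]


def filter_reads(reads, min_length=50, max_ambiguous_fraction=0.05):
    """Drop short reads, reads rich in ambiguous bases (N), and exact duplicates.
    Both thresholds are hypothetical defaults chosen for illustration."""
    seen = set()
    kept = []
    for read in reads:
        seq = read.upper()
        # Short-read filter (also guards against empty sequences).
        if not seq or len(seq) < min_length:
            continue
        # Ambiguity filter: fraction of N characters in the read.
        if seq.count("N") / len(seq) > max_ambiguous_fraction:
            continue
        # Duplicate filter: keep only the first copy of each sequence.
        if seq in seen:
            continue
        seen.add(seq)
        kept.append(seq)
    return kept


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGT", "ACG", "ACNNNNGT", "TTTTCCCC"]
    print(filter_reads(reads, min_length=5, max_ambiguous_fraction=0.25))
    print(split_concatenated("AAAACGTCCC", "ACGT"))
```

Exposing the thresholds as keyword arguments mirrors the abstract's point that users can adjust the filter parameters to their own preferences.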

CONCLUSIONS

TagCleaner is a publicly available web application that is able to automatically detect and efficiently remove tag sequences from metagenomic datasets. It is easily configurable and provides a user-friendly interface. The interactive web interface facilitates export functionality for subsequent data processing, and is available at http://edwards.sdsu.edu/tagcleaner.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/578f/2910026/1e704cadf5ac/1471-2105-11-341-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/578f/2910026/c0cd92af7910/1471-2105-11-341-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/578f/2910026/252d9494fbac/1471-2105-11-341-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/578f/2910026/e088eecf39ec/1471-2105-11-341-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/578f/2910026/dd6299763b62/1471-2105-11-341-5.jpg