Identifying and removing artificial replicates from 454 pyrosequencing data.

Authors

Teal Tracy K, Schmidt Thomas M

Affiliation

Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, MI 48824, USA.

Publication information

Cold Spring Harb Protoc. 2010 Apr;2010(4):pdb.prot5409. doi: 10.1101/pdb.prot5409.

DOI: 10.1101/pdb.prot5409

PMID: 20360363

Abstract

An intrinsic artifact of 454-based pyrosequencing leads to artificial overrepresentation of >10% of the original DNA sequencing templates. This artificial amplification of sequences is unbiased with regard to position on the pyrosequencing plate or sequence identity, and it occurs in all currently available 454 technologies. The amplified sequences start at the same position and are identical (duplicates), or vary in length, or contain a sequencing discrepancy. If the abundance of any sequence in a data set is going to be enumerated, either for comparative community analysis, transcriptional analysis or other applications, it is important to remove these artificial replicates before analysis. A web-based tool that incorporates the clustering algorithm cd-hit was developed to identify and remove artificially replicated sequences in 454-based pyrosequencing data sets. This tool cannot be used for data sets that have an initial amplification step before the standard pyrosequencing procedure, because artificial replicates cannot be distinguished from expected replication due to polymerase chain reaction (PCR) amplification, e.g., in sequencing of amplified gene "tags." This protocol provides details on how to use the replicate filter and obtain a file of unique sequences for use in metagenomic or transcriptomic analyses.
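
The three kinds of replicate named in the abstract (identical reads, reads that share a start but differ in length, and reads with a sequencing discrepancy) can be illustrated with a small sketch. This is not the authors' web tool and does not call cd-hit; it is a simplified greedy clustering written for illustration only. The function name `filter_replicates` and the parameters `prefix_len` and `max_mismatch_frac` are hypothetical, and their default values are assumptions, since the abstract does not state the thresholds the tool uses.

```python
from typing import Dict, List


def mismatches(a: str, b: str) -> int:
    """Count mismatching positions over the shared length of a and b."""
    return sum(1 for x, y in zip(a, b) if x != y)


def filter_replicates(
    reads: Dict[str, str],
    prefix_len: int = 3,             # assumed: bases that must match exactly at the read start
    max_mismatch_frac: float = 0.1,  # assumed: tolerated fraction of mismatching bases
) -> Dict[str, List[str]]:
    """Greedily cluster reads that look like replicates of one template.

    Returns a mapping from each representative read ID to the IDs of all
    reads in its cluster (the representative included).
    """
    # Process the longest reads first so that each cluster representative
    # is at least as long as every read later compared against it.
    ordered = sorted(reads.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    representatives: List[tuple] = []
    clusters: Dict[str, List[str]] = {}

    for read_id, seq in ordered:
        for rep_id, rep_seq in representatives:
            # Replicates start at the same position, so the leading bases
            # are required to be identical.
            if seq[:prefix_len] != rep_seq[:prefix_len]:
                continue
            # Allow a shorter read and a small number of discrepancies.
            if mismatches(seq, rep_seq) <= max_mismatch_frac * len(seq):
                clusters[rep_id].append(read_id)
                break
        else:
            # No existing representative matched: this read starts a new cluster.
            representatives.append((read_id, seq))
            clusters[read_id] = [read_id]

    return clusters


if __name__ == "__main__":
    example = {
        "r1": "ACGTACGTACGT",  # template read
        "r2": "ACGTACGTACGT",  # exact duplicate
        "r3": "ACGTACGT",      # same start, shorter
        "r4": "ACGTACGTACCT",  # same start, one discrepancy
        "r5": "TTGACCGGAATC",  # unrelated read
    }
    for rep, members in filter_replicates(example).items():
        print(rep, members)
```

Keeping only the cluster representatives gives a set of unique reads. As the abstract cautions, this kind of filtering is not appropriate for PCR-amplified data such as gene "tags", where reads with identical starts are expected.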
