Identifying and removing artificial replicates from 454 pyrosequencing data.

Authors

Teal Tracy K, Schmidt Thomas M

Affiliation

Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, MI 48824, USA.

Publication information

Cold Spring Harb Protoc. 2010 Apr;2010(4):pdb.prot5409. doi: 10.1101/pdb.prot5409.

Abstract

An intrinsic artifact of 454-based pyrosequencing leads to artificial overrepresentation of >10% of the original DNA sequencing templates. This artificial amplification of sequences is unbiased with regard to position on the pyrosequencing plate or sequence identity, and it occurs in all currently available 454 technologies. The amplified sequences start at the same position and are identical (duplicates), or vary in length, or contain a sequencing discrepancy. If the abundance of any sequence in a data set is going to be enumerated, either for comparative community analysis, transcriptional analysis or other applications, it is important to remove these artificial replicates before analysis. A web-based tool that incorporates the clustering algorithm cd-hit was developed to identify and remove artificially replicated sequences in 454-based pyrosequencing data sets. This tool cannot be used for data sets that have an initial amplification step before the standard pyrosequencing procedure, because artificial replicates cannot be distinguished from expected replication due to polymerase chain reaction (PCR) amplification, e.g., in sequencing of amplified gene "tags." This protocol provides details on how to use the replicate filter and obtain a file of unique sequences for use in metagenomic or transcriptomic analyses.
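
The abstract describes artificial replicates as reads that start at the same position and are either identical or differ only in length. A minimal sketch of that idea is given below, assuming reads are held in memory as a plain dictionary of ID to sequence. The function name `remove_prefix_replicates` and the input layout are hypothetical and chosen for illustration only. This is not the published web tool and does not call cd-hit; in particular, replicates that contain a sequencing discrepancy are not detected here, since that requires mismatch-tolerant clustering.

```python
def remove_prefix_replicates(reads):
    """Keep one representative per group of reads that begin identically.

    reads: dict mapping read ID -> sequence string (hypothetical layout).
    A read is dropped when it is identical to, or a prefix of, a read that
    has already been kept. Reads that differ by a sequencing discrepancy
    are NOT caught; the real tool relies on cd-hit clustering for that.
    """
    # Visit the longest reads first so that each shorter read is compared
    # against the longest available representative; ties are broken by ID
    # to make the result deterministic.
    ordered = sorted(reads.items(), key=lambda item: (-len(item[1]), item[0]))

    kept_sequences = []
    unique = {}
    for read_id, seq in ordered:
        # Same start and same bases up to the shorter length -> replicate.
        # This pairwise scan is quadratic and only suitable for small inputs.
        if any(rep.startswith(seq) for rep in kept_sequences):
            continue
        kept_sequences.append(seq)
        unique[read_id] = seq
    return unique


if __name__ == "__main__":
    example = {
        "r1": "ACGTACGTAA",  # kept as the representative
        "r2": "ACGTACGTAA",  # exact duplicate of r1
        "r3": "ACGTAC",      # shorter read with the same start as r1
        "r4": "ACGTTTGA",    # diverges from r1, so it is a distinct template
        "r5": "TTGACCA",     # unrelated read
    }
    print(sorted(remove_prefix_replicates(example)))
```

As the abstract notes, this kind of filtering is only meaningful for data sets without an initial PCR amplification step, because expected PCR copies would be indistinguishable from artificial replicates.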
