Implications of pyrosequencing error correction for biological data interpretation.

Affiliation

Department of Plant Pathology, University of Minnesota, St. Paul, Minnesota, USA.

Publication information

PLoS One. 2012;7(8):e44357. doi: 10.1371/journal.pone.0044357. Epub 2012 Aug 30.

Abstract

There has been a rapid proliferation of approaches for processing and manipulating second generation DNA sequence data. However, users are often left with uncertainties about how the choice of processing methods may impact biological interpretation of data. In this report, we probe differences in output between two different processing pipelines: a de-noising approach using the AmpliconNoise algorithm for error correction, and a standard approach using quality filtering and preclustering to reduce error. There was a large overlap in reads culled by each method, although AmpliconNoise removed a greater net number of reads. Most OTUs produced by one method had a clearly corresponding partner in the other. Although each method resulted in OTUs consisting entirely of reads that were culled by the other method, there were many more such OTUs formed in the standard pipeline. Total OTU richness was reduced by AmpliconNoise processing, but per-sample OTU richness, diversity and evenness were increased. Increases in per-sample richness and diversity may be a result of AmpliconNoise processing producing a more even OTU rank-abundance distribution. Because communities were randomly subsampled to equalize sample size across communities, and because rare sequence variants are less likely to be selected during subsampling, fewer OTUs were lost from individual communities when subsampling AmpliconNoise-processed data. In contrast to taxon-based diversity estimates, phylogenetic diversity was reduced even on a per-sample basis by de-noising, and samples switched widely in diversity rankings. This work illustrates the significant impacts of processing pipelines on the biological interpretations that can be made from pyrosequencing surveys. This study provides important cautions for analyses of contemporary data, for requisite data archiving (processed vs. non-processed data), and for drawing comparisons among studies performed using distinct data processing pipelines.
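The subsampling argument in the abstract can be illustrated with a small sketch. It rarefies two toy communities to the same depth and compares how many OTUs survive. The function names, the toy OTU counts, the depth, and the number of trials are all assumptions chosen for illustration. None of them come from the study, and the sketch does not implement AmpliconNoise or either pipeline.

```python
import math
import random
from collections import Counter


def rarefy(counts, depth, rng):
    """Draw `depth` reads without replacement from an OTU count table."""
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    return Counter(rng.sample(pool, depth))


def richness(counts):
    """Number of OTUs with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts):
    """Shannon diversity index (natural log)."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total)
                for n in counts.values() if n > 0)


def pielou_evenness(counts):
    """Shannon diversity divided by its maximum for the observed richness."""
    s = richness(counts)
    return shannon(counts) / math.log(s) if s > 1 else 0.0


def mean_rarefied_richness(counts, depth, trials=200, seed=0):
    """Average richness over repeated random subsamples of equal depth."""
    rng = random.Random(seed)
    total = sum(richness(rarefy(counts, depth, rng)) for _ in range(trials))
    return total / trials


# Hypothetical toy communities: 1000 reads and 20 OTUs each.
# `skewed` has one dominant OTU and 19 singletons; `even` is perfectly even.
skewed = {"otu0": 981, **{f"otu{i}": 1 for i in range(1, 20)}}
even = {f"otu{i}": 50 for i in range(20)}

if __name__ == "__main__":
    for name, community in (("skewed", skewed), ("even", even)):
        print(name,
              "evenness:", round(pielou_evenness(community), 3),
              "mean richness at depth 100:",
              round(mean_rarefied_richness(community, depth=100), 2))
```

Both toy communities start with the same total richness, but the even one keeps far more OTUs after subsampling because its OTUs are no longer represented by single reads. This mirrors the abstract's explanation of why per-sample richness can rise even when total richness falls.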

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ab05/3431371/e0ee5617fc16/pone.0044357.g001.jpg
