A simple binomial test for estimating sequencing errors in public repository 16S rRNA sequences.

Authors

Zo Young-Gun, Colwell Rita R

Affiliation

Center of Marine Biotechnology, University of Maryland Biotechnology Institute, 701 E. Pratt Street, Baltimore, MD 21202, USA.

Publication information

J Microbiol Methods. 2008 Feb;72(2):166-79. doi: 10.1016/j.mimet.2007.11.013. Epub 2007 Nov 23.

DOI:10.1016/j.mimet.2007.11.013

PMID:18155790

Abstract

Sequences in public databases may contain a number of sequencing errors. A double binomial model describing the distribution of indel-excluded similarity coefficients (S) among repeatedly sequenced 16S rRNA was previously developed and it produced a confidence interval of S useful for testing sequence identity among sequences of 400-bp length. We characterized patterns in sequencing errors found in nearly complete 16S rRNA sequences of Vibrionaceae as highly variable in reported sequence length and containing a small number of indels. To accommodate these characteristics, a simple binomial model for distribution of the similarity coefficient (H) that included indels was derived from the double binomial model for S. The model showed good fit to empirical data. By using either a pre-determined or bootstrapping estimated standard probability of base matching, we were able to use the exact binomial test to determine the relative level of sequencing error for a given pair of duplicated sequences. A limitation of the method is the requirement that duplicated sequences for the same template sequence be paired, but this can be overcome by using only conserved regions of 16S rRNA sequences and pairing a given sequence with its highest scoring BLAST search hit from the nr database of GenBank.
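
The exact binomial test described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names, the treatment of every gap column as a mismatch, and the standard matching probability of 0.999 are hypothetical choices made for the example; the abstract gives neither the exact definition of H nor the value of the standard probability.

```python
from math import exp, lgamma, log, log1p


def count_matches(aligned_a: str, aligned_b: str) -> tuple[int, int]:
    """Return (matching columns, total columns) for a pairwise alignment.

    Assumption: inputs are already aligned, equal-length strings with '-'
    marking gaps. A column with a gap in one sequence counts as a mismatch,
    so indels lower the similarity. Columns that are gaps in both sequences
    are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    matches = sum(1 for a, b in columns if a == b)
    return matches, len(columns)


def binom_lower_tail(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p), summed in log space.

    Log-space terms avoid overflow in the binomial coefficients for
    sequence lengths around 1,500 bases.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    total = 0.0
    for i in range(k + 1):
        log_pmf = (
            lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log1p(-p)
        )
        total += exp(log_pmf)
    return min(total, 1.0)


if __name__ == "__main__":
    # Hypothetical standard probability of base matching (placeholder value).
    P0 = 0.999
    matches, n = count_matches("ACGT-ACGTA", "ACGTTACGAA")
    h = matches / n  # similarity coefficient with indels counted as mismatches
    p_value = binom_lower_tail(matches, n, P0)
    print(f"H = {h:.3f}, one-sided p-value = {p_value:.3g}")
```

A small p-value suggests that the pair shows fewer matches than the standard probability would predict, that is, a relatively high level of sequencing error in at least one of the two sequences. In practice the standard probability would be either pre-determined or estimated by bootstrapping, as the abstract states.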

Similar articles

Confidence intervals of similarity values determined for cloned SSU rRNA genes from environmental samples.

J Microbiol Methods. 2006 Apr;65(1):144-52. doi: 10.1016/j.mimet.2005.07.001. Epub 2005 Aug 3.

Development and evaluation of a quality-controlled ribosomal sequence database for 16S ribosomal DNA-based identification of Staphylococcus species.

J Clin Microbiol. 2004 Nov;42(11):4988-95. doi: 10.1128/JCM.42.11.4988-4995.2004.

[16S rRNA gene sequencing for pathogen identification from clinical specimens].

Zhonghua Yi Xue Za Zhi. 2008 Jan 8;88(2):123-6.

Quantitatively evaluating mistaken clone assignments by RFLP analysis of 16S rRNA genes: a case study.

Can J Microbiol. 2008 Jun;54(6):479-82. doi: 10.1139/w08-031.

Classification of the taxon 2 and taxon 3 complex of Bisgaard within Gallibacterium and description of Gallibacterium melopsittaci sp. nov., Gallibacterium trehalosifermentans sp. nov. and Gallibacterium salpingitidis sp. nov.

Int J Syst Evol Microbiol. 2009 Apr;59(Pt 4):735-44. doi: 10.1099/ijs.0.005694-0.

Scratching the surface of the rare biosphere with ribosomal sequence tag primers.

FEMS Microbiol Lett. 2008 Jun;283(2):146-53. doi: 10.1111/j.1574-6968.2008.01124.x. Epub 2008 Apr 21.

Ribonuclease P RNA gene sequencing as a tool for molecular dereplication of myxobacterial strain collections.

Lett Appl Microbiol. 2008 Jan;46(1):87-94. doi: 10.1111/j.1472-765X.2007.02271.x. Epub 2007 Oct 27.

Description of Enterovibrio nigricans sp. nov., reclassification of Vibrio calviensis as Enterovibrio calviensis comb. nov. and emended description of the genus Enterovibrio Thompson et al. 2002.

Int J Syst Evol Microbiol. 2009 Apr;59(Pt 4):698-704. doi: 10.1099/ijs.0.001990-0.

Chitinolytic bacteria in the intestinal tract of Japanese coastal fishes.

Can J Microbiol. 2006 Dec;52(12):1158-63. doi: 10.1139/w06-082.

Cited by

RNA colony blot hybridization method for enumeration of culturable Vibrio cholerae and Vibrio mimicus bacteria.

Appl Environ Microbiol. 2009 Sep;75(17):5439-44. doi: 10.1128/AEM.02007-08. Epub 2009 Jun 26.

Covariability of Vibrio cholerae microdiversity and environmental parameters.

Appl Environ Microbiol. 2008 May;74(9):2915-20. doi: 10.1128/AEM.02139-07. Epub 2008 Feb 29.
