Department of Civil and Environmental Engineering, University of Wisconsin-Madison, Madison, Wisconsin, USA.
Appl Environ Microbiol. 2012 Feb;78(3):717-25. doi: 10.1128/AEM.06516-11. Epub 2011 Nov 18.
DECIPHER is a new method for finding 16S rRNA chimeric sequences using a search-based approach. The method is based upon detecting short fragments that are uncommon in the phylogenetic group where a query sequence is classified but frequently found in another phylogenetic group. The algorithm was calibrated for full-length sequences (fs_DECIPHER) and short sequences (ss_DECIPHER) and benchmarked against WigeoN (Pintail), ChimeraSlayer, and Uchime using artificially generated chimeras. Overall, ss_DECIPHER and Uchime provided the highest chimera detection rates for sequences 100 to 600 nucleotides long (79% and 81%, respectively), but Uchime's performance deteriorated for longer sequences, while ss_DECIPHER maintained a high detection rate (89%). Both methods had low false-positive rates (1.3% and 1.6%). The more conservative fs_DECIPHER, benchmarked only for sequences longer than 600 nucleotides, had an overall detection rate lower than that of ss_DECIPHER (75%) but higher than those of the other programs. In addition, fs_DECIPHER had the lowest false-positive rate among all the benchmarked programs (<0.20%). DECIPHER was outperformed only by ChimeraSlayer and Uchime when chimeras were formed from closely related parents (less than 10% divergence). Given the differences in the programs, it was possible to detect over 89% of all chimeras with just the combination of ss_DECIPHER and Uchime. Using fs_DECIPHER, we detected between 1% and 2% additional chimeras in the RDP, SILVA, and Greengenes databases from which chimeras had already been removed with Pintail or Bellerophon. DECIPHER was implemented in the R programming language and is directly accessible through a webpage or by downloading the program as an R package (http://DECIPHER.cee.wisc.edu).
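The core idea described above, flagging short fragments that are rare in the query's own phylogenetic group but common in another group, can be illustrated with a toy sketch. This is not the DECIPHER implementation (which is an R package) and does not reproduce its calibration. The function names, the fragment length `k`, and the `rare` and `common` thresholds are all hypothetical choices made for illustration, and the groups are assumed to be given as plain lists of sequences.

```python
from collections import Counter


def kmers(seq, k):
    """Return all overlapping fragments of length k, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def fragment_frequencies(group_seqs, k):
    """Map each fragment to the fraction of sequences in the group containing it."""
    counts = Counter()
    for s in group_seqs:
        counts.update(set(kmers(s, k)))  # count each fragment once per sequence
    n = len(group_seqs)
    return {frag: c / n for frag, c in counts.items()}


def suspicious_fragments(query, own_group, other_groups, k=8, rare=0.1, common=0.5):
    """List (position, fragment, group) for query fragments that are rare in the
    query's own group but common in at least one other group.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    own = fragment_frequencies(own_group, k)
    others = {name: fragment_frequencies(seqs, k)
              for name, seqs in other_groups.items()}
    hits = []
    for pos, frag in enumerate(kmers(query, k)):
        if own.get(frag, 0.0) > rare:
            continue  # fragment is typical of the query's own group
        for name, freqs in others.items():
            if freqs.get(frag, 0.0) >= common:
                hits.append((pos, frag, name))
                break  # report the first matching foreign group only
    return hits


# Toy data: group "A" and group "B" share no 4-mers.
group_a = ["ACGTACGTACGTACGT"] * 3
group_b = ["TTGGCCAATTGGCCAA"] * 3

# Artificial chimera: first half from group A, second half from group B.
chimera = "ACGTACGT" + "TTGGCCAA"
hits = suspicious_fragments(chimera, group_a, {"B": group_b}, k=4)
clean_hits = suspicious_fragments(group_a[0], group_a, {"B": group_b}, k=4)
```

In this toy case the fragments starting in the second half of the chimera are reported as matching group "B", while a genuine group "A" sequence yields no hits. A practical detector would also need a rule for how many such fragments justify calling a sequence chimeric. That rule is where the trade-off between detection rate and false-positive rate is set.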