Department of Microbiology and Immunology, University of British Columbia, Vancouver, BC, Canada.
FEMS Microbiol Lett. 2011 Jun;319(2):140-5. doi: 10.1111/j.1574-6968.2011.02274.x. Epub 2011 Apr 27.
Reverse complementary DNA sequences, that is, sequences inadvertently given backwards with each base replaced by its complement (A↔T, G↔C), can distort sequence analysis unless they are taken into account. We present v-revcomp (http://www.cmde.science.ubc.ca/mohn/software.html), an open-source, high-throughput software tool that detects and reorients reverse complementary entries of the small-subunit rRNA (16S) gene in sequencing datasets, particularly those from environmental sources. The software handles sequences ranging from full length down to the short reads characteristic of next-generation sequencing technologies. We evaluated the reliability of v-revcomp by screening all 406 781 16S sequences deposited in release 102 of the curated SILVA database and found that the tool has a detection accuracy of virtually 100%. We then used v-revcomp to analyse 1 171 646 16S sequences deposited in the International Nucleotide Sequence Databases and found that about 1% of these user-submitted sequences were reverse complementary. In addition, a nontrivial proportion of the entries were anomalous in other ways, including reverse complementary chimeras, sequences associated with the wrong taxa, nonribosomal genes, poor-quality sequences, and other erroneous sequences with no reasonable match to any other entry in the database. Thus, v-revcomp is highly efficient at detecting and reorienting reverse complementary 16S sequences of almost any length and can also be used to detect various sequence anomalies.
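To make the core operation concrete, here is a minimal sketch that reverse-complements a sequence and then decides its orientation by counting k-mers shared with a reference 16S sequence. The sketch assumes the reference is in the standard 5'→3' orientation. The abstract does not describe how v-revcomp itself detects orientation, so the k-mer comparison here is a swapped-in technique for illustration only. The names `reverse_complement`, `kmers` and `orient`, the default k = 8, and the toy reference string are all hypothetical.

```python
# Hypothetical sketch: k-mer orientation check, NOT v-revcomp's published algorithm.

# Complement table for unambiguous bases plus N; other IUPAC codes pass through unchanged.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the order (the opposite strand read 5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """All distinct length-k substrings of seq, compared case-insensitively."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient(query: str, reference: str, k: int = 8) -> tuple[str, bool]:
    """Return (query in the reference orientation, True if it had to be reversed).

    Assumes `reference` is a 16S sequence in the standard 5'->3' orientation.
    Ties (including zero shared k-mers in both orientations) keep the query unchanged.
    """
    ref_kmers = kmers(reference, k)
    flipped = reverse_complement(query)
    forward_hits = len(kmers(query, k) & ref_kmers)
    reverse_hits = len(kmers(flipped, k) & ref_kmers)
    if reverse_hits > forward_hits:
        return flipped, True
    return query, False


if __name__ == "__main__":
    # Toy, made-up "reference"; not a real 16S gene.
    ref = "ACGTTGCAAGGCTTACCGATGCAAGT"
    fragment = ref[2:22]
    submitted = reverse_complement(fragment)  # simulate a backwards submission
    fixed, was_reversed = orient(submitted, ref, k=6)
    print(was_reversed, fixed == fragment)  # True True
```

In this sketch, a query that shares no k-mers with the reference in either orientation is left unchanged. Such a query matches neither strand, which may point to one of the other anomaly types listed above rather than to a reverse complement. A practical screen would also need a minimum-hit threshold and a reference set covering many taxa rather than a single sequence.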