Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, Málaga, Spain.
BMC Bioinformatics. 2010 Jan 20;11:38. doi: 10.1186/1471-2105-11-38.
High-throughput automated sequencing has enabled exponential growth of sequencing data. This growth demands higher sequence quality and reliability in order to avoid contaminating databases with artefactual sequences. The arrival of pyrosequencing exacerbates this problem and necessitates customisable pre-processing algorithms.
SeqTrim has been implemented both as a Web application and as a standalone command-line application. Already-published and newly designed algorithms have been included to identify sequence inserts, to remove low-quality, vector, adaptor, low-complexity and contaminant sequences, and to detect chimeric reads. The availability of several input and output formats allows its inclusion in sequence-processing workflows. Due to its specific algorithms, SeqTrim outperforms other pre-processors implemented as Web services or standalone applications. It performs equally well with sequences from EST libraries, SSH libraries, genomic DNA libraries and pyrosequencing reads, and does not lead to over-trimming.
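As a rough illustration of the kind of staged pre-processing described above, the following Python sketch chains three simplified steps (quality trimming, adaptor removal and a low-complexity check) and records what each stage did to the read. It is not SeqTrim's code or its algorithms: every name (`Read`, `trim_low_quality`, `remove_adaptor`, `is_low_complexity`, `preprocess`), every threshold and the exact-match adaptor search are assumptions chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Read:
    """A hypothetical read: name, bases, per-base qualities and a stage log."""
    name: str
    seq: str
    quals: list[int]
    log: list[str] = field(default_factory=list)


def trim_low_quality(read: Read, min_q: int = 20) -> Read:
    """Keep the longest contiguous run of bases with quality >= min_q."""
    best_start, best_end = 0, 0
    start = None
    # A sentinel quality of -1 closes a run that reaches the end of the read.
    for i, q in enumerate(read.quals + [-1]):
        if q >= min_q:
            if start is None:
                start = i
        elif start is not None:
            if i - start > best_end - best_start:
                best_start, best_end = start, i
            start = None
    read.seq = read.seq[best_start:best_end]
    read.quals = read.quals[best_start:best_end]
    read.log.append(f"quality: kept bases {best_start}..{best_end}")
    return read


def remove_adaptor(read: Read, adaptor: str) -> Read:
    """Cut the read at the first exact occurrence of the adaptor.

    Assumption: exact string matching; a real tool would tolerate mismatches.
    """
    pos = read.seq.find(adaptor)
    if pos != -1:
        read.seq = read.seq[:pos]
        read.quals = read.quals[:pos]
        read.log.append(f"adaptor: cut at position {pos}")
    else:
        read.log.append("adaptor: not found")
    return read


def is_low_complexity(seq: str, max_fraction: float = 0.8) -> bool:
    """Flag a sequence dominated by a single base (a deliberately crude test)."""
    if not seq:
        return True
    most_common = max(seq.count(base) for base in set(seq))
    return most_common / len(seq) > max_fraction


def preprocess(read: Read, adaptor: str, min_q: int = 20,
               min_len: int = 5) -> tuple[Read, bool]:
    """Run the stages in order; return the read and whether it was accepted."""
    read = trim_low_quality(read, min_q)
    read = remove_adaptor(read, adaptor)
    if len(read.seq) < min_len:
        read.log.append("rejected: too short after trimming")
        return read, False
    if is_low_complexity(read.seq):
        read.log.append("rejected: low complexity")
        return read, False
    read.log.append("accepted")
    return read, True
```

Calling `preprocess(read, adaptor)` returns the trimmed read together with an acceptance flag, and `read.log` lists the outcome of each stage in the order it ran, which mirrors the idea of letting users inspect what happened to a sequence at every step.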
SeqTrim is an efficient pipeline designed for pre-processing any type of sequence read, including next-generation sequencing reads. It is easily configurable and provides a friendly interface that lets users see what happened to the sequences at every pre-processing stage, and verify the pre-processing of an individual sequence if desired. The recommended pipeline reveals more information about each sequence than previously described pre-processors and can discard more sequencing or experimental artefacts.