The Center for Genomics and Bioinformatics, Indiana University, Bloomington, IN 47401, USA.
BMC Bioinformatics. 2012 Sep 26;13:247. doi: 10.1186/1471-2105-13-247.
With the advent of next-generation sequencing (NGS) technologies, full cDNA shotgun sequencing has become a major approach in the study of transcriptomes, and several different protocols for 454 sequencing have been developed. Because each protocol attaches its own short DNA tags or adapters to the ends of cDNA fragments for labeling or sequencing, reads carry protocol-specific contaminants that may lead to mis-assembly and inaccurate sequence products.
We have designed and implemented a new program for raw sequence cleaning, available both as a graphical user interface and as a batch script. The cleaning process consists of several modules: barcode trimming, sequencing adapter trimming, amplification primer trimming, poly-A tail trimming, vector screening and low-quality region trimming. These modules can be combined to suit different sequencing applications.
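To illustrate how independent cleaning modules can be combined per application, the following is a minimal Python sketch. It is not ESTclean's implementation: the `Read` type, the function names, the exact-match prefix trimming, and the thresholds (`min_len`, `min_q`) are all assumptions chosen for illustration, and vector screening is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Read:
    """Hypothetical read record: bases plus one Phred score per base."""
    seq: str
    qual: List[int]


Step = Callable[[Read], Read]


def trim_prefix(tag: str) -> Step:
    """Build a step that removes an exact 5' tag (barcode, adapter or primer)."""
    def step(read: Read) -> Read:
        if tag and read.seq.startswith(tag):
            return Read(read.seq[len(tag):], read.qual[len(tag):])
        return read
    return step


def trim_poly_a(min_len: int = 10) -> Step:
    """Build a step that removes a 3' run of A bases of at least min_len."""
    def step(read: Read) -> Read:
        stripped = read.seq.rstrip("A")
        if len(read.seq) - len(stripped) >= min_len:
            return Read(stripped, read.qual[:len(stripped)])
        return read
    return step


def trim_low_quality(min_q: int = 20) -> Step:
    """Build a step that drops 3' bases until one has quality >= min_q."""
    def step(read: Read) -> Read:
        end = len(read.seq)
        while end > 0 and read.qual[end - 1] < min_q:
            end -= 1
        return Read(read.seq[:end], read.qual[:end])
    return step


def clean(read: Read, steps: List[Step]) -> Read:
    """Apply the selected cleaning steps in order."""
    for step in steps:
        read = step(read)
    return read


# Example: barcode + primer + insert + poly-A tail + low-quality 3' end.
raw = Read(
    seq="ACGT" + "GGCC" + "TTGACCTGAC" + "A" * 12 + "CG",
    qual=[30] * 30 + [5, 5],
)
pipeline = [
    trim_prefix("ACGT"),   # assumed barcode
    trim_prefix("GGCC"),   # assumed amplification primer
    trim_low_quality(20),
    trim_poly_a(10),
]
cleaned = clean(raw, pipeline)
```

Because each module is just a function from read to read, a protocol without barcodes would simply leave that step out of `pipeline`; the order of steps matters, since here the poly-A tail is only exposed once the low-quality 3' bases are gone.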
ESTclean is a software package that not only cleans cDNA sequences but also helps in developing sequencing protocols, by providing summary tables and figures for sequencing quality control in a graphical user interface. It performs especially well in cleaning reads from complex sequencing protocols that use barcodes and multiple amplification primers.