Liang Chun, Wang Gang, Liu Lin, Ji Guoli, Liu Yuansheng, Chen Jinqiao, Webb Jason S, Reese Greg, Dean Jeffrey F D
Department of Botany, Miami University, Oxford, Ohio 45056, USA.
Nucleic Acids Res. 2007 Jul;35(Web Server issue):W137-42. doi: 10.1093/nar/gkm299. Epub 2007 May 8.
Expressed sequence tags (ESTs) remain a dominant approach for characterizing the protein-encoding portions of various genomes. Because ESTs are inherently error-prone, they also present serious challenges for data quality control. Before GenBank submission, EST sequences are typically screened and trimmed of vector and adapter/linker sequences, as well as polyA/T tails. Removing these sequences hinders data validation of error-prone ESTs and impedes data mining of certain functional motifs, whose detection relies on accurate annotation of positional information for polyA tails added posttranscriptionally. As raw DNA sequence information becomes increasingly available from public repositories, such as NCBI Trace Archive, new tools will be necessary to reanalyze and mine these data for new information. WebTraceMiner (www.conifergdb.org/software/wtm) was designed as a public sequence processing service for raw EST traces, with a focus on detection and mining of sequence features that help characterize 3' and 5' termini of cDNA inserts, including vector fragments, adapter/linker sequences, insert-flanking restriction endonuclease recognition sites and polyA or polyT tails. WebTraceMiner complements other public EST resources and should prove to be a unique tool to facilitate data validation and mining of error-prone ESTs (e.g. discovery of new functional motifs).
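The kind of feature detection described above can be illustrated with a minimal sketch. This is not WebTraceMiner's implementation, which the abstract does not describe. The function names, the 10-base minimum run length, the example read and the choice of the NotI recognition site (GCGGCCGC) are all assumptions made for illustration only.

```python
import re


def find_homopolymer_tails(seq, base="A", min_len=10):
    """Return (start, end) spans of runs of `base` at least `min_len` long.

    Spans are 0-based and end-exclusive. The min_len default is an
    illustrative assumption, not a value taken from WebTraceMiner.
    """
    pattern = re.compile(f"{re.escape(base)}{{{min_len},}}", re.IGNORECASE)
    return [(m.start(), m.end()) for m in pattern.finditer(seq)]


def find_site(seq, site):
    """Return 0-based start positions of every occurrence of `site`."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]


# Hypothetical untrimmed read: 5' flank, cDNA insert, polyA tail,
# insert-flanking NotI site, 3' flank.
read = "ACGTACGT" + "TTGACCATGGCT" + "A" * 15 + "GCGGCCGC" + "ACGT"

polya_spans = find_homopolymer_tails(read, "A")
noti_positions = find_site(read, "GCGGCCGC")
print(polya_spans, noti_positions)
```

Because the read is left untrimmed, the polyA span and the restriction site keep their coordinates relative to the insert, which is the positional information the abstract says is lost when sequences are trimmed before submission.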