An optimized procedure greatly improves EST vector contamination removal.

Author information

Chen Yi-An, Lin Chang-Chun, Wang Chin-Di, Wu Huan-Bin, Hwang Pei-Ing

Affiliation

Bioinformatics Core Laboratory, Agricultural Biotechnology Research Center, Academia Sinica, Taipei, Taiwan.

Publication information

BMC Genomics. 2007 Nov 13;8:416. doi: 10.1186/1471-2164-8-416.

Abstract

BACKGROUND

The enormous volume of sequence data in public-domain databases has been a gold mine for researchers exploring various themes in the life sciences, so the quality of such data is a serious concern. Removing vector contamination is one of the most important steps in obtaining accurate sequence data that contain only the cDNA insert from the basecalls output by an automated DNA sequencer. Popular bioinformatics programs for vector trimming include LUCY, cross_match and SeqClean.

RESULTS

In a recent study, we used SeqClean to remove vector contamination from our test set of EST data, compiled from various library construction systems; a significant number of errors remained after preliminary trimming. These errors were almost completely corrected simply by comparing the target ESTs against a re-linearized form of the cloning vector. We also compared the trimming efficiency of the modified SeqClean procedure with that of two other popular programs, LUCY2 and cross_match. SeqClean with the re-linearized vector significantly outperformed both programs under all tested conditions, whereas the performance of the other two programs was unaffected by the modified procedure. We also investigated vector contamination in dbEST: 2203 of the 48212 ESTs sampled from dbEST (2007-04-18 freeze) matched sequences in UNIVEC.
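The abstract does not spell out how the re-linearized vector is produced. One plausible reading, offered here only as an assumption, is that the circular vector sequence is cut at a chosen position (for example, the cloning site) and written out as a linear sequence starting there, so that the vector sequence on either side of the cut sits at the two ends of the reference. The sketch below illustrates that rotation on a toy sequence and also works out the dbEST fraction quoted above. The function names `relinearize` and `contamination_rate` and the toy sequence are hypothetical; they are not part of SeqClean or the authors' procedure.

```python
def relinearize(vector_seq: str, cut_index: int) -> str:
    """Rotate a circular sequence so the linear form starts at cut_index.

    Assumption: the vector is stored as a linear string representing a
    circular molecule, and cut_index is a 0-based position chosen by the
    user (hypothetically, the cloning site).
    """
    if not vector_seq:
        raise ValueError("vector sequence is empty")
    cut = cut_index % len(vector_seq)  # wrap around the circle
    return vector_seq[cut:] + vector_seq[:cut]


def contamination_rate(matches: int, sampled: int) -> float:
    """Fraction of sampled sequences with a vector match."""
    if sampled <= 0:
        raise ValueError("sampled must be positive")
    return matches / sampled


if __name__ == "__main__":
    toy_vector = "AAAACCCCGGGGTTTT"  # toy data, not a real vector
    print(relinearize(toy_vector, 8))
    # Figures from the abstract: 2203 of 48212 sampled dbEST entries
    print(f"{contamination_rate(2203, 48212):.2%}")
```

With the abstract's figures, the fraction of sampled ESTs that matched UNIVEC comes to roughly 4.6%.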

CONCLUSION

Vector contamination remains a serious concern for data quality in public sequence databases. Based on the results presented here, we recommend our modified SeqClean procedure to all researchers for removing vector sequence from EST or genomic sequences.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/151c/2194723/5536ede2096b/1471-2164-8-416-1.jpg
