Lee Byungwook, Shin Gwangsik
Korean BioInformation Center, KRIBB, Daejeon 305-817, Korea.
Nucleic Acids Res. 2009 Jan;37(Database issue):D686-9. doi: 10.1093/nar/gkn648. Epub 2008 Oct 2.
The EST division of GenBank, dbEST, is widely used in many applications such as gene discovery and verification of exon-intron structure. However, the use of EST sequences in the dbEST libraries is often hampered by the inconsistent terminology used to describe library sources and by the presence of contaminated sequences. Here, we describe CleanEST, a novel database server that classifies dbEST libraries and removes contaminants. We classified all dbEST libraries according to species and sequencing center. In addition, we further classified human EST libraries by anatomical and pathological systems according to eVOC ontologies. For each dbEST library, we provide two different sets of cleansed sequences: 'pre-cleansed' and 'user-cleansed'. To generate pre-cleansed sequences, we cleansed dbEST sequences by aligning them against well-known contamination sources: UniVec, Escherichia coli, mitochondria and chloroplasts (for plants). To provide user-cleansed sequences, we built an automatic user-cleansing pipeline in which the sequences of a user-selected library are cleansed on-the-fly according to user-selected options. The server is available at http://cleanest.kobic.re.kr/ and the database is updated monthly.
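The pre-cleansing step screens each EST against known contamination sources. The abstract does not name the aligner or cut-offs CleanEST uses, so the sketch below swaps in a much simpler technique, exact shared k-mer matching, as a stand-in for real sequence alignment. It only illustrates the flag-and-filter idea. The function names, the defaults `k=12` and `max_fraction=0.1`, and the vector sequence are all hypothetical placeholders, not taken from CleanEST or UniVec.

```python
from typing import Dict, Iterable, List, Set


def windows(seq: str, k: int) -> List[str]:
    """All overlapping length-k substrings of seq, upper-cased."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(contaminants: Iterable[str], k: int) -> Set[str]:
    """Pool the k-mers of every contaminant sequence into one lookup set."""
    index: Set[str] = set()
    for contaminant in contaminants:
        index.update(windows(contaminant, k))
    return index


def contaminated_fraction(seq: str, index: Set[str], k: int) -> float:
    """Fraction of the EST's k-mer windows that also occur in a contaminant."""
    ws = windows(seq, k)
    if not ws:  # EST shorter than k: nothing to compare, report 0.0
        return 0.0
    return sum(w in index for w in ws) / len(ws)


def screen_ests(ests: Dict[str, str], contaminants: Iterable[str],
                k: int = 12, max_fraction: float = 0.1) -> Dict[str, str]:
    """Hypothetical filter: keep ESTs whose contaminant k-mer fraction <= max_fraction.

    Exact k-mer matching is a simplified stand-in for alignment, not CleanEST's method.
    """
    index = build_index(contaminants, k)
    return {name: seq for name, seq in ests.items()
            if contaminated_fraction(seq, index, k) <= max_fraction}


if __name__ == "__main__":
    vector = "GCGGCCGCGGATCCGCGGCCGC"  # made-up placeholder, not a real UniVec entry
    ests = {
        "est_clean": "ATGAAATTTAAGTTTATTAAATGCA",
        "est_vector": vector + "ATGAAATTTAAG",  # vector sequence fused to an insert
    }
    kept = screen_ests(ests, [vector])
    print(sorted(kept))  # ['est_clean']
```

A real screen would align against the full UniVec, E. coli and organelle sequence sets and could trim vector-derived ends instead of discarding whole reads; the toy version above drops any EST whose matching fraction exceeds the threshold.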