Pagni M, Iseli C, Junier T, Falquet L, Jongeneel V, Bucher P
Swiss Institute of Bioinformatics, Ludwig Institute for Cancer Research, Chemin des Boveresses 155, CH-1066, Epalinges s/Lausanne, Switzerland.
Nucleic Acids Res. 2001 Jan 1;29(1):148-51. doi: 10.1093/nar/29.1.148.
High throughput genome (HTG) and expressed sequence tag (EST) sequences are currently the most abundant nucleotide sequence classes in the public database. The large volume, high degree of fragmentation and lack of gene structure annotations prevent efficient and effective searches of HTG and EST data for protein sequence homologies by standard search methods. Here, we briefly describe three newly developed resources that should make discovery of interesting genes in these sequence classes easier in the future, especially to biologists not having access to a powerful local bioinformatics environment. trEST and trGEN are regularly regenerated databases of hypothetical protein sequences predicted from EST and HTG sequences, respectively. Hits is a web-based data retrieval and analysis system providing access to precomputed matches between protein sequences (including sequences from trEST and trGEN) and patterns and profiles from Prosite and Pfam. The three resources can be accessed via the Hits home page (http://hits.isb-sib.ch).