Frishman D, Albermann K, Hani J, Heumann K, Metanomski A, Zollner A, Mewes HW
GSF-Forschungszentrum für Umwelt und Gesundheit, Munich Information Center for Protein Sequences, Martinsried, Germany.
Bioinformatics. 2001 Jan;17(1):44-57. doi: 10.1093/bioinformatics/17.1.44.
Enormous demand for fast and accurate analysis of biological sequences is fuelled by the pace of genome analysis efforts. There is also an acute need for reliable, up-to-date genomic databases integrating both functional and structural information. Here we describe the current status of the PEDANT software system for high-throughput analysis of large biological sequence sets and the genome analysis server associated with it.
The principal features of PEDANT are: (i) completely automatic processing of data using a wide range of bioinformatics methods, (ii) manual refinement of annotation, (iii) automatic and manual assignment of gene products to a number of functional and structural categories, (iv) extensive hyperlinked protein reports, and (v) advanced DNA and protein viewers. The system is easily extensible and allows users to include custom methods, databases, and categories with minimal or no programming effort. PEDANT is actively used as a collaborative environment to support several ongoing genome sequencing projects. The main purpose of the PEDANT genome database is to quickly disseminate well-organized information on completely sequenced and unfinished genomes. It currently includes 80 genomic sequences and in many cases serves as the only source of exhaustive information on a given genome. The database also acts as a vehicle for a number of research projects in bioinformatics. Using SQL queries, it is possible to correlate a large variety of pre-computed properties of gene products encoded in complete genomes with each other and compare them with data sets of special scientific interest. In particular, the availability of structural predictions for over 300 000 genomic proteins makes PEDANT the most extensive structural genomics resource available on the web.
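The abstract notes that SQL queries can correlate pre-computed properties of gene products with each other. As a rough, non-authoritative sketch of what such a query might look like, the Python snippet below uses the standard-library `sqlite3` module with an in-memory database. The two-table schema (`protein`, `feature`), the feature kinds `'category'` and `'tm_helix'`, the genome names, and every row are hypothetical toy values invented for illustration; they are not PEDANT's actual schema, SQL dialect, or data.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only; it is NOT the
# actual PEDANT database schema.
SCHEMA = """
CREATE TABLE protein (
    id     TEXT PRIMARY KEY,
    genome TEXT NOT NULL
);
-- One row per pre-computed property: a functional category assignment,
-- one predicted transmembrane helix, and so on.
CREATE TABLE feature (
    protein_id TEXT NOT NULL REFERENCES protein(id),
    kind       TEXT NOT NULL,
    value      TEXT
);
"""

# Made-up toy rows, not real annotation data.
PROTEINS = [("p1", "genomeA"), ("p2", "genomeA"),
            ("p3", "genomeB"), ("p4", "genomeB"), ("p5", "genomeB")]
FEATURES = [
    ("p1", "category", "transport"), ("p1", "tm_helix", "12-34"),
    ("p1", "tm_helix", "60-82"), ("p2", "category", "transport"),
    ("p3", "category", "metabolism"), ("p4", "category", "metabolism"),
    ("p4", "tm_helix", "101-123"), ("p5", "category", "metabolism"),
]

# Correlate two pre-computed properties: for each genome and functional
# category, count the proteins and how many of them have at least one
# predicted transmembrane helix.
QUERY = """
SELECT p.genome,
       c.value                      AS category,
       COUNT(DISTINCT p.id)         AS n_proteins,
       COUNT(DISTINCT t.protein_id) AS n_with_tm
FROM protein AS p
JOIN feature AS c
  ON c.protein_id = p.id AND c.kind = 'category'
LEFT JOIN feature AS t
  ON t.protein_id = p.id AND t.kind = 'tm_helix'
GROUP BY p.genome, c.value
ORDER BY p.genome, c.value
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database filled with the toy rows above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO protein VALUES (?, ?)", PROTEINS)
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?)", FEATURES)
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    for row in conn.execute(QUERY):
        print(row)
```

Run as a script, it prints one row per genome and category: `('genomeA', 'transport', 2, 1)` and `('genomeB', 'metabolism', 3, 1)`. `COUNT(DISTINCT ...)` keeps a protein with several predicted helices (`p1` here) from being counted more than once, and the `LEFT JOIN` keeps proteins without any predicted helix in the per-category total.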