Ogiwara A, Uchiyama I, Takagi T, Kanehisa M
Human Genome Center, University of Tokyo, Japan.
Protein Sci. 1996 Oct;5(10):1991-9. doi: 10.1002/pro.5560051005.
A new sequence motif library, StrProf, was constructed to characterize groups of related proteins in the PDB three-dimensional structure database. For a representative member of each protein family, identified by cross-referencing the PDB with the PIR superfamily classification, a group of related sequences was collected by a BLAST search against the nonredundant protein sequence database. For every group, motifs were identified automatically according to criteria of conservation and uniqueness of pentapeptide patterns, using a dual dynamic programming algorithm. In the StrProf library, motifs are represented by profile matrices rather than consensus patterns to allow more flexible searches. Another dynamic programming algorithm was then developed to search this motif library. When the computationally derived StrProf was compared with PROSITE, a manually derived motif library represented by best consensus patterns, the numbers of identified patterns were comparable. StrProf missed about one third of the PROSITE motifs, but it also contained new motifs absent from PROSITE. The new library was incorporated into SMART (Sequence Motif Analysis and Retrieval Tool), a computer tool designed to help search for and annotate biologically important sites in an unknown protein sequence. The client program is available free of charge through the Internet.
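To illustrate why a profile matrix allows more flexible searching than a consensus pattern, the following is a minimal sketch and not the StrProf or SMART implementation. It assumes ungapped, equal-length motif instances, a uniform amino acid background, and a simple pseudocount. It scans with a plain sliding window rather than the dynamic programming search described in the abstract. The names `build_profile` and `best_hit` and the example sequences are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(instances, pseudocount=1.0):
    """Turn equal-length motif instances into a log-odds profile matrix.

    Returns one dict per motif column, mapping residue -> log-odds score.
    """
    length = len(instances[0])
    if any(len(s) != length for s in instances):
        raise ValueError("all motif instances must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    total = len(instances) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(s[col] for s in instances)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_hit(sequence, profile):
    """Slide the profile along the sequence (no gaps).

    Returns (score, start) for the best-scoring window, or None if the
    sequence is shorter than the profile. Residues outside the 20
    standard amino acids contribute a score of 0.0.
    """
    width = len(profile)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(profile[i].get(aa, 0.0) for i, aa in enumerate(window))
        if best is None or score > best[0]:
            best = (score, start)
    return best


# Hypothetical pentapeptide instances that vary at positions 3 and 4.
instances = ["GKSTL", "GKTTL", "GKSSL", "GKTSL"]
profile = build_profile(instances)
print(best_hit("MAAGKSSLVV", profile))
```

In this toy case the best window starts at index 3 (`GKSSL`). Because every column carries a score for every residue, a window that differs from the consensus at one position still receives a graded score instead of being rejected outright, which is the flexibility the profile representation is meant to provide.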