Carpentier Mathilde, Brouillet Sophie, Pothier Joël
Atelier de BioInformatique, Paris, France.
Proteins. 2005 Oct 1;61(1):137-51. doi: 10.1002/prot.20517.
YAKUSA is a program designed for rapid scanning of a structural database with a query protein structure. It searches for the longest common substructures, called SHSPs (structural high-scoring pairs), shared by the query structure and each structure in the database. It uses protein backbone internal coordinates (alpha angles) to describe protein structures as sequences of symbols. Structural similarities are established in 5 steps, the first 3 being analogous to those used in BLAST: (1) building a deterministic finite automaton that describes all patterns identical or similar to those in the query structure; (2) searching for all these patterns in every structure in the database; (3) extending the patterns to longer matching substructures (i.e., SHSPs); (4) selecting compatible SHSPs for each query-database structure pair; and (5) ranking the query-database structure pairs using 3 scores based on SHSP similarity, on SHSP probabilities, and on the spatial compatibility of SHSPs. Structural fragment probabilities are estimated with a mixture transition distribution model, which is an approximation of a high-order Markov chain model. In sensitivity and selectivity of the structural matches, YAKUSA compares well with the best related programs while being far faster: a typical database scan takes about 40 s of CPU time on a desktop personal computer. It has also been implemented on a Web server for real-time searches.
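The first three steps can be illustrated with a minimal Python sketch. It is not the YAKUSA implementation, and it makes several assumptions that the abstract does not confirm: the alpha angle is taken as the dihedral angle defined by four consecutive C-alpha atoms, angles are binned into 30-degree symbols, seeds are exact 3-symbol matches looked up in a dictionary (a simple stand-in for the deterministic finite automaton, which also accepts similar patterns), and extension is ungapped and stops at the first mismatch. The names `dihedral`, `alpha_angles`, `encode`, and `find_shsps` are hypothetical.

```python
import math


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b0, b1), cross(b1, b2)
    x = dot(n1, n2)
    y = math.sqrt(dot(b1, b1)) * dot(b0, n2)
    return math.degrees(math.atan2(y, x))


def alpha_angles(ca_coords):
    """Assumed definition: one dihedral per window of four consecutive C-alpha atoms."""
    return [dihedral(*ca_coords[i:i + 4]) for i in range(len(ca_coords) - 3)]


def encode(alphas, bin_width=30):
    """Turn angles in [-180, 180] into integer symbols (bin width is an assumption)."""
    n_bins = 360 // bin_width
    return [int((a + 180.0) // bin_width) % n_bins for a in alphas]


def find_shsps(query, target, k=3):
    """Seed with exact k-symbol matches, then extend without gaps in both directions.

    Returns (query_start, target_start, length) tuples, longest first.
    """
    # Index every k-symbol word of the query (stand-in for the automaton).
    index = {}
    for i in range(len(query) - k + 1):
        index.setdefault(tuple(query[i:i + k]), []).append(i)

    seen, hits = set(), []
    for j in range(len(target) - k + 1):
        for i in index.get(tuple(target[j:j + k]), ()):
            # Extend to the left along the diagonal.
            qs, ts = i, j
            while qs > 0 and ts > 0 and query[qs - 1] == target[ts - 1]:
                qs, ts = qs - 1, ts - 1
            if (qs, ts) in seen:  # this stretch was already reported
                continue
            seen.add((qs, ts))
            # Extend to the right along the diagonal.
            qe, te = i + k, j + k
            while qe < len(query) and te < len(target) and query[qe] == target[te]:
                qe, te = qe + 1, te + 1
            hits.append((qs, ts, qe - qs))
    return sorted(hits, key=lambda h: -h[2])


if __name__ == "__main__":
    # Toy symbol sequences, not real protein data.
    print(find_shsps([1, 2, 3, 4, 5, 0], [9, 2, 3, 4, 5, 7]))
```

In this sketch, the toy query and target share the symbol run `2, 3, 4, 5`, so a single match starting at position 1 in both sequences with length 4 is reported. Steps (4) and (5), selecting compatible SHSPs and ranking structure pairs with the three scores, are not covered here.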