An expert system for processing sequence homology data.

Authors

Sonnhammer E L, Durbin R

Affiliation

Sanger Centre, Hinxton, Cambridge, UK.

Publication

Proc Int Conf Intell Syst Mol Biol. 1994;2:363-8.

PMID:7584413

Abstract

When homology must be found for large numbers of sequences, database search tools such as Blast and Fasta generate prohibitively large amounts of information. To automate most of the decisions a trained sequence analyst would make, a rule-based expert system was developed and combined with an algorithm that avoids non-informative matches caused by biased residue composition. The results the system finds relevant are presented concisely and clearly, so that the homology can be assessed with minimum effort. The expert system, HSPcrunch, was implemented to process the output of the programs in the BLAST suite. HSPcrunch embodies rules for detecting distant similarities when pairs of weak matches are consistent with a larger gapped alignment, i.e. when Blast has broken a longer gapped alignment up into smaller ungapped ones. In this way, more distant similarities can be detected with few or no additional spurious matches. The rules for how small the gaps must be to be considered significant have been derived empirically. Currently, a set of rules is used that operates on two different scoring levels: one for very weak matches that have very small gaps, and one for medium-weak matches that have slightly larger gaps. This set of rules proved robust in most cases and gives high-fidelity separation between real homologies and spurious matches. One of the most important rules for reducing the amount of output is to limit the number of overlapping matches to the same region of the query sequence. (ABSTRACT TRUNCATED AT 250 WORDS)
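The two rules named in the abstract can be illustrated with a minimal Python sketch. This is not HSPcrunch's actual code: the `HSP` record, the function names `consistent_pair` and `limit_coverage`, and the parameters `max_gap` and `max_cover` are all hypothetical, and the abstract does not state the empirically derived gap or score thresholds, so they are left as caller-supplied values. The pair check is also simplified to non-overlapping, collinear matches on the same subject.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HSP:
    """Hypothetical record for one ungapped match (1-based, inclusive coordinates)."""
    subject: str
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    score: float


def consistent_pair(a, b, max_gap):
    """Return True if two matches could be pieces of one gapped alignment.

    Assumption: "consistent" means same subject, same order in query and
    subject, and at most max_gap unaligned residues between them on each side.
    """
    if a.subject != b.subject:
        return False
    first, second = sorted((a, b), key=lambda h: h.q_start)
    q_gap = second.q_start - first.q_end - 1
    s_gap = second.s_start - first.s_end - 1
    return 0 <= q_gap <= max_gap and 0 <= s_gap <= max_gap


def limit_coverage(hsps, max_cover):
    """Keep matches best-first; drop a match if every query position it
    spans is already covered by max_cover kept matches."""
    if not hsps:
        return []
    depth = [0] * (max(h.q_end for h in hsps) + 1)
    kept = []
    for h in sorted(hsps, key=lambda h: h.score, reverse=True):
        span = range(h.q_start, h.q_end + 1)
        if all(depth[i] >= max_cover for i in span):
            continue  # adds nothing new to this region of the query
        for i in span:
            depth[i] += 1
        kept.append(h)
    return kept
```

In a two-level scheme like the one the abstract describes, `consistent_pair` would presumably be called with a smaller `max_gap` for the weakest matches and a larger one for medium-weak matches; the concrete values would have to come from the paper itself.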
