Ben-Hur Asa, Brutlag Douglas
Department of Biochemistry, B400 Beckman Center, Stanford University, CA 94305-5307, USA.
Bioinformatics. 2003;19 Suppl 1:i26-33. doi: 10.1093/bioinformatics/btg1002.
Remote homology detection is the problem of detecting homology in cases of low sequence similarity. It is a hard computational problem with no approach that works well in all cases.
We present a method for detecting remote homology that is based on the presence of discrete sequence motifs. The motif content of a pair of sequences defines a similarity measure, which serves as the kernel of a Support Vector Machine (SVM) classifier. We test the method on two remote homology detection tasks: prediction of a previously unseen SCOP family, and prediction of an enzyme class given other enzymes that perform a similar function on other substrates. We find that it performs significantly better than an SVM method that uses BLAST or Smith-Waterman similarity scores as features.
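As a rough illustration of how motif content can define a kernel, the sketch below counts motif matches in each sequence and takes the normalized dot product of the two count vectors. The abstract does not specify the exact kernel, so everything here is an assumption: the motif list `MOTIFS` (written as regular expressions), the normalization, and the function names `motif_counts`, `motif_kernel`, and `gram_matrix` are hypothetical and not taken from the paper.

```python
import math
import re

# Hypothetical motifs written as regular expressions ("." matches any residue).
# A real motif set would come from a motif database; these are placeholders.
MOTIFS = [r"C..C", r"G[KR]S", r"H.H"]


def motif_counts(seq, motifs=MOTIFS):
    """Return the number of non-overlapping matches of each motif in seq."""
    return [len(re.findall(m, seq)) for m in motifs]


def motif_kernel(a, b, motifs=MOTIFS):
    """Normalized dot product of the motif-count vectors of a and b.

    Returns 0.0 when either sequence contains none of the motifs.
    """
    x, y = motif_counts(a, motifs), motif_counts(b, motifs)
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x) * sum(q * q for q in y))
    return dot / norm if norm else 0.0


def gram_matrix(seqs, motifs=MOTIFS):
    """Pairwise kernel values for a list of sequences."""
    return [[motif_kernel(a, b, motifs) for b in seqs] for a in seqs]
```

A Gram matrix built this way could be handed to any SVM implementation that accepts a precomputed kernel. Because the kernel is a dot product of explicit feature vectors, it is positive semidefinite by construction.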