Kaján László, Kertész-Farkas Attila, Franklin Dino, Ivanova Neli, Kocsor András, Pongor Sándor
Bioinformatics Group, International Centre for Genetic Engineering and Biotechnology, Padriciano 99, I-34012 Trieste, Italy.
Bioinformatics. 2006 Dec 1;22(23):2865-9. doi: 10.1093/bioinformatics/btl512. Epub 2006 Nov 7.
Likelihood ratio approximants (LRA) have been widely used for model comparison in statistics. The present study was undertaken in order to explore their utility as a scoring (ranking) function in the classification of protein sequences.
We used a simple LRA based on the maximal similarity (or minimal distance) scores of the two top-ranking sequence classes. The scoring methods (Smith-Waterman, BLAST, local alignment kernel and compression-based distances) were compared on datasets designed to test sequence similarities between proteins distantly related in terms of structure or evolution. It was found that LRA-based scoring can significantly outperform simple scoring methods.
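The abstract does not give the exact formula, so the following is only a minimal sketch of one plausible reading: the query's best similarity score within each class is computed, and the score of the top-ranking class is divided by that of the runner-up class. The function names (`class_max_similarity`, `lra_score`), the ratio form, the `eps` guard, and the example scores and class labels are all assumptions made for illustration, not the authors' implementation. Scores are assumed to be positive similarities; for distances, the minimum per class would be taken and the ratio inverted.

```python
from typing import Dict, List, Tuple


def class_max_similarity(similarities: List[float], labels: List[str]) -> Dict[str, float]:
    """Return the highest similarity between the query and any member of each class."""
    best: Dict[str, float] = {}
    for score, label in zip(similarities, labels):
        if label not in best or score > best[label]:
            best[label] = score
    return best


def lra_score(similarities: List[float], labels: List[str], eps: float = 1e-12) -> Tuple[str, float]:
    """Hypothetical LRA-style score: best class maximum divided by runner-up class maximum.

    Assumes positive similarity scores (e.g. alignment scores) and at least two classes.
    """
    best = class_max_similarity(similarities, labels)
    if len(best) < 2:
        raise ValueError("need reference sequences from at least two classes")
    # Rank classes by their best-matching member, highest first.
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    (top_class, top_score), (_, runner_up_score) = ranked[0], ranked[1]
    # eps guards against division by zero when the runner-up score is 0.
    return top_class, top_score / max(runner_up_score, eps)


# Made-up similarity scores of one query against four labelled reference sequences.
example_scores = [120.0, 45.0, 60.0, 30.0]
example_labels = ["kinase", "kinase", "protease", "protease"]
predicted_class, ratio = lra_score(example_scores, example_labels)
print(predicted_class, ratio)  # kinase 2.0
```

Under this reading, a ratio close to 1 means the two best classes are nearly indistinguishable for the query, while a larger ratio indicates a more clear-cut assignment than the raw top score alone would show.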