SVM-BALSA: remote homology detection based on Bayesian sequence alignment.

Authors

Webb-Robertson Bobbie-Jo, Oehmen Christopher, Matzke Melissa

Affiliation

Computational Biology and Bioinformatics, Pacific Northwest National Laboratory, Richland, WA 99352, USA.

Publication information

Comput Biol Chem. 2005 Dec;29(6):440-3. doi: 10.1016/j.compbiolchem.2005.09.006. Epub 2005 Nov 10.

Abstract

Biopolymer sequence comparison to identify evolutionarily related proteins, or homologs, is one of the most common tasks in bioinformatics. Support vector machines (SVMs) represent a new approach to the problem in which statistical learning theory is employed to classify proteins into families, thus identifying homologous relationships. Current SVM approaches have been shown to outperform iterative profile methods, such as PSI-BLAST, for protein homology classification. In this study, we demonstrate that using a Bayesian alignment score, which accounts for the uncertainty of all possible alignments, in the SVM construction improves sensitivity compared to the traditional dynamic programming implementation over a benchmark dataset consisting of 54 unique protein families. The SVM-BALSA algorithm returns a higher area under the receiver operating characteristic (ROC) curve for 37 of the 54 families and achieves an improved overall performance curve at a significance level of 0.07.
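The abstract reports results as the area under the ROC curve per protein family and as a count of families where one method scores higher. The following is a minimal Python sketch of that kind of bookkeeping, not the authors' evaluation code. The function names `roc_auc` and `count_improved`, the 0/1 label encoding, and the dictionary layout keyed by family name are all assumptions made for illustration. The area is computed through the standard pairwise-ranking identity: it equals the fraction of (positive, negative) pairs in which the positive example receives the higher classifier score, with ties counted as one half.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the pairwise-ranking identity.

    scores: classifier outputs, higher meaning "more likely a family member".
    labels: 1 for true family members, 0 for non-members (assumed encoding).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


def count_improved(auc_a, auc_b):
    """Count families where method A has a strictly higher AUC than method B.

    auc_a, auc_b: dicts mapping a family name to its AUC (hypothetical layout).
    """
    return sum(1 for family in auc_a if auc_a[family] > auc_b[family])
```

With per-family AUC dictionaries for two methods, `count_improved` yields the kind of tally the abstract quotes (37 of 54 families). The quadratic pair loop is chosen for clarity; a sort-based rank computation would be preferable for large test sets.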
