Protein family classification with partial least squares.

Author information

Opiyo Stephen O, Moriyama Etsuko N

Affiliation

Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE 68583-0915, USA.

Publication information

J Proteome Res. 2007 Feb;6(2):846-53. doi: 10.1021/pr060534k.

DOI:10.1021/pr060534k

PMID:17269741

Abstract

The quality of protein function predictions relies on appropriate training of protein classification methods. Performance of these methods can suffer when only a limited number of protein samples are available, which is often the case in divergent protein families. Whereas profile hidden Markov models and PSI-BLAST showed a significant decrease in performance in such cases, alignment-free partial least-squares classifiers performed consistently better, even when used to identify short fragmented sequences.
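
To make the idea of an alignment-free partial least-squares classifier concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not say which sequence descriptors or how many latent components were used, so the amino acid composition features, the single-component PLS1 model, the function names, and the toy sequences are all assumptions made for illustration.

```python
# Illustrative sketch only. The features (amino acid composition), the
# single latent component, and the toy sequences are assumptions, not
# the descriptors, settings, or data used in the paper.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Alignment-free features: fraction of each of the 20 amino acids."""
    seq = seq.upper()
    n = max(len(seq), 1)
    return [seq.count(a) / n for a in AMINO_ACIDS]


def fit_pls1(sequences, labels):
    """Fit a one-component PLS1 model; labels are +1 (member) or -1."""
    X = [composition(s) for s in sequences]
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(labels) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in labels]

    # Weight vector: the direction in feature space whose projection
    # has maximal covariance with the class labels.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    if norm == 0:
        raise ValueError("features carry no covariance with the labels")
    w = [v / norm for v in w]

    # Latent scores, then regress the centered labels on those scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    return {"x_mean": x_mean, "y_mean": y_mean, "w": w, "q": q}


def predict(model, seq):
    """Return a real-valued score; a positive score suggests family membership."""
    x = composition(seq)
    t = sum((x[j] - model["x_mean"][j]) * model["w"][j] for j in range(len(x)))
    return model["y_mean"] + model["q"] * t


# Hypothetical toy data: hydrophobic-rich "members" vs. charged "non-members".
members = ["LLVIAALLVFIG", "AVLLIIFVLAGL", "ILVVALLFAIVG"]
non_members = ["DEKRDEEKRKDS", "KKEEDDRRSEKD", "RDEKKDESRKED"]
model = fit_pls1(members + non_members, [1, 1, 1, -1, -1, -1])
```

Because the features are computed from residue counts alone, the same `predict` call works on a short fragment as well as on a full-length sequence, with no alignment step. A realistic classifier would use richer descriptors, more than one latent component, and a cross-validated decision threshold.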
