Computer Science Department, George Mason University, Fairfax, VA, USA.
BMC Bioinformatics. 2009 Dec 22;10:439. doi: 10.1186/1471-2105-10-439.
Over the last decade, several prediction methods have been developed for determining the structural and functional properties of individual protein residues using sequence and sequence-derived information. Most of these methods are based on support vector machines, which provide accurate and generalizable prediction models.
We present a general-purpose protein residue annotation toolkit (svmPRAT) that allows biologists to formulate residue-wise prediction problems. svmPRAT formulates the annotation problem as a classification or regression problem using support vector machines. One of its key features is the ease with which it incorporates any user-provided information supplied in the form of feature matrices. For every residue, svmPRAT captures local information around the residue to create fixed-length feature vectors. svmPRAT implements accurate and fast kernel functions and introduces a flexible window-based encoding scheme that captures signals and patterns for training effective predictive models.
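To make the idea of fixed-length, window-based feature vectors concrete, the following is a minimal sketch in Python. It is not svmPRAT's implementation: the function name `window_encode`, the half-width parameter `w`, and the zero padding at the sequence ends are all assumptions chosen for illustration. It takes a per-residue feature matrix (one row per residue) and, for each residue, concatenates the rows of the residues within `w` positions on either side.

```python
from typing import List, Sequence


def window_encode(features: Sequence[Sequence[float]], w: int) -> List[List[float]]:
    """Build one fixed-length vector per residue from a per-residue feature matrix.

    Hypothetical helper for illustration only. The vector for residue i
    concatenates the feature rows of positions i-w .. i+w. Positions that
    fall outside the sequence are zero-padded (an assumption of this sketch).
    """
    if w < 0:
        raise ValueError("window half-width w must be non-negative")
    n = len(features)
    if n == 0:
        return []
    d = len(features[0])          # number of features per residue
    pad = [0.0] * d               # stand-in row for out-of-range positions
    encoded = []
    for i in range(n):
        vec: List[float] = []
        for j in range(i - w, i + w + 1):
            row = features[j] if 0 <= j < n else pad
            vec.extend(row)
        encoded.append(vec)       # length is always (2*w + 1) * d
    return encoded


if __name__ == "__main__":
    # Toy feature matrix: 3 residues, 2 features each.
    toy = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    for vector in window_encode(toy, w=1):
        print(vector)
```

Every output vector has length (2w + 1) times the number of per-residue features, so sequences of different lengths yield vectors of the same size, which is what a support vector machine requires as input.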
In this work, we evaluate svmPRAT on several classification and regression problems, including disorder prediction, residue-wise contact order estimation, DNA-binding site prediction, and local structure alphabet prediction. svmPRAT has also been used to develop a state-of-the-art transmembrane helix prediction method called TOPTMH and a secondary structure prediction method called YASSPP. The toolkit provides practitioners with an efficient and easy-to-use tool for a wide variety of annotation problems.