Rangwala Huzefa, Karypis George
Computer Science & Engineering, University of Minnesota, Minneapolis, MN 55455, USA.
Comput Syst Bioinformatics Conf. 2007;6:311-22.
The effectiveness of comparative modeling approaches for protein structure prediction can be substantially improved by incorporating predicted structural information in the initial sequence-structure alignment. Motivated by the approaches used to align protein structures, this paper focuses on developing machine learning approaches for estimating the RMSD value of a pair of protein fragments. These estimated fragment-level RMSD values can be used to construct the alignment, assess the quality of an alignment, and identify high-quality alignment segments. We present algorithms to solve this fragment-level RMSD prediction problem using a supervised learning framework based on support vector regression and classification that incorporates protein profiles, predicted secondary structure, effective information encoding schemes, and novel second-order pairwise exponential kernel functions. Our comprehensive empirical study shows superior results compared to the profile-to-profile scoring schemes.
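For readers unfamiliar with the target quantity, the following is a minimal sketch of how the RMSD between two equal-length fragments can be computed when their coordinates are known. It is not the authors' predictor, which estimates this value from sequence-derived features. The abstract does not say how the reference RMSD is obtained, so the optimal superposition (Kabsch algorithm) used here is an assumption, and the function name `fragment_rmsd` is hypothetical.

```python
import numpy as np


def fragment_rmsd(P, Q):
    """RMSD between two fragments after optimal rigid-body superposition.

    P and Q are (n, 3) arrays of matched atom coordinates, for example the
    C-alpha atoms of two fragments of equal length. The superposition step
    (Kabsch algorithm) is an assumption; the abstract does not specify it.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.ndim != 2 or P.shape[1] != 3 or P.shape != Q.shape:
        raise ValueError("P and Q must both have shape (n, 3)")

    # Remove translation by centering both fragments on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # Covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so the result is a proper rotation
    # (determinant +1) rather than a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate P onto Q and measure the remaining per-atom deviation.
    diff = (R @ Pc.T).T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


if __name__ == "__main__":
    # Toy coordinates, invented purely for illustration.
    frag_a = np.array([
        [0.0, 0.0, 0.0],
        [1.5, 0.0, 0.0],
        [1.5, 1.5, 0.0],
        [0.0, 1.5, 1.0],
        [2.0, 3.0, 1.0],
    ])
    frag_b = frag_a.copy()
    frag_b[4] += np.array([0.0, 0.0, 2.0])  # displace one atom
    print(fragment_rmsd(frag_a, frag_b))
```

In a supervised setting such as the one described above, values of this kind, computed from known structures, would serve as the regression targets, while the inputs would come from sequence profiles and predicted secondary structure.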