Al-Shatnawi Mufleh, Ahmad M Omair, Swamy M N S
Department of Electrical and Computer Engineering, Concordia University, QC H3G 2W1, Canada.
Bioinformatics. 2015 Jan 1;31(1):40-7. doi: 10.1093/bioinformatics/btu556. Epub 2014 Aug 31.
Insertions/deletions (indels) and amino acid substitutions are two common events that drive the evolution of, and variation in, protein sequences. Moreover, many human diseases and much of the functional divergence between homologous proteins are more closely related to indel mutations than to substitution mutations, even though indels occur less often. Reliable identification of indels and their flanking regions is a major challenge in research on protein evolution, structure and function.
In this article, we propose a novel scheme, based on a variable-order Markov model, to predict indel flanking regions in a protein sequence for a given protein fold. The proposed indel flanking region (IndelFR) predictors are built on prediction by partial match (PPM) and probabilistic suffix tree (PST), and are referred to as the PPM IndelFR and PST IndelFR predictors, respectively. The overall performance evaluation shows that the proposed predictors predict IndelFRs in protein sequences with high accuracy and a high F1 measure. The results also show that, when the only goal is to predict IndelFRs in protein sequences, the proposed predictors are preferable to HMMER 3.0, which they substantially outperform.
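To make the idea of a variable-order Markov model concrete, the following is a minimal sketch of a PPM-style predictor. It is not the authors' implementation: the class name `VariableOrderMarkov`, the helper `avg_log_likelihood`, the choice of escape estimate (PPM method C, without symbol exclusion) and the maximum order are all assumptions made for illustration. The model counts which residue follows each context of up to `max_order` residues and, when a residue was never seen after a context, "escapes" to the next shorter context.

```python
import math
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


class VariableOrderMarkov:
    """Toy PPM-style variable-order Markov model (illustrative only)."""

    def __init__(self, max_order=2, alphabet=AMINO_ACIDS):
        self.max_order = max_order
        self.alphabet = alphabet
        # counts[context][symbol] = times `symbol` followed `context`
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, sequences):
        for seq in sequences:
            for i, sym in enumerate(seq):
                # Update every context length from 0 up to max_order.
                for k in range(self.max_order + 1):
                    if i - k < 0:
                        break
                    self.counts[seq[i - k:i]][sym] += 1

    def prob(self, context, sym):
        # Keep only the last max_order symbols of the context.
        context = context[-self.max_order:] if self.max_order > 0 else ""
        return self._prob(context, sym)

    def _prob(self, context, sym):
        followers = self.counts.get(context)
        if not followers:
            # Unseen context: fall back to a shorter one, or to uniform.
            if context == "":
                return 1.0 / len(self.alphabet)
            return self._prob(context[1:], sym)
        total = sum(followers.values())
        distinct = len(followers)
        if sym in followers:
            return followers[sym] / (total + distinct)
        # Escape (method C): mass reserved for symbols unseen in this context.
        escape = distinct / (total + distinct)
        if context == "":
            return escape / len(self.alphabet)
        return escape * self._prob(context[1:], sym)


def avg_log_likelihood(model, seq):
    """Average per-residue log-probability of `seq` under `model`."""
    if not seq:
        return 0.0
    total = sum(math.log(model.prob(seq[:i], seq[i])) for i in range(len(seq)))
    return total / len(seq)
```

One hypothetical way to use such a model for IndelFR prediction is to train one instance on known flanking-region segments of a fold and another on non-flanking segments, then label a query window by whichever model gives the higher `avg_log_likelihood`. Because symbol exclusion is omitted, the probabilities for a context may sum to slightly less than one, which is acceptable for comparing scores but not for exact compression.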