Shatnawi Maad, Zaki Nazar
College of Information Technology, UAEU, United Arab Emirates.
Comput Biol Chem. 2015 Apr;55:23-30. doi: 10.1016/j.compbiolchem.2015.01.006. Epub 2015 Jan 24.
Protein chains are generally long and consist of multiple domains, which are distinct structural units that can evolve and function independently. Accurate and reliable prediction of protein domain linkers and boundaries is often considered the initial step in predicting protein tertiary structure and function. In this paper, we introduce CISA, a method for predicting inter-domain linker regions from amino acid sequence information alone. The method first computes an amino acid compositional index from a protein sequence dataset of domain-linker segments and from the amino acid composition. A preference profile is then generated by averaging the compositional index values along the amino acid sequence with a sliding window. Finally, the protein sequence is segmented into intervals, and a simulated annealing algorithm refines the prediction by finding, for each segment, the optimal threshold value that separates domains from inter-domain linkers. The method was tested on two standard protein datasets and showed considerable improvement over state-of-the-art domain linker prediction methods.
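The three steps named in the abstract (compositional index, sliding-window preference profile, per-segment threshold search by simulated annealing) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the index formula, the window size, the segment length, or the annealing objective, so everything below is an assumption: the index is a plain log-ratio of linker to domain residue frequencies with a pseudocount, the objective is per-residue accuracy against known training labels, and all function names and parameter defaults are hypothetical.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def compositional_index(linker_segments, domain_segments, pseudocount=1.0):
    """Assumed index: -ln(f_linker / f_domain); lower values are linker-like."""
    linker_counts = Counter("".join(linker_segments))
    domain_counts = Counter("".join(domain_segments))
    n = len(AMINO_ACIDS)
    linker_total = sum(linker_counts[a] for a in AMINO_ACIDS) + pseudocount * n
    domain_total = sum(domain_counts[a] for a in AMINO_ACIDS) + pseudocount * n
    index = {}
    for aa in AMINO_ACIDS:
        f_linker = (linker_counts[aa] + pseudocount) / linker_total
        f_domain = (domain_counts[aa] + pseudocount) / domain_total
        index[aa] = -math.log(f_linker / f_domain)
    return index


def preference_profile(sequence, index, window=5):
    """Average the index over a centered window, truncated at the sequence ends."""
    half = window // 2
    values = [index.get(aa, 0.0) for aa in sequence]  # unknown residues score 0
    profile = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        profile.append(sum(chunk) / len(chunk))
    return profile


def accuracy(profile, labels, threshold):
    """Fraction of residues classified correctly; below threshold means linker (1)."""
    hits = sum((p < threshold) == bool(l) for p, l in zip(profile, labels))
    return hits / len(profile)


def anneal_threshold(profile, labels, steps=2000, t0=0.1, cooling=0.995, seed=0):
    """Simulated annealing over a single threshold; returns (threshold, score)."""
    rng = random.Random(seed)
    lo, hi = min(profile), max(profile)
    step = (hi - lo) * 0.1
    current = (lo + hi) / 2
    current_score = accuracy(profile, labels, current)
    best, best_score = current, current_score
    temperature = t0
    for _ in range(steps):
        candidate = min(hi, max(lo, current + rng.uniform(-step, step)))
        score = accuracy(profile, labels, candidate)
        delta = score - current_score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, score
            if current_score > best_score:
                best, best_score = current, current_score
        temperature *= cooling
    return best, best_score


def predict_linkers(profile, labels, segment_length=50):
    """Fit one threshold per fixed-length segment and label each residue (1 = linker)."""
    prediction = []
    for start in range(0, len(profile), segment_length):
        seg_profile = profile[start:start + segment_length]
        seg_labels = labels[start:start + segment_length]
        threshold, _ = anneal_threshold(seg_profile, seg_labels)
        prediction.extend(int(p < threshold) for p in seg_profile)
    return prediction
```

In this sketch the thresholds are tuned on residues whose domain or linker status is already known, so it corresponds to a training pass. How thresholds would be carried over to unlabeled sequences is not described in the abstract and is left open here.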