Fotoohifiroozabadi Samira, Mohamad Mohd Saberi, Deris Safaai
1 Artificial Intelligence and Bioinformatics Research Group, Faculty of Computing, Universiti Teknologi Malaysia, Skudai 81310 Johor, Malaysia.
2 Faculty of Creative Technology & Heritage, Universiti Malaysia Kelantan, Locked Bag 01, 16300 Bachok, Kota Bharu, Kelantan, Malaysia.
J Bioinform Comput Biol. 2017 Apr;15(2):1750004. doi: 10.1142/S0219720017500044. Epub 2017 Jan 26.
Protein structure alignment and comparison methods that are based on an alphabetic representation of protein structure are simpler to run and faster to evaluate; however, their accuracy is not as reliable as that of three-dimensional (3D) tools. As a candidate 1D method, TS-AMIR used an alphabetic representation of the secondary-structure elements (SSEs) of proteins and compared the letters assigned to each SSE using the n-gram method. Although its results were comparable to those obtained via geometrical methods, the SSE length and the accuracy of adjacency between SSEs were not considered in the comparison process. Therefore, to capture more information about the adjacency between SSE vectors, the present study adopts a new approach that assigns text to vectors according to the spherical coordinate system. Moreover, dynamic programming is applied to account for the length of the SSE vectors. Five common datasets were selected for method evaluation. The first three datasets were small but difficult to align, and the remaining two were used to compare the capability of the proposed method with that of other methods on large protein datasets. The results showed that the proposed method, as a text-based alignment approach, obtained results comparable to those of both 1D and 3D methods. It outperformed 1D methods in terms of accuracy and 3D methods in terms of runtime.
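The two ideas named in the abstract, assigning letters to SSE vectors from their spherical coordinates and aligning the resulting strings with dynamic programming, can be illustrated with a minimal sketch. It is not the authors' implementation: the 8-letter alphabet, the bin counts, the similarity function, the gap penalty, and the names `vector_to_letter` and `align_score` are all assumptions made for illustration, and a standard Needleman-Wunsch recurrence stands in for whatever scoring scheme the paper actually uses.

```python
import math

# Hypothetical discretisation: 2 polar bins x 4 azimuth bins = 8 letters.
N_POLAR = 2
N_AZIMUTH = 4
ALPHABET = "ABCDEFGH"


def vector_to_letter(start, end):
    """Map an SSE vector (start -> end, 3D points) to a letter by
    binning its polar and azimuthal angles (illustrative scheme only)."""
    dx, dy, dz = (e - s for s, e in zip(start, end))
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    if r == 0:
        raise ValueError("SSE vector has zero length")
    # Clamp to guard against rounding slightly outside [-1, 1].
    theta = math.acos(max(-1.0, min(1.0, dz / r)))   # polar angle, [0, pi]
    phi = math.atan2(dy, dx) % (2 * math.pi)         # azimuth, [0, 2*pi)
    polar_bin = min(int(theta / (math.pi / N_POLAR)), N_POLAR - 1)
    azimuth_bin = min(int(phi / (2 * math.pi / N_AZIMUTH)), N_AZIMUTH - 1)
    return ALPHABET[polar_bin * N_AZIMUTH + azimuth_bin]


def align_score(a, b, gap=-0.5):
    """Global alignment score (Needleman-Wunsch) of two SSE lists.

    Each element is (sse_type, direction_letter, length), with length > 0.
    Assumed scoring: elements that agree in type and letter score the
    ratio of their lengths (1.0 for equal lengths); otherwise -1.0.
    """
    def sim(x, y):
        if x[0] != y[0] or x[1] != y[1]:
            return -1.0
        return min(x[2], y[2]) / max(x[2], y[2])

    n, m = len(a), len(b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + sim(a[i - 1], b[j - 1]),  # align pair
                dp[i - 1][j] + gap,                          # gap in b
                dp[i][j - 1] + gap,                          # gap in a
            )
    return dp[n][m]
```

In this sketch, a helix running from (0, 0, 0) to (1, 1, 1) would be encoded with `vector_to_letter((0, 0, 0), (1, 1, 1))`, and two proteins would be compared by passing their lists of `(type, letter, length)` tuples to `align_score`. Weighting each match by the length ratio is one simple way to let SSE length influence the alignment; other weightings are equally plausible.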