IEEE/ACM Trans Comput Biol Bioinform. 2018 May-Jun;15(3):934-943. doi: 10.1109/TCBB.2017.2705080. Epub 2017 May 17.
Flexible proteins are proteins whose structures undergo conformational changes. Protein flexibility analysis is critical for classifying and understanding protein function. That analysis requires detecting the hinge regions where a protein flexes. Previous methods locate hinges from the three-dimensional (3D) structure of the protein, which is computationally expensive. To reduce the computational cost, this study proposes NAHAL-Flex, a novel text-based method that uses structural alphabets (SAs) to detect hinge positions. Protein structures are encoded with a particular type of SA called the protein folding shape code (PFSC), which is unaffected by location, scale, and rotation. The flexible regions of a protein are the only places where the letter sequence can be distorted. Given this property, the longest alignment path between two letter sequences can be found with a dynamic programming (DP) algorithm. The proposed method then looks for regions where the letter sequence is distorted to find the most probable hinge positions. To reduce the number of candidate hinge positions, a genetic algorithm (GA) is used to select the best candidate hinge points. The method was evaluated on four flexible and rigid protein datasets, two small and two large. On the small datasets, NAHAL-Flex was comparable to state-of-the-art flexible structural alignment methods. On the large datasets, NAHAL-Flex outperformed several well-known alignment methods, e.g., DaliLite, Matt, DeepAlign, and TM-align: it ran faster and produced more accurate results than these methods.
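The alignment and distortion-detection steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the "longest alignment path" can be approximated by a plain longest-common-subsequence DP, and the function names `lcs_alignment` and `distorted_regions`, the `min_len` parameter, and the example strings are all hypothetical placeholders (the strings are not real PFSC codes). The GA-based selection of hinge points is not covered.

```python
def lcs_alignment(a: str, b: str) -> list[tuple[int, int]]:
    """Return index pairs (i, j) on one longest common subsequence path of a and b."""
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk forward through the table to recover one optimal path.
    pairs = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def distorted_regions(a: str, b: str, min_len: int = 1) -> list[tuple[int, int]]:
    """Return half-open intervals [start, end) of a that are off the alignment path.

    Assumption: unaligned runs stand in for the 'distorted' letter regions
    that the abstract treats as candidate hinge positions.
    """
    matched = {i for i, _ in lcs_alignment(a, b)}
    regions = []
    start = None
    for i in range(len(a) + 1):
        if i < len(a) and i not in matched:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    return regions


# Hypothetical letter sequences for two conformations of the same protein.
conformation_1 = "AAAAVVVBBBB"
conformation_2 = "AAAAWWBBBB"
candidates = distorted_regions(conformation_1, conformation_2)
```

In this toy case the two rigid segments (`AAAA` and `BBBB`) align exactly, and the unaligned run between them is reported as the single candidate hinge region. A real pipeline would need a substitution score between structural letters rather than exact matching, which this sketch does not attempt.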