Department of Computer Science and Engineering, National Institute of Technology Calicut, Kerala, 673601, India.
Computer Science and Engineering Dept., Rajiv Gandhi Institute of Technology, Kottayam, India.
J Mol Model. 2021 Aug 17;27(9):252. doi: 10.1007/s00894-021-04825-x.
Protein structure assignment enriches the structural and functional understanding of proteins, and accurate, reliable assignment data are crucial for secondary structure prediction systems. Since the 1980s, assignment methods have relied on hydrogen bond analysis and atomic coordinate geometry and, more recently, on machine learning. However, assignment becomes challenging when atoms are missing from the protein files. We propose DLFSA, a multi-class classifier that assigns protein secondary structure elements (SSEs) using convolutional neural networks (CNNs). A fast and efficient GPU-based parallel procedure extracts fragments from protein files. The model is trained on a subset of the protein fragments and achieves 88.1% training accuracy and 82.5% test accuracy. It uses only Cα coordinates for secondary structure assignment. The model has also been successfully tested on a few full-length proteins. Results from the fragment-based studies demonstrate the feasibility of applying deep learning to structure assignment problems.
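The fragment extraction step mentioned above can be illustrated with a minimal sketch. It is not the DLFSA implementation: the abstract does not give the fragment length, the preprocessing, or the GPU kernel design, so the window length of 5, the centroid centering, and the function names `extract_fragments` and `center_fragment` are all assumptions made for illustration. The sketch also runs serially on the CPU, whereas the paper describes a GPU-based parallel procedure.

```python
from typing import List, Tuple

# One (x, y, z) coordinate triple per residue.
Coord = Tuple[float, float, float]


def extract_fragments(coords: List[Coord], length: int = 5) -> List[List[Coord]]:
    """Return every contiguous window of `length` residues (sliding by one).

    `length=5` is an illustrative assumption, not a value from the paper.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    # A chain shorter than `length` yields no fragments.
    return [coords[i:i + length] for i in range(len(coords) - length + 1)]


def center_fragment(fragment: List[Coord]) -> List[Coord]:
    """Translate a fragment so that its centroid sits at the origin.

    Centering is an assumed preprocessing step that removes the dependence
    on where the fragment lies in the protein's coordinate frame.
    """
    n = len(fragment)
    cx = sum(p[0] for p in fragment) / n
    cy = sum(p[1] for p in fragment) / n
    cz = sum(p[2] for p in fragment) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in fragment]


# Toy chain of 7 residues placed along the x axis (made-up coordinates).
chain = [(float(i), 0.0, 0.0) for i in range(7)]
fragments = extract_fragments(chain, length=5)
centered = center_fragment(fragments[0])
```

Each window is independent of the others, which is the property that makes this kind of extraction straightforward to parallelize, for example with one window per GPU thread.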