Faculty of Chemistry, Biological and Chemical Research Center, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland.
Biomolecules. 2022 Jun 17;12(6):841. doi: 10.3390/biom12060841.
Assigning secondary structure elements to protein conformations is a necessary step in interpreting a protein model produced by computational methods. The process essentially involves labeling each amino acid residue as H (Helix), E (Strand), or C (Coil, also known as Loop). The procedure becomes more complicated when particular atoms are absent from the input protein structure, especially when only the alpha carbon (Cα) positions are known. Various techniques have been tested and applied to this problem over the last forty years, and the application of machine learning is the most recent trend. This contribution presents the HECA classifier, which uses neural networks to assign protein secondary structure types and relies exclusively on Cα coordinates. The neural network model was implemented and trained with the Keras (TensorFlow) library, and the BioShell toolkit was used to calculate the network's input features from raw coordinates. The study's findings show that neural network-based methods can be used successfully for secondary structure assignment when only a Cα trace is available. Thanks to the careful selection of input features, the accuracy of our approach (above 97%) exceeded that of the existing methods.
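To make the idea of deriving network inputs from a Cα trace concrete, the sketch below computes two kinds of local geometric descriptors with NumPy: Cα(i)-Cα(i+k) distances and the virtual dihedral angle defined by four consecutive Cα atoms. This is only an illustration under stated assumptions. The abstract does not list the features HECA uses, and the code does not call BioShell or Keras; the function names `ca_distances`, `ca_dihedrals`, and `ideal_helix`, as well as the helix parameters (radius 2.3 Å, rise 1.5 Å, 100° per residue), are hypothetical choices made for this example.

```python
import numpy as np


def ca_distances(ca, max_offset=4):
    """Distances Cα(i)-Cα(i+k) for k = 1..max_offset.

    Returns an (N, max_offset) array; entries that would reach past the
    chain end are NaN. Illustrative only, not the HECA feature set.
    """
    ca = np.asarray(ca, dtype=float)
    n = len(ca)
    out = np.full((n, max_offset), np.nan)
    for k in range(1, max_offset + 1):
        if n > k:
            out[: n - k, k - 1] = np.linalg.norm(ca[k:] - ca[:-k], axis=1)
    return out


def ca_dihedrals(ca):
    """Virtual dihedral (degrees) over Cα atoms i, i+1, i+2, i+3.

    Returns a length-N array; the last three entries are NaN.
    """
    ca = np.asarray(ca, dtype=float)
    n = len(ca)
    out = np.full(n, np.nan)
    if n < 4:
        return out
    b0 = ca[:-3] - ca[1:-2]    # from atom i+1 back to atom i
    b1 = ca[2:-1] - ca[1:-2]   # central bond, atom i+1 to atom i+2
    b2 = ca[3:] - ca[2:-1]     # from atom i+2 to atom i+3
    b1 = b1 / np.linalg.norm(b1, axis=1, keepdims=True)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = b0 - np.sum(b0 * b1, axis=1, keepdims=True) * b1
    w = b2 - np.sum(b2 * b1, axis=1, keepdims=True) * b1
    x = np.sum(v * w, axis=1)
    y = np.sum(np.cross(b1, v) * w, axis=1)
    out[: n - 3] = np.degrees(np.arctan2(y, x))
    return out


def ideal_helix(n, radius=2.3, rise=1.5, turn_deg=100.0):
    """Synthetic right-handed helical Cα trace (assumed parameters, in Å)."""
    idx = np.arange(n)
    t = np.radians(turn_deg) * idx
    return np.column_stack([radius * np.cos(t), radius * np.sin(t), rise * idx])


helix = ideal_helix(10)
# One row per residue: four distances followed by one virtual dihedral.
features = np.column_stack([ca_distances(helix), ca_dihedrals(helix)])
```

Per-residue rows of this kind, typically gathered over a window of neighboring residues, are the sort of input a three-class (H/E/C) classifier could be trained on. How HECA actually encodes its inputs is described in the full paper, not in this abstract.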