School of Information and Communication Technology, Griffith University, Brisbane, Australia.
Institute of Integrated and Intelligent Systems, Griffith University, Brisbane, Australia.
BMC Bioinformatics. 2022 Jan 4;23(1):6. doi: 10.1186/s12859-021-04525-6.
Protein backbone angle prediction has become significantly more accurate with the development of deep learning methods. Usually, the same deep learning model is used to make predictions for all residues, regardless of the secondary structure category they belong to. In this paper, we propose training a separate deep learning model for each category of secondary structure. Machine learning methods strive to achieve generality over the training examples and consequently lose accuracy. In this work, we explicitly exploit classification knowledge to restrict generalisation to within a specific class of training examples. This compensates for the accuracy lost to generalisation by exploiting specialisation knowledge in an informed way.
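The idea of one specialised model per secondary structure category can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `MeanModel` class is a trivial stand-in for a deep network, the labels `H`, `E`, and `C` and the angle values are made-up toy data, and angle periodicity is ignored for brevity. It is not the SAP4SS implementation, and this abstract does not say how class labels are obtained at prediction time.

```python
from collections import defaultdict
from statistics import mean


class MeanModel:
    """Hypothetical stand-in for a deep network: predicts the mean training target."""

    def fit(self, targets):
        self.value = mean(targets)
        return self

    def predict(self):
        return self.value


def train_per_class(samples):
    """Train one model per secondary structure label.

    samples: list of (ss_label, angle_in_degrees) pairs.
    Returns a dict mapping each label to its own fitted model.
    """
    by_class = defaultdict(list)
    for ss_label, angle in samples:
        by_class[ss_label].append(angle)
    return {label: MeanModel().fit(values) for label, values in by_class.items()}


def mae(predictions, targets):
    """Plain mean absolute error (no wrap-around handling in this toy)."""
    return mean(abs(p - t) for p, t in zip(predictions, targets))


# Toy data, invented for illustration only.
samples = [
    ("H", -60.0), ("H", -64.0),
    ("E", -120.0), ("E", -124.0),
    ("C", -80.0), ("C", -90.0),
]
targets = [angle for _, angle in samples]

# One general model trained on all residues.
general = MeanModel().fit(targets)
general_mae = mae([general.predict()] * len(samples), targets)

# One specialised model per class; each residue is routed by its label.
specialised = train_per_class(samples)
specialised_mae = mae([specialised[label].predict() for label, _ in samples], targets)

print(f"general MAE: {general_mae:.2f}, specialised MAE: {specialised_mae:.2f}")
```

On this toy data the specialised models fit their own class more tightly than the single general model does, which is the trade-off the paragraph above describes.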
The new method, named SAP4SS, obtains mean absolute error (MAE) values of 15.59, 18.87, 6.03, and 21.71 for the four types of backbone angles φ, ψ, θ, and τ, respectively. SAP4SS thus significantly outperforms the existing state-of-the-art methods SAP, OPUS-TASS, and SPOT-1D: the differences in MAE for all four types of angles range from 1.5% to 4.1% compared to the best known results.
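For readers unfamiliar with MAE on angles, the following is a minimal sketch of one common way to compute it. It assumes angles are given in degrees and that errors wrap around the circle, so that 179 and -179 are 2 degrees apart rather than 358. The abstract does not state the exact error definition used, and the function names `angular_abs_error` and `angular_mae` are hypothetical.

```python
def angular_abs_error(pred_deg, true_deg):
    """Absolute difference between two angles in degrees, taking the shorter arc."""
    diff = abs(pred_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)


def angular_mae(preds, trues):
    """Mean absolute error over paired angle lists (assumed periodic, in degrees)."""
    if len(preds) != len(trues) or not preds:
        raise ValueError("preds and trues must be non-empty and of equal length")
    total = sum(angular_abs_error(p, t) for p, t in zip(preds, trues))
    return total / len(preds)


# Illustrative values only, not data from the paper.
example = angular_mae([179.0, 10.0], [-179.0, 20.0])
print(example)
```

Without the wrap-around step, predictions close to the ±180 degree boundary would be penalised as if they were almost a full turn away from the true angle.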
SAP4SS along with its data is available from https://gitlab.com/mahnewton/sap4ss .