School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China.
School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China.
Bioinformatics. 2019 Sep 1;35(17):2982-2990. doi: 10.1093/bioinformatics/btz040.
Protein fold recognition has attracted increasing attention because it is critical for studying the 3D structures of proteins and for drug design. Researchers have studied this important task extensively, and several features with high discriminative power have been proposed. However, developing methods that efficiently combine these features to improve predictive performance remains a challenging problem.
In this study, we propose two algorithms: MV-fold and MT-fold. MV-fold is a new computational predictor for fold recognition based on a multi-view learning model. Different features of proteins, including evolutionary information, secondary structure information and physicochemical properties, are treated as different views of a protein, and these views jointly form a shared latent space. The ε-dragging technique is employed to enlarge the margins between different protein folds, which improves the predictive performance of MV-fold. MV-fold is then combined with two template-based methods, HHblits and HMMER; the resulting ensemble, called MT-fold, incorporates the advantages of both discriminative and template-based methods. Experimental results on five widely used benchmark datasets (DD, RDD, EDD, TG and LE) showed that the proposed methods outperformed several state-of-the-art methods in this field, indicating that MV-fold and MT-fold are useful computational tools for protein fold recognition and protein homology detection, and promising tools for protein sequence analysis. Finally, we constructed an updated and rigorous benchmark dataset based on SCOPe (version 2.07) to evaluate the proposed methods fairly, and our methods achieved stable performance on this new dataset. We expect this new dataset to become a widely used benchmark for fairly comparing different fold recognition methods.
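To make the ε-dragging idea more concrete, the following is a minimal, single-view sketch in the style of discriminative least squares regression. It is an illustration under stated assumptions, not the MV-fold model: it ignores the multi-view latent space entirely. The function names `epsilon_dragging_lsr` and `predict` and the parameters `lam` and `n_iter` are hypothetical. Each sample's one-hot target is allowed to be "dragged" by a non-negative amount `M`: the true-class entry moves up and the other entries move down, in directions given by `B`. The method then alternates between a ridge-regression update of the projection `W` and a closed-form update of `M`, which is what widens the margins between classes.

```python
import numpy as np


def epsilon_dragging_lsr(X, labels, n_classes, lam=1.0, n_iter=30):
    """Hypothetical single-view least-squares classifier with epsilon-dragging.

    X: (n_samples, n_features) features of one view.
    labels: integer class ids in [0, n_classes).
    Returns W of shape (n_features + 1, n_classes); the last row is a bias
    (for simplicity the bias is regularized together with the weights).
    """
    n = X.shape[0]
    Xa = np.hstack([X, np.ones((n, 1))])          # append a bias column
    Y = np.zeros((n, n_classes))
    Y[np.arange(n), labels] = 1.0                 # one-hot targets
    B = np.where(Y == 1.0, 1.0, -1.0)             # drag true class up, others down
    M = np.zeros_like(Y)                          # non-negative dragging amounts
    A = Xa.T @ Xa + lam * np.eye(Xa.shape[1])     # ridge normal matrix (fixed)
    for _ in range(n_iter):
        T = Y + B * M                             # relaxed (dragged) targets
        W = np.linalg.solve(A, Xa.T @ T)          # ridge update of W given M
        R = Xa @ W - Y                            # residual w.r.t. original targets
        M = np.maximum(B * R, 0.0)                # closed-form update of M given W
    return W


def predict(W, X):
    """Assign each sample to the class with the largest regression score."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.argmax(Xa @ W, axis=1)


if __name__ == "__main__":
    # Toy data: three well-separated Gaussian clusters (not protein features).
    rng = np.random.default_rng(0)
    centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
    labels = np.repeat(np.arange(3), 50)
    X = centers[labels] + rng.normal(scale=0.5, size=(150, 2))
    W = epsilon_dragging_lsr(X, labels, n_classes=3, lam=0.1)
    print("training accuracy:", np.mean(predict(W, X) == labels))
```

The update for `M` follows from minimizing the squared residual elementwise subject to M ≥ 0. Because each entry of `B` is ±1, the optimum is max(B ⊙ R, 0). In a multi-view setting, one would expect a similar relaxation to be applied to targets predicted from a shared latent representation rather than from a single feature matrix.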
Supplementary data are available at Bioinformatics online.