Jing Xiaoyang, Dong Qiwen, Lu Ruqian
School of Computer Science, Fudan University, Shanghai, 200433, People's Republic of China.
School of Data Science and Engineering, East China Normal University, Shanghai, 200062, People's Republic of China.
BMC Bioinformatics. 2017 Sep 2;18(1):390. doi: 10.1186/s12859-017-1811-9.
In structural biology, protein residue-residue contacts play a crucial role in protein structure prediction. Previous studies have found that predicted residue-residue contacts can effectively constrain the conformational search space, which is significant for de novo protein structure prediction. Over the last few decades, researchers have developed various methods to predict residue-residue contacts, and in recent years fusion methods have achieved notable performance. In this work, a novel fusion method based on a ranking strategy is proposed to predict contacts. Unlike traditional regression or classification strategies, it treats contact prediction as a ranking task. First, two kinds of features are extracted, one from correlated mutation methods and one from ensemble machine-learning classifiers; the proposed method then uses a learning-to-rank algorithm to predict the contact probability of each residue pair.
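To make the ranking formulation concrete, the following is a minimal sketch of learning to rank residue pairs from two fused features. It is not the RRCRank implementation: the abstract does not name the learning-to-rank algorithm, so a simple linear scorer trained with a pairwise logistic (RankNet-style) loss is assumed here. The function names, the toy feature values, and the hyperparameters are all hypothetical.

```python
import math
import random


def train_pairwise_ranker(features, labels, epochs=50, lr=0.1, seed=0):
    """Fit a linear scorer w.x so that contacts outrank non-contacts.

    Assumption: pairwise logistic loss log(1 + exp(-(s_pos - s_neg))),
    used as a stand-in for whichever learning-to-rank algorithm the
    paper uses.
    """
    rng = random.Random(seed)
    w = [0.0] * len(features[0])
    pos = [x for x, y in zip(features, labels) if y == 1]
    neg = [x for x, y in zip(features, labels) if y == 0]
    pairs = [(p, n) for p in pos for n in neg]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for p, n in pairs:
            diff = [a - b for a, b in zip(p, n)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # Negative gradient of the loss is sigmoid(-margin) * diff.
            step = 1.0 / (1.0 + math.exp(margin))
            w = [wi + lr * step * di for wi, di in zip(w, diff)]
    return w


def rank_pairs(features, w):
    """Return residue-pair indices sorted from most to least likely contact."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in features]
    return sorted(range(len(features)), key=lambda k: scores[k], reverse=True)


# Toy data (hypothetical): each row is
# (correlated-mutation score, ensemble-classifier probability).
X = [(0.9, 0.8), (0.7, 0.6), (0.2, 0.3), (0.1, 0.4), (0.3, 0.1)]
y = [1, 1, 0, 0, 0]  # 1 = residue pair is in contact

weights = train_pairwise_ranker(X, y)
order = rank_pairs(X, weights)
print(order[:2])  # indices of the two highest-ranked residue pairs
```

The point of the pairwise loss is that only the relative order of residue pairs matters, not a calibrated probability for each pair, which matches how contact predictions are usually consumed: as a ranked list from which the top entries are taken.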
First, we perform two benchmark tests of the proposed fusion method (RRCRank), one on the CASP11 dataset and one on the CASP12 dataset. The results show that RRCRank outperforms other well-developed methods, especially for medium- and short-range contacts. Second, to verify the advantage of the ranking strategy, we predict contacts with traditional regression and classification strategies using the same features. Compared with these two traditional strategies, the proposed ranking strategy performs better for all three contact types, and in particular for long-range contacts. Third, RRCRank is compared with several state-of-the-art methods from CASP11 and CASP12. The results show that RRCRank achieves comparable prediction precision and is better than three of these methods on most assessment metrics.
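The comparisons above are reported per contact range. As an illustration only, the sketch below computes the precision of the top L/k predictions within one range. The abstract does not give the range boundaries or the exact metrics, so the commonly used CASP convention is assumed (short: sequence separation 6 to 11, medium: 12 to 23, long: 24 or more), and all function names and toy values are hypothetical.

```python
def contact_range(i, j):
    """Classify a residue pair by sequence separation (assumed CASP convention)."""
    sep = abs(i - j)
    if sep >= 24:
        return "long"
    if sep >= 12:
        return "medium"
    if sep >= 6:
        return "short"
    return None  # pairs closer than 6 residues are not assessed


def top_precision(scored_pairs, true_contacts, seq_len, range_name, k=5):
    """Precision of the top L/k predictions within one contact range.

    scored_pairs:  iterable of (i, j, score) predictions
    true_contacts: set of (i, j) tuples with i < j
    seq_len:       protein length L
    """
    in_range = [p for p in scored_pairs if contact_range(p[0], p[1]) == range_name]
    in_range.sort(key=lambda p: p[2], reverse=True)
    top = in_range[: max(1, seq_len // k)]
    if not top:
        return 0.0
    hits = sum(1 for i, j, _ in top if (min(i, j), max(i, j)) in true_contacts)
    return hits / len(top)


# Toy example (hypothetical values) for a protein of length 30.
predictions = [
    (0, 25, 0.9), (1, 28, 0.8), (2, 29, 0.7), (3, 27, 0.1),  # long range
    (0, 12, 0.5),                                            # medium range
]
native = {(0, 25), (2, 29)}

print(top_precision(predictions, native, seq_len=30, range_name="long", k=10))
```

With L = 30 and k = 10, the three highest-scoring long-range pairs are evaluated, and two of them are native contacts in this toy data.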
A learning-to-rank algorithm is used to develop a novel rank-based method for protein residue-residue contact prediction, which achieves state-of-the-art performance in an extensive assessment.