Jones David T, Singh Tanya, Kosciolek Tomasz, Tetchner Stuart
Bioinformatics Group, Department of Computer Science, University College London, London WC1E 6BT, UK.
Bioinformatics. 2015 Apr 1;31(7):999-1006. doi: 10.1093/bioinformatics/btu791. Epub 2014 Nov 26.
Recent developments in statistical techniques for inferring direct evolutionary couplings between residue pairs have made covariation-based contact prediction a viable means of accurate 3D modelling of proteins from sequence information alone. To extend the usefulness of contact prediction, we have designed a new meta-predictor (MetaPSICOV) which combines three distinct approaches for inferring covariation signals from multiple sequence alignments, considers a broad range of other sequence-derived features and, uniquely, a range of metrics which describe both the local and global quality of the input multiple sequence alignment. Finally, we use a two-stage predictor, where the second stage filters the output of the first stage. This two-stage predictor is additionally evaluated on its ability to accurately predict the long range network of hydrogen bonds, including correctly assigning the donor and acceptor residues.
Using the original PSICOV benchmark set of 150 protein families, MetaPSICOV achieves a mean precision of 0.54 for top-L predicted long range contacts, around 60% higher than PSICOV and around 40% higher than CCMpred. In de novo protein structure prediction using FRAGFOLD, MetaPSICOV improves the TM-scores of models by a median of 0.05 compared with PSICOV. Lastly, for predicting long range hydrogen bonding, MetaPSICOV-HB achieves a precision of 0.69 for the top-L/10 hydrogen bonds compared with just 0.26 for the baseline MetaPSICOV.
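For readers unfamiliar with the top-L and top-L/10 precision figures quoted above, the following is a minimal sketch of how such a metric is commonly computed, not the authors' evaluation code. The function name, the `(i, j, score)` prediction format, the `top_divisor` parameter and the minimum sequence separation of 24 residues for "long range" are all assumptions made for illustration; the abstract does not state them.

```python
from typing import Iterable, Set, Tuple


def top_l_long_range_precision(
    predictions: Iterable[Tuple[int, int, float]],
    true_contacts: Set[Tuple[int, int]],
    seq_len: int,
    top_divisor: int = 1,   # 1 -> top-L, 10 -> top-L/10 (assumed convention)
    min_sep: int = 24,      # assumed threshold for "long range"
) -> float:
    """Fraction of the highest-scoring long range predictions that are true.

    predictions   -- (i, j, score) triples for residue pairs
    true_contacts -- set of (i, j) pairs with i < j observed in the structure
    """
    # Keep only pairs far enough apart in sequence, normalised so that i < j.
    long_range = [
        (min(i, j), max(i, j), score)
        for i, j, score in predictions
        if abs(i - j) >= min_sep
    ]
    # Rank by predicted score, highest first.
    long_range.sort(key=lambda p: p[2], reverse=True)
    n_top = max(1, seq_len // top_divisor)
    top = long_range[:n_top]
    if not top:
        return 0.0
    hits = sum((i, j) in true_contacts for i, j, _ in top)
    # Divide by the number of predictions actually evaluated.
    return hits / len(top)


# Toy example with made-up numbers (not data from the paper).
example_predictions = [
    (1, 30, 0.9), (2, 40, 0.8), (3, 10, 0.95),  # (3, 10) is not long range
    (5, 45, 0.7), (6, 35, 0.6), (7, 49, 0.5), (8, 48, 0.4),
]
example_true = {(1, 30), (5, 45), (7, 49)}
example_precision = top_l_long_range_precision(
    example_predictions, example_true, seq_len=50, top_divisor=10
)
```

In this toy case the five highest-scoring long range pairs contain three true contacts. Conventions differ on whether to divide by the number of predictions evaluated or by the nominal cutoff when fewer predictions are available; the sketch uses the former.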
MetaPSICOV is freely available as a web server at http://bioinf.cs.ucl.ac.uk/MetaPSICOV. Raw data (predicted contact lists and 3D models) and source code can be downloaded from http://bioinf.cs.ucl.ac.uk/downloads/MetaPSICOV.
Supplementary data are available at Bioinformatics online.