Department of Computer Science, University of Minnesota, 117 Pleasant St SE, Room 464, Minneapolis, MN 55455, USA.
Bioinformatics. 2009 Dec 1;25(23):3099-107. doi: 10.1093/bioinformatics/btp561. Epub 2009 Sep 28.
Identifying residues that interact with ligands is a useful first step toward understanding protein function and an aid to designing small molecules that target the protein for interaction. Several studies have shown that sequence features are very informative for this type of prediction, while structure features have also been useful when structure is available. We develop a sequence-based method, called LIBRUS, that combines homology-based transfer with direct prediction using machine learning, and compare it to previous sequence-based work and current structure-based methods.
Our analysis shows that homology-based transfer is slightly more discriminating than a support vector machine learner using profiles and predicted secondary structure. We combine these two approaches in a method called LIBRUS. On a benchmark of 885 sequence-independent proteins, it achieves an area under the ROC curve (ROC) of 0.83 with 45% precision at 50% recall, a significant improvement over previous sequence-based efforts. On an independent benchmark set, FINDSITE, a current method based on structure features, achieves an ROC of 0.81 with 54% precision at 50% recall, while LIBRUS achieves an ROC of 0.82 with 39% precision at 50% recall at a smaller computational cost. When LIBRUS and FINDSITE predictions are combined, performance exceeds that of either method alone, reaching an ROC of 0.86 and 59% precision at 50% recall.
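The two figures of merit quoted above, area under the ROC curve and precision at 50% recall, can be illustrated with a minimal Python sketch. It is not the code released with the study: the function names, the toy labels and scores, and the min-max averaging used to combine two predictors are all assumptions made for illustration, and the abstract does not state how LIBRUS and FINDSITE predictions were actually combined.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_at_recall(labels: Sequence[int], scores: Sequence[float],
                        target_recall: float = 0.5) -> float:
    """Precision at the shortest ranked prefix whose recall reaches
    target_recall. Tied scores keep their input order (a simplification)."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    true_pos = 0
    for k, (_, y) in enumerate(ranked, start=1):
        true_pos += y
        if true_pos / total_pos >= target_recall:
            return true_pos / k
    return 0.0  # only reached if target_recall exceeds 1.0


def combine_scores(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Hypothetical combination rule: rescale each predictor's scores to
    [0, 1] and average them. Not the scheme used in the study."""
    def rescale(xs: Sequence[float]) -> List[float]:
        lo, hi = min(xs), max(xs)
        span = hi - lo
        return [(x - lo) / span if span else 0.0 for x in xs]
    return [(x + y) / 2 for x, y in zip(rescale(a), rescale(b))]


# Toy per-residue data: 1 = ligand-binding residue, 0 = non-binding.
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.8, 0.7, 0.3, 0.2, 0.6, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_precision = precision_at_recall(toy_labels, toy_scores, 0.5)
```

In practice the labels and scores would be pooled over all residues of the benchmark proteins, and a library routine would replace the quadratic pair count in `roc_auc` for large inputs.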
Software developed for this study is available at http://bioinfo.cs.umn.edu/supplements/binf2009 along with Supplementary data on the study.