Bioinformatics Research Center & School of Computer Engineering, Nanyang Technological University, Singapore 639798.
Proteins. 2010 Feb 15;78(3):589-602. doi: 10.1002/prot.22583.
We introduce low-ASA residue pairs as classification features for distinguishing different types of protein interactions. A low-ASA residue pair is defined as two contacting residues, one from each chain, that each have a small solvent-accessible surface area (ASA). This notion of residue pairs is novel in that it is the first to combine residue pairs with the O-ring theory, an influential proposition stating that binding hot spots at an interface are often surrounded by a ring of energetically less important residues. Because binding hot spots are central to the stability of protein interactions, we believe that low-ASA residue pairs can sharpen the distinction between interaction types. The main part of our feature vector is 210-dimensional, with one dimension per possible low-ASA residue pair type; the value of each feature is determined by a propensity measure. Our classification method, OringPV, uses these propensity vectors of protein interactions as input to a support vector machine. OringPV is tested on three benchmark datasets for a variety of classification tasks, such as distinguishing crystal packing from biological interactions and distinguishing between two different types of biological interactions. Evaluation covers within-dataset comparison, cross-dataset comparison, and leave-one-out cross-validation. The results show that low-ASA residue pairs and the propensity vector description of protein interactions are highly effective for these distinctions. In particular, many tests of cross-dataset generalization capability achieved excellent recall and overall accuracy, substantially outperforming existing benchmark methods.
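The 210-dimensional feature space described above presumably corresponds to the unordered pairs of the 20 standard amino acids (20 identical pairs plus 190 mixed pairs). The following is a minimal sketch of how such a propensity vector could be assembled, not the authors' implementation. The abstract does not give the ASA cutoff, the propensity formula, or the background distribution, so `asa_cutoff`, the log-ratio propensity, the pseudocount, and the uniform background used here are all illustrative assumptions, as are the function names.

```python
from itertools import combinations_with_replacement
from math import log

# One-letter codes of the 20 standard amino acids, in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Unordered residue-pair types: 20 identical pairs + 190 mixed pairs = 210.
PAIR_TYPES = list(combinations_with_replacement(AMINO_ACIDS, 2))
PAIR_INDEX = {pair: i for i, pair in enumerate(PAIR_TYPES)}


def pair_key(res_a, res_b):
    """Return the order-independent key for a residue pair."""
    return (res_a, res_b) if res_a <= res_b else (res_b, res_a)


def low_asa_pairs(contacts, asa_cutoff):
    """Keep contact pairs in which BOTH residues have ASA <= asa_cutoff.

    contacts: iterable of (res_a, asa_a, res_b, asa_b), one residue per chain.
    asa_cutoff: hypothetical threshold; the abstract does not state one.
    """
    return [
        pair_key(res_a, res_b)
        for res_a, asa_a, res_b, asa_b in contacts
        if asa_a <= asa_cutoff and asa_b <= asa_cutoff
    ]


def propensity_vector(pairs, background, pseudocount=1.0):
    """Assumed propensity: log(smoothed observed frequency / background)."""
    counts = [0] * len(PAIR_TYPES)
    for pair in pairs:
        counts[PAIR_INDEX[pair]] += 1
    total = sum(counts) + pseudocount * len(PAIR_TYPES)
    return [
        log(((count + pseudocount) / total) / background[i])
        for i, count in enumerate(counts)
    ]


# Toy interface with made-up residues and ASA values (not real data).
contacts = [
    ("L", 5.0, "W", 8.0),    # both buried: kept
    ("W", 3.0, "L", 2.0),    # same pair type, opposite chain order: kept
    ("K", 40.0, "E", 55.0),  # both exposed: discarded
]
uniform_background = [1.0 / len(PAIR_TYPES)] * len(PAIR_TYPES)
pairs = low_asa_pairs(contacts, asa_cutoff=10.0)
vector = propensity_vector(pairs, uniform_background)
```

A vector of this form, one per interface, is the kind of input that could then be passed to a support vector machine. In practice the background would more plausibly be estimated from a reference set of interfaces than taken as uniform.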