IEEE/ACM Trans Comput Biol Bioinform. 2018 Sep-Oct;15(5):1470-1479. doi: 10.1109/TCBB.2018.2793858. Epub 2018 Jan 15.
Hot regions in protein-protein interactions (PPIs) are assembly regions composed of tightly packed hot spots. Discovering hot regions helps in understanding life activities and is valuable for biological applications; their identification is the basis for protein design and cancer prevention. Existing algorithms for predicting hot regions often suffer from defects such as low accuracy and instability. This paper proposes a novel hot region prediction method based on diverse biological characteristics. First, features are evaluated using an improved mRMR (minimum redundancy maximum relevance) method. Then, a support vector machine (SVM) is used to build a classification model from the selected features. In addition, a new clustering algorithm, LCSD (local community structure detecting), is developed to detect and analyze the conformation of hot regions. During clustering, the link similarity of protein residues is introduced to handle boundary nodes. The algorithm can effectively deal with missing residue nodes and control local community boundaries. The results indicate that the spatial structure of hot regions can be obtained more effectively, and that the proposed method identifies hot regions more precisely than previous methods.
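The feature evaluation step can be illustrated with a minimal sketch of standard greedy mRMR. This is not the improved variant the abstract refers to, whose details are not given here. The sketch assumes discrete (already binned) features, and the names `mutual_information`, `mrmr_select`, and the toy columns `f1`, `f1_copy`, `f2`, `noise` are hypothetical, chosen only for illustration.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(columns, labels, k):
    """Greedy mRMR: at each step, pick the unselected feature that maximizes
    relevance to the labels minus mean redundancy with the selected features."""
    relevance = {name: mutual_information(col, labels) for name, col in columns.items()}
    selected = []
    while len(selected) < min(k, len(columns)):
        best_name, best_score = None, float("-inf")
        for name, col in columns.items():
            if name in selected:
                continue
            if selected:
                redundancy = sum(
                    mutual_information(col, columns[s]) for s in selected
                ) / len(selected)
            else:
                redundancy = 0.0
            score = relevance[name] - redundancy
            if score > best_score:
                best_name, best_score = name, score
        selected.append(best_name)
    return selected


# Toy data (assumed, not from the paper): the label is f1 OR f2,
# f1_copy duplicates f1, and noise is independent of the label.
columns = {
    "f1":      [0, 0, 0, 0, 1, 1, 1, 1],
    "f1_copy": [0, 0, 0, 0, 1, 1, 1, 1],
    "f2":      [0, 0, 1, 1, 0, 0, 1, 1],
    "noise":   [0, 1, 0, 1, 0, 1, 0, 1],
}
labels = [0, 0, 1, 1, 1, 1, 1, 1]

selected = mrmr_select(columns, labels, k=2)
print(selected)
```

On this toy data the redundancy term keeps `f1_copy` out of the first two picks even though it is as relevant as `f1`, which is the behavior mRMR is designed for. In a pipeline like the one described, the selected columns would then be passed to an SVM classifier; that step is omitted here to keep the sketch dependency-free.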