Daqrouq Khaled, Alhmouz Rami, Balamesh Ahmed, Memic Adnan
Electrical and Computer Engineering Department, Faculty of Engineering, King Abdulaziz University, Jeddah, 21589, Saudi Arabia.
Center of Nanotechnology, King Abdulaziz University, Jeddah, 21589, Saudi Arabia.
PLoS One. 2015 Apr 10;10(4):e0122873. doi: 10.1371/journal.pone.0122873. eCollection 2015.
PDZ domains occur in a wide array of signaling proteins that are often unrelated apart from the well-conserved structural PDZ domain they contain. These domains have been linked to many disease processes, including common ones such as avian influenza and very rare conditions such as Fraser and Usher syndromes. Historically, based on the interactions and the nature of the bonds they form, PDZ domains have most often been assigned to one of three classes (class I, class II, and others, grouped as class III), a classification that depends directly on their binding partner. In this study, we report three distinct feature extraction approaches, based on rearrangements of bigram and trigram occurrence and existence within the domain's primary amino acid sequence, to assist PDZ domain classification. Wavelet packet transform (WPT) and Shannon entropy, denoted wavelet entropy (WE), were proposed as feature extraction methods. Using 115 unique human and mouse PDZ domains, the existence rearrangement approach yielded a recognition rate of 78.34%, outperforming our occurrence rearrangement-based method. With a validation technique, the recognition rate was 81.41%. Classifying PDZ domains from primary sequences in this way proved to be an encouraging approach for obtaining consistent classification results. We anticipate that increasing the database size will further improve feature extraction and classification accuracy.
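The distinction between bigram "occurrence" and "existence" features can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 20-letter amino acid alphabet ordering, and the toy sequence are assumptions made for illustration, and the rearrangement step and the wavelet packet transform are not reproduced. The entropy function is applied to raw bigram counts only to show the Shannon formula, whereas the study computes entropy on wavelet packet coefficients.

```python
from itertools import product
from math import log2

# Assumed alphabet: the 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 possible ordered residue pairs, in a fixed order.
BIGRAMS = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def bigram_occurrence(seq):
    """Count how many times each bigram appears in the sequence."""
    counts = dict.fromkeys(BIGRAMS, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs with non-standard residues
            counts[pair] += 1
    return [counts[b] for b in BIGRAMS]


def bigram_existence(seq):
    """Record only whether each bigram appears at least once (0 or 1)."""
    return [1 if c > 0 else 0 for c in bigram_occurrence(seq)]


def shannon_entropy(values):
    """Shannon entropy (in bits) of a non-negative vector, normalized to sum to 1."""
    total = sum(values)
    if total == 0:
        return 0.0
    probs = [v / total for v in values if v > 0]
    return -sum(p * log2(p) for p in probs)


# Hypothetical toy sequence, not a real PDZ domain.
toy = "AAKAA"
occ = bigram_occurrence(toy)
ext = bigram_existence(toy)
print(sum(occ), sum(ext), shannon_entropy(occ))
```

For the toy sequence, the bigram AA appears twice, so the occurrence vector records 2 for it while the existence vector records 1. That difference is the only thing separating the two feature types in this sketch.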