Department of Computer Science, Jamoum University College, Umm Al-Qura University, Jamoum, Saudi Arabia.
Department of Biology, North Carolina Agricultural and Technical State University, Greensboro, NC, USA.
Methods Mol Biol. 2022;2499:155-176. doi: 10.1007/978-1-0716-2317-6_8.
Peroxiredoxins (Prxs) are a protein superfamily, present in all organisms, whose members play a critical role in protecting cellular macromolecules from oxidative damage and also regulate intracellular and intercellular signaling processes involving redox-regulated proteins and pathways. Bioinformatic approaches that focus on active site-proximal sequence fragments (known as active site signatures), combined with iterative clustering and searching methods (referred to as TuLIP and MISST), have recently enabled the recognition of over 38,000 peroxiredoxins and their classification into six functionally relevant groups. Because these data provide so many examples of Prxs in each class, machine learning approaches offer an opportunity to extract additional information about the features characteristic of each group. In this study, we developed a novel computational method named "RF-Prx", based on a random forest (RF) approach integrated with K-space amino acid pairs (KSAAP), to identify peroxiredoxins and classify them into one of the six subgroups. RF-Prx outperformed other machine learning classifiers. The RF approach integrated with KSAAP also enabled the detection of class-specific conserved sequences that lie outside the known functional centers and may be important. For example, drugs designed to target Prx proteins would likely suffer from cross-reactivity among distinct Prxs if targeted to the conserved active sites, but this may be avoidable if remote, class-specific regions could be targeted instead.
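To illustrate the kind of feature encoding the abstract refers to, the sketch below computes K-space amino acid pair frequencies for a protein sequence. It is a minimal illustration rather than the RF-Prx implementation: the function name `ksaap_features`, the `max_gap` parameter, the normalization by window count, and the example peptide are all assumptions, and the encoding follows the commonly used "composition of k-spaced amino acid pairs" definition, which may differ in detail from the one used in the study.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ksaap_features(sequence, max_gap=2):
    """Return K-spaced amino acid pair frequencies for one sequence.

    Hypothetical helper: for each gap k in 0..max_gap, count every
    ordered pair (residue i, residue i + k + 1) and divide by the
    number of windows of that size. The output has
    400 * (max_gap + 1) values.
    """
    seq = sequence.upper()
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    features = []
    for gap in range(max_gap + 1):
        counts = dict.fromkeys(pairs, 0)
        n_windows = len(seq) - gap - 1
        for i in range(max(n_windows, 0)):
            pair = seq[i] + seq[i + gap + 1]
            # Pairs containing non-standard residues (e.g. X) are skipped.
            if pair in counts:
                counts[pair] += 1
        denom = n_windows if n_windows > 0 else 1
        features.extend(counts[p] / denom for p in pairs)
    return features


# Usage with a made-up peptide (not a real peroxiredoxin sequence).
example = ksaap_features("MSLINTKIKPFKNQAF", max_gap=2)
print(len(example))  # 400 pairs x 3 gap values = 1200
```

Fixed-length vectors of this kind, one per sequence, could then be passed to a random forest classifier (for example, scikit-learn's `RandomForestClassifier`) with the six Prx subgroups as class labels. Feature importances from such a model are one way that informative residue pairs outside the active site might be surfaced.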