College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China.
Department of Pharmacy, The University of Lahore, Sargodha Campus, Pakistan.
Int J Biol Macromol. 2024 Oct;277(Pt 1):134147. doi: 10.1016/j.ijbiomac.2024.134147. Epub 2024 Jul 24.
Heat shock proteins (HSPs) of different families and sub-types play a vital role in protein folding and unfolding, in maintaining cellular health, and in preventing serious disorders. Previous computational methods for HSP classification have shown promising performance. However, most existing methods rely heavily on amino acid composition features and still face challenges in interpretability and accuracy. To address these issues, we introduce a novel analysis and classification method based on frequent sequential patterns (FSPs) for HSPs, their families, and their sub-types. The proposed method, called FSP4HSP ("FSP for HSP"), identifies FSPs of amino acids (FSPAAs) and uses them for analysis and classification. In addition to FSPAAs, it also discovers sequential rules among amino acids. Both binary and multi-class classification scenarios are considered, using eight integer-based and four string-based classifiers. Incorporating FSPAAs into the classification/prediction task enhances the interpretability of FSP4HSP, and a comprehensive performance comparison using various evaluation measures shows that it outperforms existing methods for HSP classification/recognition.
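The abstract does not specify the mining algorithm, thresholds, or rule definition used by FSP4HSP. Purely as an illustration, the sketch below assumes a common definition: a pattern is an ordered subsequence of residues (gaps allowed), its support is the fraction of sequences that contain it, and a sequential rule X => Y splits a frequent pattern into a prefix X and a suffix Y, with confidence sup(XY) / sup(X). The function names `mine_fsps`, `sequential_rules`, and `fsp_features`, the thresholds, and the toy sequences are hypothetical. This is a brute-force pattern-growth sketch, not PrefixSpan and not the authors' implementation.

```python
def is_subsequence(pattern: str, seq: str) -> bool:
    """True if the residues of `pattern` occur in `seq` in order (gaps allowed)."""
    it = iter(seq)
    # `residue in it` advances the iterator past the first match, preserving order.
    return all(residue in it for residue in pattern)


def support(pattern: str, sequences: list[str]) -> float:
    """Fraction of sequences containing `pattern` as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in sequences) / len(sequences)


def mine_fsps(sequences: list[str], min_sup: float, max_len: int = 3) -> dict[str, float]:
    """Hypothetical level-wise miner: grow frequent patterns one residue at a time.

    Support is anti-monotone (a longer pattern is never more frequent than its
    prefix), so extending only frequent prefixes finds every frequent pattern
    up to `max_len`.
    """
    alphabet = sorted(set("".join(sequences)))
    frequent: dict[str, float] = {}
    frontier = [""]
    for _ in range(max_len):
        next_frontier = []
        for prefix in frontier:
            for residue in alphabet:
                candidate = prefix + residue
                sup = support(candidate, sequences)
                if sup >= min_sup:
                    frequent[candidate] = sup
                    next_frontier.append(candidate)
        frontier = next_frontier
    return frequent


def sequential_rules(frequent: dict[str, float], min_conf: float):
    """Hypothetical rules X => Y from each frequent pattern XY (assumed definition)."""
    rules = []
    for pattern, sup in frequent.items():
        for i in range(1, len(pattern)):
            antecedent, consequent = pattern[:i], pattern[i:]
            # Every prefix of a frequent pattern was itself recorded as frequent.
            conf = sup / frequent[antecedent]
            if conf >= min_conf:
                rules.append((antecedent, consequent, sup, conf))
    return rules


def fsp_features(seq: str, patterns: list[str]) -> list[int]:
    """Binary presence vector of the mined patterns in one sequence."""
    return [int(is_subsequence(p, seq)) for p in patterns]


if __name__ == "__main__":
    toy = ["MKV", "MKA", "KVM", "MVK"]  # made-up fragments, not real HSP data
    fsps = mine_fsps(toy, min_sup=0.5, max_len=2)
    print(fsps)
    print(sequential_rules(fsps, min_conf=0.6))
    print(fsp_features("AKM", sorted(fsps)))
```

On the toy input, the miner keeps `K`, `M`, `V`, `KV`, `MK`, and `MV`. Only `M => K` passes the 0.6 confidence threshold, and `"AKM"` maps to `[1, 0, 1, 0, 0, 0]`. Integer vectors like these are the kind of input an integer-based classifier could consume, while the raw sequences would go to string-based ones. Real protein data would call for an efficient miner rather than this exhaustive candidate scan.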